buildkite / test-collector-ruby

Buildkite Test Analytics collector for Ruby test frameworks
http://buildkite.com/test-analytics
MIT License
15 stars 26 forks source link

Batch Upload API requests for rspec #174

Closed niceking closed 1 year ago

niceking commented 1 year ago

Description

This is the first part of PIE-1436, which will drop ActionCable support from test-collector-ruby. This removes ActionCable support for test suites configured to use RSpec.

Use of the SocketSession (websocket implementation) is removed from the RSpec config and replaced with a new Session that sends requests in batches via the Upload API. The batch size is configured via a new env variable, BUILDKITE_ANALYTICS_UPLOAD_BATCH_SIZE, which defaults to 500. This is configurable mainly to allow ease of testing.

The implementation of this leverages RSpec custom formatters. After every test, we have defined a custom handler called handle_example. The method takes the id of the example which has just executed, and stores it in a buffer. Once this buffer reaches the batch size, the trace data associated with the example ids is uploaded to Buildkite via the Upload API.

After the test suite has finished executing, the dump_summary method is called. This sends any outstanding examples to the Upload API.

The post requests to Upload API are each done in a separate thread, so that sending large payloads of data do not hold up the execution of the test suite. We retry the Upload API request up to 3 times, if the errors we received look to be network related, rather than a result of not being configured correctly, or the particular suite being over its rate limit.

At the end of the test suite, the test collector will wait up to 60 seconds for any last Upload API requests to finish, before exiting the test suite.

Verification & Deployment

This change can be tested locally by checking out this branch and bundling buildkite/buildkite with this checked out repository, then running BUILDKITE_ANALYTICS_RSPEC_TOKEN=xx-local-analytics-key BUILDKITE_ANALYTICS_UPLOAD_BATCH_SIZE=5 BUILDKITE_BUILD_ID="1" rspec spec/models/webhook or similar in the buildkite/buildkite repository.

After merge, my plan is to do another PR to do a pre-release of this change. Then I will create a branch on buildkite/buildkite to run some scheduled builds with the new release, and monitor for any errors or performance degradations. Typically, an rspec job in Buildkite has 300-600 tests, so we'll be creating 1-2 uploads per rspec job.

Minitest to come after this!

swebb commented 1 year ago

@niceking did you accidentally force push from the wrong branch? This looks like the change set for renaming session to socket session.

swebb commented 1 year ago

I had a look at some of the earlier commits. Am I reading it correctly in thinking the uploads will be in batches of 5 executions at a time. That number seems very low to me. I'm worried it will result in lots of uploads (a single run of bk/bk generates 13k executions, so 2.5k uploads). I'm worried some of our queries won't work well with that number of uploads. Also, that will result in 2.5k files per run stored in S3 (I'm not sure if there's any limits/costs there).

I'm also worried we might DOS ourselves. Previously it was limited to 1 websocket per agent. Now each agent can generate a LOT of requests at the same time.

swebb commented 1 year ago

I'm impressed you managed this with such a small change set 👍🏻

niceking commented 1 year ago

@swebb hahaha yup I did a bad rebase as I had changed the parent branch 😓 thank god for git reflog. And yeah, the batch size was just set to 5 for testing and demo purposes

jasmine-q commented 1 year ago

I had a look through this and I think there's some weird stuff going on...probs a bad rebase @niceking? like Buildkite::TestCollector.session&.write_result(trace) is still in handle_example and I thought this was meant to remove the old websocket implementation 😕

niceking commented 1 year ago

@jasmine-q yes sorry bad rebase! Have a look now :)