MiniProfiler / rack-mini-profiler

Profiler for your development and production Ruby rack apps.
MIT License
3.71k stars 404 forks source link

Intermittent test failure #569

Closed nateberkopec closed 1 year ago

nateberkopec commented 1 year ago

https://github.com/MiniProfiler/rack-mini-profiler/actions/runs/4636002947/jobs/8203512568?pr=553#step:6:19

willkoehler commented 1 year ago

This is a race condition caused by timestamps in the json body of the stubbed request

The specific code is

compressed = compress(json)
stub_request(:post, url)
  .with(
    body: compressed,
    headers: {
      'Mini-Profiler-Transport-Auth' => 'somepasswordhere',
      'Content-Encoding' => 'gzip'
    }
  )
  .to_return(status: 200, body: "", headers: {})

The json being compressed is

"{\"snapshots\":[{\"id\":\"sn941cm9oe408b11kbiw\",\"name\":null,\"started\":1473611003,\"started_at\":1681659363573,\"machine_name\":null,\"level\":0,\"user\":\"unknown user\",\"has_user_viewed\":false,\"client_timings\":null,\"duration_milliseconds\":0,\"has_trivial_timings\":true,\"has_all_trivial_timings\":false,\"trivial_duration_threshold_milliseconds\":2,\"head\":null,\"sql_count\":0,\"duration_milliseconds_in_sql\":0,\"has_sql_timings\":true,\"has_duplicate_sql_timings\":false,\"executed_readers\":0,\"executed_scalars\":0,\"executed_non_queries\":0,\"custom_timing_names\":[],\"custom_timing_stats\":{},\"custom_fields\":{},\"has_flamegraph\":false,\"request_method\":null,\"request_path\":null,\"root\":{\"id\":\"np0okbnejcsug41u99ta\",\"name\":\" http://:\",\"duration_milliseconds\":0,\"duration_without_children_milliseconds\":0,\"start_milliseconds\":0,\"parent_timing_id\":null,\"children\":[],\"has_children\":false,\"key_values\":null,\"has_sql_timings\":false,\"has_duplicate_sql_timings\":false,\"trivial_duration_threshold_milliseconds\":2,\"sql_timings\":[],\"sql_timings_duration_milliseconds\":0,\"is_trivial\":false,\"is_root\":true,\"depth\":0,\"executed_readers\":0,\"executed_scalars\":0,\"executed_non_queries\":0,\"custom_timing_stats\":{},\"custom_timings\":{}},\"started_formatted\":\"/Date(1681659363573)/\"}]}"

I can reliably reproduce the error by inserting sleep 1 between compressed = compress(json) and stub_request(:post, url)...

I think there's a straightforward fix using ActiveSupport::Testing::TimeHelpers I'll look into this and open a PR later today. Wanted to post my finding here in the meantime.

willkoehler commented 1 year ago

Quick update: It's still a race condition and adding the sleep reproduces it consistently. But... the timestamps in the json body of the stubbed request aren't causing the problem. The json body is generated once and is passed to both the stub and to gzip_transporter. So those timestamps will always be the same.

I'm currently exploring Zlib::GzipWriter.new(io). That appears to have some sort of time dependency. If I sleep before calling #write(json), It generates a different binary than compressed = compress(json) in the test setup.

BAD

body = if @gzip_requests
  require 'zlib'
  io = StringIO.new
  gzip_writer = Zlib::GzipWriter.new(io)
  sleep 1 # Sleeping here results in a different zip binary than in the test setup
  gzip_writer.write(json)
  gzip_writer.close
  headers['Content-Encoding'] = 'gzip'
  io.string

OK

body = if @gzip_requests
  require 'zlib'
  io = StringIO.new
  gzip_writer = Zlib::GzipWriter.new(io)
  gzip_writer.write(json)
  sleep 1 # OK to sleep here. Zip binaries are the same
  gzip_writer.close
  headers['Content-Encoding'] = 'gzip'
  io.string
willkoehler commented 1 year ago

Update 2: Yep. The "local file header" in the Zip file format has a time and date in it. https://en.wikipedia.org/wiki/ZIP_(file_format)#File_headers

Those are changing between the test setup and the call to gzip_writer.write(json). For some reason TimeHelpers#freeze_time doesn't affect the time used by Zlib::GzipWriter I'm looking for a different fix.

willkoehler commented 1 year ago

Update 3: Looking at Zlib source, the time is set here https://github.com/ruby/zlib/blob/a68a1f7a9acaea2ce33bc933e625f71ca0d8fcd6/ext/zlib/zlib.c#L2660-L2662 It's a direct call to the C standard library time_t time( time_t *second ) function. I don't see an easy way to fake that.

I think the best strategy will be to mock Zlib::GzipWriter. We don't care about the contents of the zipped body, as long as it's being zipped. I'm going to try a PR that does this and see how it looks.

willkoehler commented 1 year ago

I built a fake Zlib::GzipWriter and wired it in.

class FakeGzipWriter
  def initialize(io)
    @io = io
  end

  def write(body)
    @io.write("x" * body.length)
  end

  def close
  end
end

It fixed the spec. However I had second thoughts because SnapshotsTransporter#flush_buffer would be using FakeGzipWriter which is not a verifying mock and is not guaranteed to behave the same as Zlib::GzipWriter and could mask a future bug.

I then discovered that Timecop https://github.com/travisjeffery/timecop mocks time at a lower system level and works on the C standard library function that Zlib::GzipWriter uses. Wrapping the spec in Timecop.freeze do also fixes the intermittent failures 🥳

This feels like a better solution. I'm going to sleep on it and open a PR in the AM.

OsamaSayegh commented 1 year ago

Hey @willkoehler 👋 Thanks for looking into this!

I feel like this problem is an indication of a code smell in the test itself:

https://github.com/MiniProfiler/rack-mini-profiler/blob/0809d9c40f90bfc1dc1d8736c300c144aa5a0b49/spec/lib/snapshots_transporter_spec.rb#L138-L157

It's kind of testing itself, especially the last assertion (plz don't git blame the test 🙈😅)...

I think there's a better way to write this test which would also make this problem go away.

So right now we're passing the binary of the gzipped JSON that we expect in the request body to the stub_request(...).with(body: ...) call. This is what's causing our test to be flakey because the binary that we compute during the test can sometimes be slightly different from the binary that the transporter sends in the HTTP request (as you already figured out).

We can rewrite the stub_request().with call and pass to it a block where we can do custom matching. So instead of passing the body/binary that we expect, we should be able to do something like this:

stub_request(:post, url)
  .with { |req| req.headers['Content-Encoding'] == 'gzip' && decompress(req.body) == json } 

This should eliminate the test flakiness because we no longer compute the gzipped binary during the test. I also think this will make our test cleaner and more efficient. Let me know what you think!

willkoehler commented 1 year ago

Thanks for the clear analysis and suggestion @OsamaSayegh I like the way you're thinking 🙌

I took your idea one step further by making the stub_request permissive and running the assertion after the fact with assert_requested. This results in a test that is more clearly grouped into Arrange / Act / Assert.

I opened a PR #577