[SDTEST-523] Expand test impact analysis with allocation tracing

anmarchenko commented 3 months ago

Problem statement Using line coverage for test impact analysis has a major limitation in Ruby: consider the following examples:

# this class does not have any executable lines
class MyClass < OtherClass
end

test "instantiate MyClass" do
  assert MyClass.new != nil
end

The test "instantiate MyClass" does not cover MyClass because there are no executable lines in MyClass. If initializer was inherited from OtherClass, then this test will have other_class.rb in the list of covered files but not my_class.rb.

This leads to a major intelligent test runner bug: if we change initializer of MyClass like that:

class MyClass
  def initialize(arg)
    @arg = arg
  end
end

then the test above will start failing because MyClass.new expects argument now. But because my_class.rb is not covered by this test, intelligent test runner will skip test by default! It causes broken tests to be merged in the default branch.

If this example might seem artificial, unfortunately the same happens with ActiveRecord models or with ActiveModel classes:

# perfectly valid Rails model that is not covered by any test
class Account < ApplicationRecord
  belongs_to :user
end

Solution We cannot overcome this limitation by using line coverage: the code coverage approach works correctly in this case and this is just how line coverage works. We need to go deeper in Ruby VM tracing using techniques that are already used by continuous profiler.

For this limitation, I've chosen to reach out for heap allocation tracepoint: it is possible to spy on every new object allocation that happens in Ruby heap. Even if no code from this class is executed during the test, it is enough for us to know that the test instantiates instances of this class to add its filename to the list of impacted files.

Notes on implementation:

RUBY_INTERNAL_EVENT_NEWOBJ event type is used
rb_tracepoint_new to register RUBY_INTERNAL_EVENT_NEWOBJ tracepoint
Module.const_source_location(klass_name) Ruby API is used to get the source of a constant (every class name is a constant). This API is available from Ruby 2.7 - this is exactly the oldest Ruby version supported for test visibility product. We use this API from C with rb_funcall(rb_cObject, rb_intern("const_source_location"), 1, klass_name)
rb_protect to ignore exceptions when getting source code location of a class (it fails for many internal classes)
Many libraries generate anonymous classes with class name like #<Class:0ff0eabcde> - we explicitly ignore them because getting source location for these classes always fails

Known limitations

This approach only tracks objects instantiated during the test itself (in before hook or during the test). If the test suite has some models cached in global state and shared between tests, and these models don't have any methods implemented on them, the test impact analysis will still miss them. We need to change docs on Intelligent test runner in Ruby reiterating that global state is harmful and can cause flakiness and incompatibility with ITR.
Ruby versions 3.2.0 - 3.2.2 have a bug that causes failures when using this tracepoint: allocation profiling is disabled for these versions
rb_tracepoint_new cannot be attached to a specific thread, so the allocation tracing is enabled for multi threaded coverage mode only (this mode is the default one and the only one that can work for Rails, so it is not a major problem)
Anonymous classes source locations are not supported

How to test the change? Tested using by running test suites of the following open source projects:

feedbin
rubocop
jekyll
middleman
vagrant
devdocs

Unit tests reproducing the original problem are provided. See performance evaluation below.

Performance evaluation Median performance overhead for some OSS projects' test suites before this change:

Results from benchmarks after this change:

rubocop - 111,7% overhead - 28% increase
jekyll - 0,4% overhead - 0% increase
middleman - 37,6% overhead - 32% increase
devdocs - 23,8% overhead - 39% increase

Overall this change increases code coverage overhead on test suites by 30-40% (compared to overhead before the change) in relative numbers. In absolute numbers depending on project's size and characteristics it means from 7% up to 30% more time spent in tests (the maximum overhead is from rubocop which is particularly challenging for profiling: 21k relatively fast tests).

codecov-commenter commented 3 months ago

Codecov Report

Attention: Patch coverage is 98.21429% with 2 lines in your changes missing coverage. Please review.

Project coverage is 98.86%. Comparing base (579867e) to head (aebb1bf).

Files	Patch %	Lines
lib/datadog/ci/configuration/components.rb	77.77%	2 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #197 +/- ## ========================================== - Coverage 98.87% 98.86% -0.01% ========================================== Files 231 235 +4 Lines 10368 10477 +109 Branches 475 481 +6 ========================================== + Hits 10251 10358 +107 - Misses 117 119 +2 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

devinburnette commented 3 months ago

just tested this branch as of the latest commit (0b1618c) and verified the segfault is gone and the test impact analysis is working as expected for models.

anmarchenko commented 3 months ago

Thank you for testing and your feedback @devinburnette! I'll do a couple more passes on Monday and will release soon

anmarchenko commented 3 months ago

@ivoanjo this is ready for another pass - seems to be working now

DataDog / datadog-ci-rb

[SDTEST-523] Expand test impact analysis with allocation tracing #197

Codecov Report