Invalid events being skipped when using with Minitest due to a missing test suite ID

agrobbin commented 7 months ago

Current behaviour

Starting in v0.7.0 (I believe due to #78), we've been seeing warnings like the following in our tests:

W, [2024-03-01T19:03:57.434133 #3433]  WARN -- ddtrace: [ddtrace] Invalid event skipped: Datadog::CI::TestVisibility::Serializers::TestV2(id:3172582625365482490,name:minitest.test) Errors: {"test_suite_id"=>#<Set: {"is required"}>}

Expected behaviour

None of our configuration is complex, with this being our Datadog.configure block:

config.ci.enabled = true
config.ci.instrument :minitest

Additionally, our CI env vars are:

DD_API_KEY: ${{ secrets.DD_API_KEY }}
DD_CIVISIBILITY_AGENTLESS_ENABLED: true

We're running in GitHub Actions.

Steps to reproduce

This started showing up as a warning in v0.7.0 of datadog-ci-rb (when we upgraded ddtrace to v1.20.0).

Environment

datadog-ci-rb version: v0.7.0
Configuration block (Datadog.configure ...):
Test framework version: minitest v5.21.1
CI Provider: GitHub Actions
Ruby version: v3.3.0
Operating system: Ubuntu v22.04.4

anmarchenko commented 7 months ago

Hi!

This warning appeared when we turned on test suite visibility by default, the warning is there to tell you that some of your tests are not part of a test suite for some reason (and now they are being dropped). Creating a GitHub issue is exactly the right thing to do in this situation, so let's find out why this happens.

I have a hunch in this case, because I've seen similar behaviour in Minitest once. One project I instrumented with datadog-ci had helper methods that are included in tests that started from test_ prefix. The method name was test_dir and it create a folder for temporary files. The issue is, Minitest treats every method in your test suite that starts with test_ as a real test and it runs it as part of test session. So it runs all your such helpers once and as they have no assertions marks them as passed. Datadog's test instrumentation on the other hand gets confused by these cases as these methods are not defined in Minitest::Runnable so they cannot be associated to any test suite. You can see the example of me fixing such issue here: https://github.com/anmarchenko/jekyll/commit/e0953186c8fd85f8ee25304198fb72bf5f0b9d77

Could you check your tests for helpers that start with test_? If there are some, then it is most likely the reason for these warnings. The best way to fix it then would be to avoid test_ prefix for helpers so that minitest does not run these methods as tests.

If this is not the case, then we'll continue digging further!

agrobbin commented 7 months ago

Thanks for the quick response here @anmarchenko! I just checked our test suite for did find a handful of test_*-prefixed methods that are not actually tests. I changed those, but unfortunately it doesn't seem to have made an impact on these warnings. We do use MiniTest::Spec::DSL, is it possible that is related?

anmarchenko commented 7 months ago

Hi! Let's continue investigating.

I tested this library with Minitest::Spec and so far did not uncover any issues.

Let's do the following: try to run your tests in CI with env variable DD_TRACE_DEBUG=1 and send us the logs. We will be looking then for a name of the span with type "test" and without test_suite_id associated: it might help to debug this issue.

Another question: when these messages appeared, do you still see all your tests in Datadog test runs view? Or there are some tests that are being dropped?

agrobbin commented 7 months ago

Now I'm really confused... DD_TRACE_DEBUG=1 has caused these warnings to stop showing up. Here is a snippet of the debug output for a type: test span:

2024-03-06T13:21:56.1998418Z Type: test
2024-03-06T13:21:56.1998618Z Service: atlas-rails
2024-03-06T13:21:56.1999302Z Resource: test_: [redacted] 
2024-03-06T13:21:56.1999441Z Error: 0
2024-03-06T13:21:56.1999616Z Start: 1709731315774787840
2024-03-06T13:21:56.1999766Z End: 1709731316054087680
2024-03-06T13:21:56.1999941Z Duration: 0.279298937999954
2024-03-06T13:21:56.2000167Z Tags: [
2024-03-06T13:21:56.2000459Z    span.kind => test,
2024-03-06T13:21:56.2000742Z    os.architecture => x86_64,
2024-03-06T13:21:56.2000928Z    os.platform => linux,
2024-03-06T13:21:56.2001096Z    runtime.name => ruby,
2024-03-06T13:21:56.2001287Z    runtime.version => 3.3.0,
2024-03-06T13:21:56.2001464Z    test.command => bin/rails ,
2024-03-06T13:21:56.2001656Z    test.framework => minitest,
2024-03-06T13:21:56.2001845Z    test.framework_version => 5.22.2,
2024-03-06T13:21:56.2002737Z    test.source.file => vendor/bundle/ruby/3.3.0/gems/shoulda-context-2.0.0/lib/shoulda/context/context.rb,
2024-03-06T13:21:56.2015061Z    test.source.start => 62,
2024-03-06T13:21:56.2015317Z    _test.session_id => 269349688540100905,

2024-03-06T13:21:56.5055682Z Type: test
2024-03-06T13:21:56.5055938Z Service: atlas-rails
2024-03-06T13:21:56.5056288Z Resource: test_0010_[redacted]
2024-03-06T13:21:56.5056441Z Error: 0
2024-03-06T13:21:56.5056695Z Start: 1709731316082993408
2024-03-06T13:21:56.5056858Z End: 1709731316365561088
2024-03-06T13:21:56.5057016Z Duration: 0.28256706100000883
2024-03-06T13:21:56.5057169Z Tags: [
2024-03-06T13:21:56.5057328Z    span.kind => test,
2024-03-06T13:21:56.5057516Z    os.architecture => x86_64,
2024-03-06T13:21:56.5057681Z    os.platform => linux,
2024-03-06T13:21:56.5057866Z    runtime.name => ruby,
2024-03-06T13:21:56.5058042Z    runtime.version => 3.3.0,
2024-03-06T13:21:56.5058228Z    test.command => bin/rails ,
2024-03-06T13:21:56.5058463Z    test.framework => minitest,
2024-03-06T13:21:56.5058668Z    test.framework_version => 5.22.2,
2024-03-06T13:21:56.5059008Z    test.source.file => test/models/[redacted]_test.rb,
2024-03-06T13:21:56.5059199Z    test.source.start => 63,
2024-03-06T13:21:56.5059406Z    _test.session_id => 269349688540100905,

I looked in DD CI, and you are right, we are seeing a smaller # of tests than Minitest itself is reporting. Minitest reports:

2534 runs, 8372 assertions, 0 failures, 0 errors, 14 skips

However, DD CI is showing 2,281 passed and 14 skipped. The # of warnings we got in our previous run was 139, so not exactly the # of missing tests, but perhaps that warning count is not the number of individual tests but the number of test groups / suites?

anmarchenko commented 7 months ago

I would need the whole log to try to reproduce what happens there, could you send it to me per email? andrey.marchenko at datadoghq.com

Is it consistent that there are no warnings every time you run it with DD_TRACE_DEBUG?

anmarchenko commented 7 months ago

About the inconsistency of passed tests: there might be name clashes if you generate your tests dynamically:

You can check that number of tests here:

Is the same with number here:

Both are on "Test commit" page https://app.datadoghq.eu/ci/test-commit/

If you have 2281 passed tests in the first widget, but 2534 test runs that would mean that many tests have exactly same name/test suite combinations

anmarchenko commented 7 months ago

I have another hunch: I think shoulda-context is the reason, we might have a bug with shoulda-context in minitest. Let me try to create a repro case tomorrow morning to see if this is the case.

anmarchenko commented 7 months ago

@agrobbin I am happy to tell you that I found where the bug is and I have a reproducer:

https://github.com/DataDog/datadog-ci-rb/pull/132

Now I only need to fix it. Hopefully, the fix will be ready for the next minor ddtrace release next week.

anmarchenko commented 7 months ago

@agrobbin the bug was in shoulda-context library (using eval without source location) and it was fixed in this PR:

https://github.com/thoughtbot/shoulda-context/pull/92

I can confirm that this fixes datadog-ci issue by using shoulda-context from main branch: https://github.com/DataDog/datadog-ci-rb/pull/132/commits/23199c388660f900f7abad8dc14a376e582492af

could you run your test suite with shoulda-context version below and tell me if warnings go away and you see all your tests in Datadog?

  gem "shoulda-context", github: "thoughtbot/shoulda-context", branch: "main"

agrobbin commented 7 months ago

Interesting, switching to thoughtbot/shoulda-context@main definitely reduced the # of these quite a bit. We've still got a handful of them, from 139 down to 26, which is definitely progress!

anmarchenko commented 7 months ago

These 26 that are left are coming from some other library that fiddles with source locations.

Please send me the full debug logs from the current run and I will continue investigation. If you would not rather use email, you could open support ticket and then send these logs to Datadog support (they will find me, eventually).

agrobbin commented 7 months ago

I'll email you in the next few minutes! Sorry for the delay on that, I had to put this down yesterday while looking into some other things.

DataDog / datadog-ci-rb

Invalid events being skipped when using with Minitest due to a missing test suite ID #130