coverlet-coverage / coverlet

Cross platform code coverage for .NET
MIT License
2.97k stars 386 forks source link

Coverage causes hangs #511

Closed clairernovotny closed 2 weeks ago

clairernovotny commented 5 years ago

Hi,

When adding coverage for .NET Framework, it's causing hangs. It works fine on .NET Core: https://dev.azure.com/onovotny/GitBuilds/_build/results?buildId=2203&view=logs&j=12f1170f-54f2-53f3-20dd-22fc7dff55f9

This commit has the changes: https://github.com/onovotny/bc-csharp/commit/5ddb17530b4175ce31092ac671d2ed496c2ecd06

Here's the command that fails, when on .NET Framework: https://github.com/onovotny/bc-csharp/commit/5ddb17530b4175ce31092ac671d2ed496c2ecd06

Seems like it's happening with .NET Core too: https://dev.azure.com/onovotny/GitBuilds/_build/results?buildId=2204&view=logs&j=12f1170f-54f2-53f3-20dd-22fc7dff55f9&t=bfbec40a-1b5e-5690-b870-859627cad0c0&l=20

Should be able to repro with dotnet test -f netcoreapp2.1 ...

Repro steps:

Clone: https://github.com/onovotny/bc-csharp Checkout commit: 4a401ed22df13ef61b2bb7a306ee8a083167f1c6 Go to crypto\test Run dotnet test -f netcoreapp2.1 and see that it passes. About 1.5 min on my machine Run dotnet test -f netcoreapp2.1 -s ..\..\CodeCoverage.runsettings

Tests are taking much longer to run (never complete) and fail in some cases.

I think it's struggling on this test: https://github.com/onovotny/bc-csharp/blob/netstandard/crypto/test/src/math/ec/test/ECPointPerformanceTest.cs#L176

Perhaps with some of the random number generators in use there?

MarcoRossignoli commented 5 years ago

Hi Oren, I did some check and it's related to instrumentation, that test seem a performance test comparison https://github.com/onovotny/bc-csharp/blob/netstandard/crypto/test/src/math/ec/test/ECPointPerformanceTest.cs#L21 that lead to a lot of call to instrumentation code

image

I did a debug with msbuild but nothing different core is the same

dotnet test /p:CollectCoverage=true /p:Exclude=\"[xunit.*]*,[*Tests]*,[nunit.*]*,[NUnit3.*]*\" /p:ExcludeByAttribute=\"Obsolete,GeneratedCodeAttribute,CompilerGeneratedAttribute\" /p:SingleHit=true /p:CoverletToolsPath=C:\git\coverlet\src\coverlet.msbuild.tasks\bin\Debug\netstandard2.0\

With SingleHit=true should be better...but maybe you should exclude that test from "coverage check". In my test I get result after some minutes

C:\git\coverletissue\bc-csharp\crypto\test (HEAD detached at 4a401ed -> origin)                                                                                                                                                               
λ dotnet test /p:CollectCoverage=true /p:Exclude=\"[xunit.*]*,[*Tests]*,[nunit.*]*,[NUnit3.*]*\" /p:ExcludeByAttribute=\"Obsolete,GeneratedCodeAttribute,CompilerGeneratedAttribute\" /p:SingleHit=true /p:CoverletToolsPath=C:\git\coverlet\s
rc\coverlet.msbuild.tasks\bin\Debug\netstandard2.0\                                                                                                                                                                                           
...                                                                                                                                                             

Avvio dell'esecuzione dei test in corso. Attendere...                                                                                                                                                                                         
  ! TestFunction                                                                                                                                                                                                                              
  ! TestSumOfMultipliesComplete [1ms]                                                                                                                                                                                                         
  ! TestSumOfTwoMultipliesComplete                                                                                                                                                                                                            
  ! TestMultiply                                                                                                                                                                                                                              

L'esecuzione dei test è riuscita.                                                                                                                                                                                                             
Total tests: 995                                                                                                                                                                                                                              
     Passed: 991                                                                                                                                                                                                                              
Tempo di esecuzione dei test: 24,4505 Minuti                                                                                                                                                                                                  

Calculating coverage result...                                                                                                                                                                                                                
  Generating report 'C:\git\coverletissue\bc-csharp\crypto\test\coverage.json'                                                                                                                                                                

+---------------------+--------+--------+--------+                                                                                                                                                                                            
| Module              | Line   | Branch | Method |                                                                                                                                                                                            
+---------------------+--------+--------+--------+                                                                                                                                                                                            
| BouncyCastle.Crypto | 75,16% | 62,9%  | 72,22% |                                                                                                                                                                                            
+---------------------+--------+--------+--------+                                                                                                                                                                                            

+---------+--------+--------+--------+                                                                                                                                                                                                        
|         | Line   | Branch | Method |                                                                                                                                                                                                        
+---------+--------+--------+--------+                                                                                                                                                                                                        
| Total   | 75,16% | 62,9%  | 72,22% |                                                                                                                                                                                                        
+---------+--------+--------+--------+                                                                                                                                                                                                        
| Average | 75,16% | 62,9%  | 72,22% |                                                                                                                                                                                                        
+---------+--------+--------+--------+                                                                                                                                                                                                        

@jkotas do you think that aggressive inlining could help here(I think that tiering cannot help to remove the call)

clairernovotny commented 5 years ago

I think there may be other things going too as it's sometimes affectint the test itself:

  X TestCrypto [1s 462ms]                                                                                                 Error Message:                                                                                                           AES Monte Carlo Test 13: failed - expected 77ba00ed5412dff27c8ed91f3c376172 got 094d55f0268b6f471afeaeef0d591424       Stack Trace:
     at Org.BouncyCastle.Crypto.Tests.AllTests.TestCrypto() in D:\dev\BouncyCastle-PCL\crypto\test\src\crypto\test\AllTests.cs:line 43

  X TestFunction [1s 457ms]                                                                                               Error Message:                                                                                                             Expected string length 9 but was 40. Strings differ at index 3.
  Expected: "AES: Okay"
  But was:  "AES Monte Carlo Test 13: failed reversal"
  --------------^

  Stack Trace:
     at Org.BouncyCastle.Crypto.Tests.AesTest.TestFunction() in D:\dev\BouncyCastle-PCL\crypto\test\src\crypto\test\AESTest.cs:line 175

Not every run, but more than once. If you run the tests several times, it'll eventually fail with that.

Or this

  X TestECKeyAgree [150ms]                                                                                                Error Message:                                                                                                           System.NullReferenceException : Object reference not set to an instance of an object.
  Stack Trace:
     at Org.BouncyCastle.Crypto.Generators.ECKeyPairGenerator.GenerateKeyPair() in D:\dev\BouncyCastle-PCL\crypto\src\crypto\generators\ECKeyPairGenerator.cs:line 100
   at Org.BouncyCastle.Cms.Tests.CmsTestUtil.MakeECDsaKeyPair() in D:\dev\BouncyCastle-PCL\crypto\test\src\cms\test\CMSTestUtil.cs:line 253
   at Org.BouncyCastle.Cms.Tests.EnvelopedDataStreamTest.get_OrigECKP() in D:\dev\BouncyCastle-PCL\crypto\test\src\cms\test\EnvelopedDataStreamTest.cs:line 63
   at Org.BouncyCastle.Cms.Tests.EnvelopedDataStreamTest.TestECKeyAgree() in D:\dev\BouncyCastle-PCL\crypto\test\src\cms\test\EnvelopedDataStreamTest.cs:line 492
clairernovotny commented 5 years ago

Those errors shouldn't be happening based on those code paths, it seems like Coverlet may be interfering with the normal code flow.

MarcoRossignoli commented 5 years ago

If you run the tests several times, it'll eventually fail with that.

You mean more than one time in same CI loop(to undestand if could be related to instrumenting more than once problem)? I don't know what tests does...could be related to IL "instrumentations"?Coverlet injects a static class per module(ModuleTrackerTemplate) and a static call RecordSingleHit/RecordHit to it for every sequence/branch point.

clairernovotny commented 5 years ago

I mean running it from the command line different times. It's non deterministic, and only happens with Coverlet. Has never happened without.

MarcoRossignoli commented 5 years ago

Can you try to "cleanup"(git -fdx) between tests and check if it happens?

clairernovotny commented 5 years ago

Will try. In the interim, even with the ExcludeFromCodeCoverage attribute and SingleHit=true, TestSumOfTwoMultipliesComplete appears to take a very long time and isn't completing.

clairernovotny commented 5 years ago

Hmm....running twice without a clean does appear to trigger the errors. Seems like it's over-instrumenting or breaking something that way?

MarcoRossignoli commented 5 years ago

IIRW we had some issue where libs were instrumented more than once leading to weird results.

jkotas commented 5 years ago

@jkotas do you think that aggressive inlining could help here(I think that tiering cannot help to remove the call)

This is small-enough method to be inlined by JIT without any hints. I do not expect that aggressive inlining would help anything here.

MarcoRossignoli commented 5 years ago

Hmm....running twice without a clean does appear to trigger the errors. Seems like it's over-instrumenting or breaking something that way?

At the moment we don't check if module is already instrumented(maybe we could try to check if module tracker is present)

clairernovotny commented 5 years ago

Spoke too soon, even after a clean, I got an error:

 PS> git clean -xdf                                                                                                                                                                                                                                netstandard ~2    D:\dev\BouncyCastle-PCL                                                                           
 PS> dotnet test .\crypto\test\crypto.test.csproj -s .\CodeCoverage.runsettings                                         src\asn1\test\LinkedCertificateTest.cs(51,38): warning CS0168: The variable 'e' is declared but never used [D:\dev\BouncyCastle-PCL\crypto\test\crypto.test.csproj]
src\test\NamedCurveTest.cs(254,17): warning CS0162: Unreachable code detected [D:\dev\BouncyCastle-PCL\crypto\test\crypto.test.csproj]
src\crypto\test\SCryptTest.cs(68,38): warning CS0168: The variable 'e' is declared but never used [D:\dev\BouncyCastle-PCL\crypto\test\crypto.test.csproj]
src\test\NamedCurveTest.cs(254,17): warning CS0162: Unreachable code detected [D:\dev\BouncyCastle-PCL\crypto\test\crypto.test.csproj]
src\asn1\test\LinkedCertificateTest.cs(51,38): warning CS0168: The variable 'e' is declared but never used [D:\dev\BouncyCastle-PCL\crypto\test\crypto.test.csproj]
src\crypto\test\SCryptTest.cs(68,38): warning CS0168: The variable 'e' is declared but never used [D:\dev\BouncyCastle-PCL\crypto\test\crypto.test.csproj]
Test run for D:\dev\BouncyCastle-PCL\crypto\test\bin\Debug\netcoreapp2.1\crypto.test.dll(.NETCoreApp,Version=v2.1)
Microsoft (R) Test Execution Command Line Tool Version 16.1.1
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
                                                                                                                          ! TestFunction                                                                                                                                                                                                                                  X TestFunction [1s 367ms]                                                                                               Error Message:                                                                                                             Expected string length 9 but was 40. Strings differ at index 3.
  Expected: "AES: Okay"
  But was:  "AES Monte Carlo Test 11: failed reversal"
  --------------^

  Stack Trace:
     at Org.BouncyCastle.Crypto.Tests.AesTest.TestFunction() in D:\dev\BouncyCastle-PCL\crypto\test\src\crypto\test\AESTest.cs:line 175

                                                                                                                          X TestCrypto [1s 377ms]                                                                                                 Error Message:                                                                                                           AES Monte Carlo Test 11: failed - expected acc863637868e3e068d2fd6e3508454a got a2164f3e337eb7cfdc3876ab22d9760e       Stack Trace:
     at Org.BouncyCastle.Crypto.Tests.AllTests.TestCrypto() in D:\dev\BouncyCastle-PCL\crypto\test\src\crypto\test\AllTests.cs:line 43

                                                                                                                          ! TestMultiply                                                                                                                                                                                                                                  ! TestSumOfMultipliesComplete [1ms]                                                                                                                                                                                                             ! TestSumOfTwoMultipliesComplete                                                                                      Test run in progress.                                                                                                                        .Attempting to cancel the build...
Attempting to cancel the build...

   1    netstandard ~2    D:\dev\BouncyCastle-PCL                                   
MarcoRossignoli commented 5 years ago

Tell me where you put ExcludeFromCodeCoverage I would try to repro on my local(or if you've branch/commit to clone with above setup).

clairernovotny commented 5 years ago

I tried it on ECAlgorithmsTest.cs on the TestSumOfTwoMultipliesComplete method. It's in crypto\test\src\math\ec\test

clairernovotny commented 5 years ago

Looks like it takes 8.2 minutes on my machine to complete (with SingleHit = true and no ExcludeFromCodeCoverage attributes set). Still a significant issue given that it takes 1.5 minutes without any coverage.

MarcoRossignoli commented 5 years ago

I'll try to understand what's going on, thank's for quick infos! I'm not sure if we can improve perf...it depends on how many "new call" are injected. cc: @tonerdo

clairernovotny commented 5 years ago

Perf isn't the only issue. I'm concerned about the random failures it injects.

image

Locally, after 8.3 min, it passed one platform but something failed on another. Started with a clean repo.

Same thing on Azure Pipelines happened to pass completely this time. https://dev.azure.com/onovotny/GitBuilds/_build/results?buildId=2207&view=results

MarcoRossignoli commented 5 years ago

Perf isn't the only issue. I'm concerned about the random failures it injects.

I'll try to repro asap

clairernovotny commented 5 years ago

In CmsTestUtil, the property getters call Init() on the objects they create, so that by the time GenerateKeyPair() is called, the local parameters variable should always be set. Something appears to be interfering in there.

MarcoRossignoli commented 5 years ago

Oren can you try to add -m:1 only to exclude another know issue (No repro using msbuild task more times for the moment on my local, now try collector)

clairernovotny commented 5 years ago

It repro's with -m:1:

It's not every time, it took 3-4 runs to make it happen, each starting with a clean repo

image

MarcoRossignoli commented 5 years ago

I'll try to reproduce and also compare instrumented vs non instrumented IL

It's first time I see an issue like that, maybe instrumentation change execution order that generates some bug (race etc...)

MarcoRossignoli commented 5 years ago

@onovotny I did a lot of test on 2 different machine yesterday evening 4/5 time x machine, and no luck...I cannot repro, my command was

C:\git\coverletissue\bc-csharp> dotnet test .\crypto\test\crypto.test.csproj -s .\CodeCoverage.runsettings 

runsettings

<?xml version="1.0" encoding="utf-8"?>
<!-- File name extension must be .runsettings -->
<RunSettings>
  <DataCollectionRunSettings>
    <DataCollectors>      
       <DataCollector friendlyName="XPlat code coverage">
        <Configuration>
          <Format>cobertura</Format>
          <Exclude>[xunit.*]*,[*Tests]*,[nunit.*]*,[NUnit3.*]*</Exclude> <!-- [Assembly-Filter]Type-Filter -->
          <ExcludeByAttribute>Obsolete,GeneratedCodeAttribute,CompilerGeneratedAttribute</ExcludeByAttribute>
          <SingleHit>true</SingleHit>
        </Configuration>
      </DataCollector>    
    </DataCollectors>
  </DataCollectionRunSettings>
</RunSettings>

Can you check that my config it's the same as yours?

This is the instrumentation of Init() on RsaKeyPairGenerator image

I've one suspect for https://github.com/tonerdo/coverlet/issues/511#issuecomment-519087222 ...that for some reason tests are running in parallel and two test ask for static access to https://github.com/onovotny/bc-csharp/blob/netstandard/crypto/test/src/cms/test/CMSTestUtil.cs#L84-L91 and the first init kpg so null check above return non initilized instance for another thread that could lead to null ref. Maybe could happen because instrumented code are slower than non instrumented or is no more inlined so the randomic nature of failure. Can you try to force tests serialization?

Bertk commented 2 weeks ago

This issue is outdated (before coverlet version 6.0.2) and not related to #1192 regression.