dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

LibraryImportGenerator unit test segfault illegal memory access #89054

Closed carlossanlop closed 2 days ago

carlossanlop commented 10 months ago

Error Blob

{
  "ErrorMessage": "",
  "BuildRetry": false,
  "ErrorPattern": "Segmentation fault .+ LibraryImportGenerator",
  "ExcludeConsoleLog": false
}

Reproduction Steps

Report

Summary

|24-Hour Hit Count|7-Day Hit Count|1-Month Count|
|---|---|---|
|0|0|0|

ghost commented 9 months ago

Tagging subscribers to this area: @dotnet/interop-contrib
See info in area-owners.md if you want to be subscribed.

Issue Details
### Error Blob

```json
{
  "ErrorMessage": "exit code 139 means SIGSEGV Illegal memory access. Deref invalid pointer, overrunning buffer, stack overflow etc. Core dumped.",
  "BuildRetry": false,
  "ErrorPattern": "",
  "ExcludeConsoleLog": true
}
```

### Reproduction Steps

- `main` PR: https://github.com/dotnet/runtime/pull/88280
- Job: `Libraries Test Run release coreclr osx x64 Debug`
- Log file: https://dev.azure.com/dnceng-public/public/_build/results?buildId=342413&view=logs&j=5da4f169-faec-5863-fdf3-ac008545b5e6&t=04587263-8799-5ed0-c746-7c81bb983f01&l=55
- Output:

```
Console log: 'LibraryImportGenerator.Unit.Tests' from job 70c62cc8-b91f-419a-a157-04f606e76ff0 workitem 374be96f-79af-40b2-a61a-ef0260c53efa (osx.1200.amd64.open) executed on machine dci-mac-build-302.local running macOS-12.4
+ ./RunTests.sh --runtime-path /tmp/helix/working/AB2F0948/p
----- start Mon Jul 17 13:54:31 PDT 2023 =============== To repro directly: =====================================================
pushd .
/tmp/helix/working/AB2F0948/p/dotnet exec --runtimeconfig LibraryImportGenerator.Unit.Tests.runtimeconfig.json --depsfile LibraryImportGenerator.Unit.Tests.deps.json xunit.console.dll LibraryImportGenerator.Unit.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing
popd
===========================================================================================================
/private/tmp/helix/working/AB2F0948/w/B0D909D1/e
/private/tmp/helix/working/AB2F0948/w/B0D909D1/e
Discovering: LibraryImportGenerator.Unit.Tests (method display = ClassAndMethod, method display options = None)
Discovered: LibraryImportGenerator.Unit.Tests (found 183 of 188 test cases)
Starting: LibraryImportGenerator.Unit.Tests (parallel test collections = on, max threads = 6)
LibraryImportGenerator.UnitTests.Compiles.ValidateSnippetsWithMarshalType [SKIP]
No current scenarios to test.
./RunTests.sh: line 168: 19058 Segmentation fault: 11 "$RUNTIME_PATH/dotnet" exec --runtimeconfig LibraryImportGenerator.Unit.Tests.runtimeconfig.json --depsfile LibraryImportGenerator.Unit.Tests.deps.json xunit.console.dll LibraryImportGenerator.Unit.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/private/tmp/helix/working/AB2F0948/w/B0D909D1/e
----- end Mon Jul 17 13:55:17 PDT 2023 ----- exit code 139 ----------------------------------------------------------
exit code 139 means SIGSEGV Illegal memory access. Deref invalid pointer, overrunning buffer, stack overflow etc. Core dumped.
ulimit -c value: 0
+ export _commandExitCode=139
+ _commandExitCode=139
+ /usr/local/bin/python3 /tmp/helix/working/AB2F0948/p/reporter/run.py https://dev.azure.com/dnceng-public/ public 7153020 eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Im9PdmN6NU1fN3AtSGpJS2xGWHo5M3VfVjBabyJ9.eyJuYW1laWQiOiJjNzczZjJjMi01MTIwLTQyMDctYWZlMi1hZmFmMzVhOGJjMGEiLCJzY3AiOiJhcHBfdG9rZW4iLCJhdWkiOiJhNzQxMmE5My1hYzA2LTQyZTQtYWVkMi00OWRlNDQwZGQ2ZjAiLCJzaWQiOiJkYTI3MTVjMS01M2MxLTRkNjgtYjg1Ny00MThiNTE5MDMyOTMiLCJCdWlsZElkIjoiY2JiMTgyNjEtYzQ4Zi00YWJiLTg2NTEtOGNkY2I1NDc0NjQ5OzM0MjQxMyIsImpvYnJlZiI6ImRiMDUzZjk4LTYzZTctNDA5Ny1hZjk0LWJjZWU4NzRmNWQ1YTo1ZGE0ZjE2OS1mYWVjLTU4NjMtZmRmMy1hYzAwODU0NWI1ZTYiLCJwcGlkIjoidnN0ZnM6Ly8vQnVpbGQvQnVpbGQvMzQyNDEzIiwib3JjaGlkIjoiZGIwNTNmOTgtNjNlNy00MDk3LWFmOTQtYmNlZTg3NGY1ZDVhLmJ1aWxkLmxpYnJhcmllc190ZXN0X3J1bl9yZWxlYXNlX2NvcmVjbHJfb3N4X3g2NF9kZWJ1Zy5fX2RlZmF1bHQiLCJyZXBvSWRzIjoiIiwiaXNzIjoiYXBwLnZzdG9rZW4udmlzdWFsc3R1ZGlvLmNvbSIsImF1ZCI6ImFwcC52c3Rva2VuLnZpc3VhbHN0dWRpby5jb218dnNvOjZmY2M5MmU1LTczYTctNGY4OC04ZDEzLWQ5MDQ1YjQ1ZmIyNyIsIm5iZiI6MTY4OTYyNjA0MiwiZXhwIjoxNjg5NjM2MjQyfQ.xnAUFQj946LoXOEf-ZAATed5QLFMKT804-D57F50ZBYjIdsMH5Z_r4l8QTu_WA2zI-RjNMqaEVdhMVGcgXYeJ_4gp9jr0xZNGaW9KkvP-OoKP3y46d8mefmeTcB9QIvgCXpAOWLfJ2VBsGTBLT_QJxTqAKCisUODAzrI226VK2y6aYC6tr5JJVdfB_orXynX66vOWtzbxYlf9nitOwYQUu6cSlWMRtravEvMC5U3SowdiWotLOk2ETjBhUb0s7gvtREYrOg0pmX5MG-Rs8WFaLpjzGkrDPPTqOizKsH3prvyjwIvT8sNNTSXwfNhBAVH535dpwB7v-0VOpzwtupI5w
2023-07-17T20:55:17.714Z INFO run.py run(48) main Beginning reading of test results.
2023-07-17T20:55:17.714Z INFO run.py __init__(42) read_results Searching '/private/tmp/helix/working/AB2F0948/w/B0D909D1/e' for test results files
2023-07-17T20:55:17.716Z INFO run.py __init__(42) read_results Searching '/tmp/helix/working/AB2F0948/w/B0D909D1/uploads' for test results files
2023-07-17T20:55:17.716Z WARNING run.py __init__(55) read_results No results file found in any of the following formats: xunit, junit, trx
2023-07-17T20:55:17.717Z INFO run.py packing_test_reporter(30) report_results Packing 0 test reports to '/tmp/helix/working/AB2F0948/w/B0D909D1/e/__test_report.json'
2023-07-17T20:55:17.717Z INFO run.py packing_test_reporter(33) report_results Packed 1551 bytes
+ /usr/local/bin/python3 /tmp/helix/working/AB2F0948/p/gen-debug-dump-docs.py -buildid 342413 -workitem LibraryImportGenerator.Unit.Tests -jobid 70c62cc8-b91f-419a-a157-04f606e76ff0 -outdir /tmp/helix/working/AB2F0948/w/B0D909D1/uploads -templatedir /tmp/helix/working/AB2F0948/p -dumpdir /cores -productver 8.0.0
Did not find dumps, skipping dump docs generation.
+ exit 139
['LibraryImportGenerator.Unit.Tests' END OF WORK ITEM LOG: Command exited with 139]
```

### Known issue validation

**Build: :mag_right:** https://dev.azure.com/dnceng-public/public/_build/results?buildId=342413
**Error message validated:** `exit code 139 means SIGSEGV Illegal memory access. Deref invalid pointer, overrunning buffer, stack overflow etc. Core dumped.`
**Result validation: :x:** Known issue did not match with the provided build.
**Validation performed at:** 7/17/2023 9:47:59 PM UTC

### Report

#### Summary

|24-Hour Hit Count|7-Day Hit Count|1-Month Count|
|---|---|---|
|0|0|0|

Author: carlossanlop
Assignees: -
Labels: `area-System.Runtime.InteropServices`, `blocking-clean-ci`, `runtime-coreclr`, `source-generator`, `test-failure`, `Known Build Error`
Milestone: -
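
For readers unfamiliar with the `exit code 139` in the log above: shells report a process killed by signal N as status 128 + N, so 139 corresponds to signal 11. A minimal sketch of that decoding follows; the helper name `describe_exit_code` is made up for illustration and is not part of the Helix tooling.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a shell-style exit status (128 + N means 'killed by signal N')."""
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            return f"killed by {signal.Signals(signum).name} (signal {signum})"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited normally with status {code}"


print(describe_exit_code(139))  # killed by SIGSEGV (signal 11)
```

Note that the same log also reports `ulimit -c value: 0` and `Did not find dumps`, so despite the "Core dumped." wording in the message, no dump was available from this run.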
AaronRobinsonMSFT commented 9 months ago

@jkoritzinsky and @jtschuster This seems a bit odd. The generator is deterministic, so why aren't we seeing this more regularly, or in recent CI runs? Perhaps this is just the usual non-determinism of an access violation (A/V), but I'm surprised we haven't seen this before.

jtschuster commented 9 months ago

The unit tests are all managed code I think. Was this something with us, or something with the runtime?

AaronRobinsonMSFT commented 9 months ago

> Was this something with us, or something with the runtime?

That is the question. My initial guess here would be we are generating something bad.

jtschuster commented 9 months ago

Generating something bad shouldn't cause a segfault unless we run the code, right? And I don't think we run generated code in the unit tests.

AaronRobinsonMSFT commented 9 months ago

> And I don't think we run generated code in the unit tests.

Ah. I thought we run some of that code. Okay.

jkoritzinsky commented 9 months ago

Yeah, we don't run any generated code in the unit tests. We only generate the code and then use the Roslyn APIs to inspect it. We only run the code in the "integration" tests (i.e. LibraryImportGenerator.Tests).
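
The "generate, then inspect, never execute" shape described here can be sketched with an analogy. The example below is not Roslyn and not the actual unit tests; it uses Python's standard `ast` module as a swapped-in stand-in, and `generate_wrapper` and `native_call` are hypothetical names. The point is only that the generated source is parsed and examined as data, so a bug in the generated code itself cannot crash the test process.

```python
import ast


def generate_wrapper(name):
    """Hypothetical generator: returns source text for a wrapper function."""
    return (
        f"def {name}(ptr):\n"
        f"    return native_call(ptr)\n"  # native_call is never resolved or run
    )


def declared_functions(source):
    """Parse the generated source and list the functions it declares."""
    tree = ast.parse(source)  # parsing only; nothing in `source` is executed
    return [node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)]


source = generate_wrapper("get_handle")
print(declared_functions(source))  # ['get_handle']
```

Under this shape, a crash during the test points at the process doing the generating and inspecting (here the interpreter, in the thread the runtime) more than at the generated output.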

carlossanlop commented 9 months ago

I am also seeing this failure in ComInterfaceGenerator.Unit.Tests, and in release/8.0. Could it be the same root cause?

jkoritzinsky commented 9 months ago

Yes, it is possible that this is the same failure.

agocke commented 6 months ago

I've successfully got a Native AOT crash dump, with working symbols, on Linux for this. Download https://microsoft-my.sharepoint.com/:u:/p/angocke/Eaj2iJxJzItEgs8mngNfxZ0B081M0ipe2Q9ucGKiK80SFQ?e=hfCpNT for a zip with all the necessary bits.

agocke commented 6 months ago

@AaronRobinsonMSFT @jtschuster Could you take a look while Jeremy's out?

AaronRobinsonMSFT commented 6 months ago

@agocke The above link is for a failure in System.Numerics.Vectors.Tests. Is this really related to the LibraryImport source generator?

agocke commented 6 months ago

Oh, do those tests not use the generator? OK, this must just be catching extra stuff.

agocke commented 6 months ago

Adjusted the error message; hopefully this will catch only LibraryImportGenerator segfaults now.
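
As a rough illustration of how the narrowed `ErrorPattern` from the error blob (`Segmentation fault .+ LibraryImportGenerator`) behaves, the sketch below runs it through Python's `re` module. This is only a stand-in: the engine the known-issue tooling actually uses is not stated in this thread. The two Linux-style lines are hypothetical, modeled on bash's usual `Segmentation fault (core dumped)` message; only the macOS line is taken (shortened) from the console log quoted earlier in this issue.

```python
import re

# Pattern copied from the issue's current error blob.
PATTERN = re.compile(r"Segmentation fault .+ LibraryImportGenerator")

# Shortened copy of the macOS line from the console log in this issue.
macos_line = (
    './RunTests.sh: line 168: 19058 Segmentation fault: 11 "$RUNTIME_PATH/dotnet" exec '
    "--runtimeconfig LibraryImportGenerator.Unit.Tests.runtimeconfig.json"
)

# Hypothetical Linux-style lines (assumed format, not taken from a real log).
linux_generator_line = (
    './RunTests.sh: line 168: 19058 Segmentation fault (core dumped) "$RUNTIME_PATH/dotnet" exec '
    "--runtimeconfig LibraryImportGenerator.Unit.Tests.runtimeconfig.json"
)
linux_vectors_line = (
    './RunTests.sh: line 168: 19058 Segmentation fault (core dumped) "$RUNTIME_PATH/dotnet" exec '
    "--runtimeconfig System.Numerics.Vectors.Tests.runtimeconfig.json"
)


def matches(line):
    """Return True if the known-issue pattern is found anywhere in the line."""
    return PATTERN.search(line) is not None


for label, line in [
    ("linux generator", linux_generator_line),
    ("linux vectors", linux_vectors_line),
    ("macos generator", macos_line),
]:
    print(f"{label}: {matches(line)}")
```

Under these assumptions the pattern matches the hypothetical Linux-style LibraryImportGenerator line and rejects the System.Numerics.Vectors.Tests one. It also does not match the macOS line from this issue, because `Segmentation fault: 11` has a colon where the pattern expects a space, which may be worth keeping in mind if macOS hits are expected.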

jeffschwMSFT commented 2 days ago

Removing `blocking-clean-ci`, as this has not failed in 30 days.

|24-Hour Hit Count|7-Day Hit Count|1-Month Count|
|---|---|---|
|0|0|0|

agocke commented 2 days ago

I think it's worth closing this. Whatever was causing the LibraryImportGenerator failures looks more likely to be a runtime instability problem that has been resolved.