tmds closed this issue 3 years ago.
@tmds I believe the issue doesn't occur with 4 kB memory pages. It only occurs when the distro uses larger pages, in which case the block with the cookie "leaks" into code.
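A short sketch can show why the page size matters. It assumes, as the comment above suggests, that the runtime changes the protection of the memory holding the GS cookie. It also assumes that such changes (for example via mprotect) apply to whole pages. The addresses and the names CODE_ADDR, COOKIE_ADDR, page_start and share_page are hypothetical and chosen only for illustration.

```python
# Illustrative sketch: memory protection is applied per page, so whether two
# addresses can be protected independently depends on the page size.
# CODE_ADDR / COOKIE_ADDR are hypothetical addresses, not real runtime values.

def page_start(addr: int, page_size: int) -> int:
    """Round addr down to the start of the page that contains it (page_size is a power of two)."""
    return addr & ~(page_size - 1)


def share_page(a: int, b: int, page_size: int) -> bool:
    """True if a and b lie in the same page, so protecting one also affects the other."""
    return page_start(a, page_size) == page_start(b, page_size)


CODE_ADDR = 0x0040_3FF0    # hypothetical: last bytes of some code
COOKIE_ADDR = 0x0040_6000  # hypothetical: slot holding the GS cookie

for size in (4 * 1024, 64 * 1024):
    print(f"page size {size // 1024:>2} kB: same page = {share_page(CODE_ADDR, COOKIE_ADDR, size)}")
```

With these addresses, the code and the cookie fall in different pages when pages are 4 kB. With 64 kB pages they share a page, so any protection change made to the cookie's page also applies to that code.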
@tmds @janvorli is any fix required here for .NET 6?
> is any fix required here for .NET 6?
Yes.
The cookie issue still causes our builds to fail from the start. Once that is fixed, and the fix has rippled into the SDK that gets used from the .dotnet folder, I suspect we'll see the NullReferenceExceptions again.
Our plan is to build .NET 6 for arm64, but this issue needs to be resolved first.
I've looked at the problem, but I couldn't figure out the root cause. I think it is in the kernel. I can ask kernel engineers to have a look, but they'll want a better reproducer.
@janvorli Now that Preview 6 is wrapping up, any idea when you'll be able to take another look at this?
I have created a PR in arcade to fix the rootfs build for Alpine 3.9. After discussing it with @mthalman, I am going to merge my original change to the docker images, keep building for Alpine 3.9 for now, and move to 3.13 after Preview 7. Then I can merge my change to use the lld linker and start looking into the null reference issues. We still have null reference issues on Apple Silicon, so chances are they are related.
@omajid, @tmds I have tried to run all coreclr pri 1 tests on RHEL 8 with 64kB page size using the latest main and no tests were failing with NullReferenceException anymore.
I had to run the tests manually (enumerating all of the related .sh files and running them with an added -coreroot argument). The Preview 6 SDK / runtime that's normally used to execute xunit doesn't have the fix for the GS cookie mapping issue, which I recently fixed by switching to the lld linker.
Out of all the coreclr pri 1 tests, 10052 succeeded, 29 failed and 3 timed out. 15 of the failures are "Unhandled exception. System.InvalidProgramException: Vararg calling convention not supported." A few were caused by the testing methodology (some tests can only run properly via xunit). The remaining failures are of unknown kind, but there were no crashes, just error codes meaning the test didn't pass as expected.
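The manual run described above, and the succeeded / failed / timed out tally reported for it, could look roughly like the following sketch. It is not the script that was actually used. It rests on two assumptions that the thread does not confirm: each wrapper script accepts the Core_Root as "-coreroot=<path>", and exit code 0 means the test passed. Adjust both if your generated scripts behave differently. The function name run_tests is hypothetical.

```python
# Hypothetical sketch of running CoreCLR test wrapper scripts by hand.
# Assumptions (not confirmed by the thread): each test has a *.sh wrapper that
# accepts the Core_Root as "-coreroot=<path>", and exit code 0 means "passed".
import subprocess
import sys
from collections import Counter
from pathlib import Path


def run_tests(test_root: Path, core_root: Path, timeout_s: float = 600.0) -> Counter:
    """Run every *.sh under test_root and count passed / failed / timed_out."""
    results: Counter = Counter()
    for script in sorted(test_root.rglob("*.sh")):
        try:
            proc = subprocess.run(
                ["bash", str(script), f"-coreroot={core_root}"],
                cwd=script.parent,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child process before re-raising.
            results["timed_out"] += 1
            continue
        results["passed" if proc.returncode == 0 else "failed"] += 1
    return results


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python run_tests.py <test-root> <core-root>
    print(dict(run_tests(Path(sys.argv[1]), Path(sys.argv[2]))))
```

Classifying by exit code only separates passes, failures and timeouts. Telling apart the failure kinds mentioned above, such as the InvalidProgramException cases, would additionally require capturing and inspecting each script's output.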
So I am closing this issue.
Thanks, @janvorli! Any idea when a fix might land such that building runtime works out of the box? Maybe in a month or so?
> I have tried to run all coreclr pri 1 tests on RHEL 8 with 64kB page size using the latest main and no tests were failing with NullReferenceException anymore.
I'm not sure you're running tests in a way that shows the NullReferenceException issue is fixed.
When I ran these tests before, none threw NullReferenceException (https://github.com/dotnet/runtime/issues/43349#issuecomment-757922450). The exceptions happened as part of running the library tests.
The NullReferenceExceptions were happening before we started hitting the GSCookie issue. It's clear that https://github.com/dotnet/runtime/pull/52244 fixes the GSCookie issue (https://github.com/dotnet/runtime/issues/43349#issuecomment-807867988), but I don't understand how it fixes the NullReferenceExceptions.
I believe the NullReferenceException was fixed by another change, #53510. That was what was causing those on macOS arm64 and it was not Apple specific.
> Any idea when a fix might land such that building runtime works out of the box?
The fix will be part of RC1, which will come after preview 7.
> I believe the NullReferenceException was fixed by another change, #53510. That was what was causing those on macOS arm64 and it was not Apple specific.
Great! Thank you for the reference.
In our CI builds, each run on RHEL8 arm64 shows NullReferenceExceptions in the log. On the same arm64 host with a Fedora 32 VM there are no NullReferenceExceptions. When I build and test on another RHEL8 arm64 machine, NullReferenceExceptions also show up in unexpected places.
Some example stack traces from the CI log:
Microsoft.Extensions.Hosting tests
System.Linq.Parallel.Tests
System.Text.Json.Serialization.Tests
@janvorli I don't know how to debug this. Can you take a look, or give me some pointers?
cc @omajid