Closed MichalStrehovsky closed 1 year ago
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak See info in area-owners.md if you want to be subscribed.
Author: | MichalStrehovsky |
---|---|
Assignees: | - |
Labels: | `area-CodeGen-coreclr` |
Milestone: | - |
Will take a look. It's possible its related to the big ZMM support PR that went in yesterday
Sorry for the delay. I believe this was fixed by one of @kunalspathak PRs fixing various issues that had cropped up on Arm64.
It is not repro'ing anymore and was not ZMM related given it was Arm64
There is a test exclusion for this in the repo.
@tannergooding this is still broken in main, see CI results in #86366:
Generating native code
ILC: /__w/1/s/src/coreclr/jit/codegencommon.cpp:3930
ILC: Assertion failed 'regArgTab[argNum].slot == 2' in 'System.Collections.Generic.GenericEqualityComparer`1[System.Numerics.Plane]:GetHashCode(System.Numerics.Plane):int:this' during 'Generate code' (IL size 24; hash 0x7039dd06; FullOpts)
@MichalStrehovsky - I think we should keep the issue open and mark it with test-disabled
or equivalent.
Yeah, github autoclosed it because one of the commits in the PR still referenced this issue.
@MichalStrehovsky, there are other failures blocking this from repro'ing: https://github.com/dotnet/runtime/pull/90184
For reference, the failure:
EXEC : error : Failed to load assembly 'SQLitePCLRaw.core' [/__w/1/s/src/libraries/System.Private.CoreLib/tests/IntrinsicsInSystemPrivatecoreLibAnalyzer.Tests/IntrinsicsInSystemPrivateCoreLib.Tests.csproj]
##[error]EXEC(0,0): error : (NETCORE_ENGINEERING_TELEMETRY=Build) Failed to load assembly 'SQLitePCLRaw.core'
Internal.TypeSystem.TypeSystemException+FileNotFoundException: Failed to load assembly 'SQLitePCLRaw.core'
at Internal.TypeSystem.ThrowHelper.ThrowFileNotFoundException(ExceptionStringID id, String fileName) in /_/src/coreclr/tools/Common/TypeSystem/Common/ThrowHelper.cs:line 35
at Internal.TypeSystem.ResolutionFailure.Throw() in /_/src/coreclr/tools/Common/TypeSystem/Common/ResolutionFailure.cs:line 105
at Internal.TypeSystem.Ecma.EcmaModule.GetObject(EntityHandle handle, NotFoundBehavior notFoundBehavior) in /_/src/coreclr/tools/Common/TypeSystem/Ecma/EcmaModule.cs:line 432
at Internal.TypeSystem.Ecma.EcmaSignatureParser.ResolveHandle(EntityHandle handle) in /_/src/coreclr/tools/Common/TypeSystem/Ecma/EcmaSignatureParser.cs:line 61
at Internal.TypeSystem.Ecma.EcmaSignatureParser.ParseTypeImpl(SignatureTypeCode typeCode) in /_/src/coreclr/tools/Common/TypeSystem/Ecma/EcmaSignatureParser.cs:line 143
at Internal.TypeSystem.Ecma.EcmaSignatureParser.ParseType(SignatureTypeCode typeCode) in /_/src/coreclr/tools/Common/TypeSystem/Ecma/EcmaSignatureParser.cs:line 98
at Internal.TypeSystem.Ecma.EcmaSignatureParser.ParseType() in /_/src/coreclr/tools/Common/TypeSystem/Ecma/EcmaSignatureParser.cs:line 309
at Internal.TypeSystem.Ecma.EcmaSignatureParser.ParseFieldSignature() in /_/src/coreclr/tools/Common/TypeSystem/Ecma/EcmaSignatureParser.cs:line 462
at Internal.TypeSystem.Ecma.EcmaSignatureParser.ParseFieldSignature(EmbeddedSignatureData[]& embeddedSigData) in /_/src/coreclr/tools/Common/TypeSystem/Ecma/EcmaSignatureParser.cs:line 471
at Internal.TypeSystem.Ecma.EcmaField.InitializeFieldType() in /_/src/coreclr/tools/Common/TypeSystem/Ecma/EcmaField.cs:line 104
at Internal.TypeSystem.MetadataFieldLayoutAlgorithm.ComputeInstanceLayout(DefType defType, InstanceLayoutKind layoutKind) in /_/src/coreclr/tools/Common/TypeSystem/Common/MetadataFieldLayoutAlgorithm.cs:line 37
at Internal.TypeSystem.DefType.ComputeInstanceLayout(InstanceLayoutKind layoutKind) in /_/src/coreclr/tools/Common/TypeSystem/Common/DefType.FieldLayout.cs:line 456
at ILCompiler.CompilerTypeSystemContext.EnsureLoadableTypeUncached(TypeDesc type) in /_/src/coreclr/tools/Common/Compiler/CompilerTypeSystemContext.Validation.cs:line 317
at ILCompiler.CompilerTypeSystemContext.EnsureLoadableType(TypeDesc type) in /_/src/coreclr/tools/Common/Compiler/CompilerTypeSystemContext.Validation.cs:line 74
at ILCompiler.DependencyAnalysis.NodeFactory.CreateConstructedTypeNode(TypeDesc type) in /_/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/NodeFactory.cs:line 553
at Internal.TypeSystem.LockFreeReaderHashtable`2.CreateValueAndEnsureValueIsInTable(TKey key) in /_/src/coreclr/tools/Common/TypeSystem/Common/Utilities/LockFreeReaderHashtable.cs:line 562
at ILCompiler.DependencyAnalysis.ReflectionInvokeMapNode.AddSignatureDependency(DependencyList& dependencies, NodeFactory factory, TypeDesc type, String reason) in /_/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/ReflectionInvokeMapNode.cs:line 111
at ILCompiler.DependencyAnalysis.ReflectedFieldNode.GetStaticDependencies(NodeFactory factory) in /_/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/DependencyAnalysis/ReflectedFieldNode.cs:line 96
at ILCompiler.DependencyAnalysisFramework.DependencyAnalyzer`2.GetStaticDependenciesImpl(DependencyNodeCore`1 node) in /_/src/coreclr/tools/aot/ILCompiler.DependencyAnalysisFramework/DependencyAnalyzer.cs:line 182
at ILCompiler.DependencyAnalysisFramework.DependencyAnalyzer`2.GetStaticDependencies(DependencyNodeCore`1 node) in /_/src/coreclr/tools/aot/ILCompiler.DependencyAnalysisFramework/DependencyAnalyzer.cs:line 222
at ILCompiler.DependencyAnalysisFramework.DependencyAnalyzer`2.ProcessMarkStack() in /_/src/coreclr/tools/aot/ILCompiler.DependencyAnalysisFramework/DependencyAnalyzer.cs:line 257
at ILCompiler.DependencyAnalysisFramework.DependencyAnalyzer`2.ComputeMarkedNodes() in /_/src/coreclr/tools/aot/ILCompiler.DependencyAnalysisFramework/DependencyAnalyzer.cs:line 308
at ILCompiler.ILScanner.ILCompiler.IILScanner.Scan() in /_/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/ILScanner.cs:line 140
at ILCompiler.Program.<Run>g__RunScanner|4_0(<>c__DisplayClass4_0&) in /_/src/coreclr/tools/aot/ILCompiler/Program.cs:line 468
at ILCompiler.Program.Run() in /_/src/coreclr/tools/aot/ILCompiler/Program.cs:line 448
at ILCompiler.ILCompilerRootCommand.<>c__DisplayClass221_0.<.ctor>b__0(ParseResult result) in /_/src/coreclr/tools/aot/ILCompiler/ILCompilerRootCommand.cs:line 277
Is this a known issue? CC. @agocke
@MichalStrehovsky, there are other failures blocking this from repro'ing: #90184
The explanation is at https://github.com/dotnet/runtime/pull/90184#issuecomment-1698521105.
Haven't been able to repro this locally. Must be doing something wrong.
Haven't been able to repro this locally. Must be doing something wrong.
Here's the workflow I'd use for this (IIRC this was arm64 linux specific so need to build on linux):
(The UseNativeAotCoreLib=true might just be a cargo cult I do)
If it reproes and you'd like to move investigation to Windows, edit System.Numerics.Vectors.Tests.csproj and add <ItemGroup><IlcArg Include="--make-repro-path:/some/path/to/a/directory"></ItemGroup>
. This will create a zip under /some/path/to/a/directory - copy it to your Windows machine, unpack and run ILC with the RSP file in the ZIP.
Thanks, I'll give this a try on my volterra + wsl2.
The -arch arm64
will make it take the crossbuild paths, so volterra shouldn't be needed, just a linux machine (the build machines we use in the lab are not arm64 machines either, only the Helix executors)
Reproed just fine on the volterra so I'm set for now.
Generating native code
ILC: /home/andya/repos/runtime0/src/coreclr/jit/codegencommon.cpp:3944
ILC: Assertion failed 'regArgTab[argNum].slot == 2' in 'System.Collections.Generic.GenericEqualityComparer`1[System.Numerics.Plane]:GetHashCode(System.Numerics.Plane):int:this' during 'Generate code' (IL size 24; hash 0x7039dd06; FullOpts)
This method gets 1 arg apparently passed in 4 float registers:
; V01 arg1 simd16 HFA(float) multireg-arg <System.Numerics.Plane>
Arg #1 passed in register(s) d0,d1,d2,d3
I think the jit gets confused during prolog generation and somehow fails to handle it as an HFA and so falls into a code path expecting the arg to be passed in 2 registers, not 4. The slot num when we assert is 3 which looks plausible for a 4 element HFA.
(regArgElem [9]) regArgTab = {
[0] = {
varNum = 1
type = TYP_FLOAT
trashBy = 0
slot = '\x01'
stackArg = false
writeThru = false
processed = false
circular = false
}
[1] = {
varNum = 1
type = TYP_FLOAT
trashBy = 0
slot = '\x02'
stackArg = false
writeThru = false
processed = false
circular = false
}
[2] = {
varNum = 1
type = TYP_FLOAT
trashBy = 0
slot = '\x03'
stackArg = false
writeThru = false
processed = false
circular = false
}
[3] = {
varNum = 1
type = TYP_FLOAT
trashBy = 0
slot = '\x04'
stackArg = false
writeThru = false
processed = false
circular = false
This method is pretty convoluted so will have to spend more time debugging.
; V01 arg1 [V01,T01] ( 4, 4 ) simd16 -> d3 HFA(float) multireg-arg ld-addr-op single-def <System.Numerics.Plane>
LSRA assigns reg d3
for V01
and the arg comes in in d0-d3
. So it self-conflicts in an odd way that the prolog codegen can't handle.
@dotnet/jit-contrib any ideas here?
The register naming here is very confusing in our dumps. The incoming args are 32-bit floats, so we should say they are passed in s0,s1,s2,s3
(not d0/d1/d2/d3). And the register assignment is of the entire value, so q3
(not d3).
Or if we don't know the sizes maybe we should say v0,v1,v2,v3
and v3
instead?
any ideas here?
Maybe this is a new self-conflict that we haven't seen before due to change in HFA or struct handling or ABI or struct promotion?
I don't think there is an actual liveness conflict here, though in similar cases there could be. But still not sure I understand all that is going on in this code.
We get the args in s0-s3
and they all need to end up in say v3
. Currently the code doesn't detect any circularity (which I think is right) and when it goes to handle s0
it sees that s3
is live, so it defers ("Cannot trash a currently live register argument."). For s1
it doesn't do any conflict check and defers since it expects this part of the hfa to be handled when the first part is. For s2
it should do likewise but asserts that there is just one other part (the code is probably just expecting the "simd passed in 2 registers" behavior here).
I think the right logic is more like: if the dest reg is one of the hfa regs, and the hfa reg corresponding to the dest reg is already in the right spot in the dest reg, just insert the other hfa regs. Else spill or insert the "misplaced" hfa reg, then insert the other regs. So basically, a finer-grained check to see if the spot this hfa arg reg needs to occupy in the dest is live, not just if any part of the dest reg is live.
I have a hacky version where I comment out the assert block and then update the "Cannot trash a currently live register argument" part, so that if it sees a conflict, it then tries to screen dependence for the HFA cases specially. On this repro case it produces plausible looking code but it doesn't handle the case where say the HFA was passed in s1-s4
but ends up in v3
where s3
needs to be moved within v3
or copied elsewhere first.
G_M8953_IG01:
IN0008: 000000 stp fp, lr, [sp, #-0x20]!
IN0009: 000004 mov fp, sp
IN000a: 000008 mov v3.s[0], v0.s[0]
IN000b: 00000C mov v3.s[1], v1.s[0]
IN000c: 000010 mov v3.s[2], v2.s[0]
G_M8953_IG02:
IN0001: 000014 mov v0.16b, v3.16b
I also tried coming up with a non-AOT repro but no luck so far.
I'll grab an SPMI collection and link it here so if anyone else wants to poke at this it is a bit simpler to do.
I think the right logic is more like: ...
Sounds right. But does the JIT have "insert into the right place" logic already? Could it do something simple and spill the entire HFA to stack, then reload v3/q3 from the stack location?
But does the JIT have "insert into the right place" logic already?
Maybe not for slot 0, but it does have something for the other slots -- need to check if the opcode we use for slot 0 is a partial or total write; we need it to be partial since some bits in the destination are live (just not the whole thing). Currently my hack adds some special casing for slot 0 to make sure we insert.
Also I am not confident we get the sizes right for the inserts; we always use EA_4BYTE
which works fine for float
-- perhaps we never see HFAs of doubles or other sizes? But seems like we can.
Could it do something simple and spill the entire HFA to stack, then reload v3/q3 from the stack location?
I assume it could, yeah. I am not sure where best to integrate that into what's there now.
Here's the SPMI collection: fail.mc.txt
Seen in https://dev.azure.com/dnceng-public/public/_build/results?buildId=197177&view=logs&j=2f6a7d26-0d60-5ade-d191-981fe0847989&t=950a5a68-ae8a-5bb4-8cad-fb293bb99dc4&l=6041
(linux.arm64.Checked leg of runtime-extra-platforms)
Looks to be related to hardware intrinsics. Cc @tannergooding