Unity-Technologies / Unity.Mathematics

The C# math library used in Unity providing vector types and math functions with a shader like syntax
Other
1.38k stars 156 forks source link

ECSB-529 ECSB-506 Fix arm hash crash #241

Closed unpacklo closed 1 year ago

unpacklo commented 1 year ago

On arm 32 bit, IL2CPP builds can crash due to math.hash attempting to access uint at unaligned addresses. This PR fixes the problem across all platforms by causing all reads to go through a 32 bit little endian read function which reads byte by byte and then constructs the 32 bit value.

I manually checked this fix on a test project with a Samsung Galaxy S10e (SM-G970U).

unpacklo commented 1 year ago

I do have some concerns about a possible performance regression on x64 but I heard through the grapevine that unaligned loads/stores are no longer performance penalized on recent Intel chips. Not sure about AMD. In practice, my Intel Core i7-7820X has no performance difference.

Does anyone know the story on AMD side? Maybe @tgjones knows?

unpacklo commented 1 year ago

@cdwfs take a gander at this glorious assembly diff https://gist.github.com/unpacklo/864df87d23bdbdee242d3212123cf4d1

There are no appreciable code diffs here. There are a few rearrangements of some code lines and debug info but that seems to be coming from my switch to calling rol instead of the handwritten rotate that was in the hash code before.

Over17 commented 1 year ago

@cdwfs take a gander at this glorious assembly diff https://gist.github.com/unpacklo/864df87d23bdbdee242d3212123cf4d1

There are no appreciable code diffs here. There are a few rearrangements of some code lines and debug info but that seems to be coming from my switch to calling rol instead of the handwritten rotate that was in the hash code before.

The rol 's where in before and after - I can only see instructions rearranged

Over17 commented 1 year ago

@unpacklo you checked on a Samsung with an Armv7 build, right? just checking to be sure.

Over17 commented 1 year ago

@unpacklo could you share the arm assembly diff? most interested in arm64, so that it does not regress. (I guess you can easily get it from the Burst inspector)

unpacklo commented 1 year ago

I spent the last few days debugging the il2cpp debug and release builds of this crash and as a result, I have a clear understanding of what's going on.

4da3001732c904b4773d7d06c4315c1f

First off, this is the android armv7a neon32 release build crashing on device:

image

Notice the ldm r0, {r3, r4, r5, r7} instruction which is an aligned load. However, r0 is in fact an unaligned address:

image

I mean, that's cool and all so at least that part is confirmed. However, I noticed during my testing that il2cpp debug builds were running just fine. Time to find out how.

The debug disassembly looks a bit different:

image

Here, we are stopped right before the load of interest vld1.32 {d16, d17} [r0] and r0 has the unaligned pointer 0xc71a12d9 (I only have the var window, not the register read from lldb but rest assured, r0 is unaligned):

image

I'm a bit confused at this point because I'm vaguely familiar with arm and I thought this was an instruction that required alignment, but somehow this seems to be fine. Well, the almighty Google brought me to this insightful stackoverflow:

image

Screen_Shot_2018-10-25_at_11 02 15_AM

Ok, I have to find out more...

image

image

image

Screen_Shot_2018-10-25_at_11 02 15_AM-1

So basically, if you've got SCTLR.A == 0 and you use vld1.32 without any alignment hint, you have an unaligned load which is why the debug build was not crashing.

I'm still working on a fix for this, more info to come tomorrow.

Over17 commented 1 year ago

Important is to not put a penalty on Arm64. Armv7 is on its way to sunset, while Arm64 is to stay with us for many decades.

I think my last commit will achieve this: https://github.com/Unity-Technologies/Unity.Mathematics/pull/241/commits/7fb7c936e9ff1836c5b3d1d9b4a1bb9343c028cd

Basically, I make the code go through the slow path only for 32 bit arm. All other platforms continue to use the original code as before which assumes the platform supports unaligned loads.

There is a bit of a wrinkle here in that we are relying on the fact that AArch64 allows for unaligned loads/stores to normal memory but still requires alignment for any pointer to device memory. If the user somehow got a pointer to device memory and called this function, we will still crash but I think I can consider this an edge case and move on until someone tells me they're hashing device memory.

unpacklo commented 1 year ago

Important is to not put a penalty on Arm64. Armv7 is on its way to sunset, while Arm64 is to stay with us for many decades.

I think my last commit will achieve this: 7fb7c93

Basically, I make the code go through the slow path only for 32 bit arm. All other platforms continue to use the original code as before which assumes the platform supports unaligned loads.

There is a bit of a wrinkle here in that we are relying on the fact that AArch64 allows for unaligned loads/stores to normal memory but still requires alignment for any pointer to device memory. If the user somehow got a pointer to device memory and called this function, we will still crash but I think I can consider this an edge case and move on until someone tells me they're hashing device memory.

I think I accidentally edited Yury's comment. In any case, I'm trying to clean up CI and verifying on the customer bug report project before checking in.

unpacklo commented 1 year ago

So, to finally close the loop, I was wondering why arm 64 bit does not crash and the reason seems to be this:

image

So AArch64 says that Normal memory can support unaligned accesses. I'm sure there's a penalty or something regarding that and could be optimized in the future, but that's a task for future me. It also mentions SCTLR_ELx.A where the x corresponds to the processor privilege level.

The key distinction here is Normal memory and Device memory. Normal memory is described here:

image

Device memory is described here:

image

The key distinction is that Device memory can have side effects from simply accessing the memory contents. I think it's safe to say that no one would ever want to be hashing Device memory so I am comfortable with assuming that no one would ever send the hash function a Device memory pointer.

Finally, the value of SCTLR_ELx.A. What is the value set to and how does it get set? Well, these are privileged registers that get set by the OS. I've scrubbed around and I can't quite nail down what android does but empirically, they're doing EL0.A = 0 so unaligned accesses are valid.

unpacklo commented 1 year ago

I finally got my hands on the customer bug report test project and I've reproduced the issue. I've verified that this PR fixes the crash on arm32 in the customer test project.

Over17 commented 1 year ago

Great stuff, thanks Dale!

Finally, the value of SCTLR_ELx.A. What is the value set to and how does it get set? Well, these are privileged registers that get set by the OS.

You are correct, those are priviliged registers so setting them is not possible from the user space, unless I'm mistaken. We can consider this irrelevant.

RE: the code, I think the slow path will trigger for x86 (intel 32 bit) Android too, which Unity supports only for Chromebooks. I think it's fine though, x86 ABI is even unsupported in latest NDKs, so the share of affected devices is super low.