Open · uweigand opened this issue 3 months ago
Tagging subscribers to this area: @dotnet/area-extensions-configuration See info in area-owners.md if you want to be subscribed.
Attaching the complete log file of the error which was reproduced: NET_runtime_Binder_Test_errors.txt
@steveharter is this something you can help look into?
@RikkiGibson PTAL
This is similar to the question raised in https://github.com/dotnet/runtime/issues/101079#issuecomment-2221225812 where line endings can also vary based on platform.
I think we expect that if the file content is the same, these hashes should be the same.
@CyrusNajmabadi is it expected that the result of GetContentHash() will vary depending on endianness?
It should be based entirely on the character content of the source files. It would be good if both were dumped out (literally every character and its integral value), and the two were compared.
If they have the same characters, then GetContentHash should be the same.
If they have different characters, then this is by design.
Are we sure this isn't an issue around \r vs \r\n?
It should be based entirely on the character content of the source files. It would be good if both were dumped out (literally every character and its integral value), and the two were compared.
If they have the same characters, then GetContentHash should be the same.
I believe they have the same sequence of characters but not the same sequence of bytes: each character is two bytes, and the order of those two bytes within a single character is endian-dependent. The current implementation of GetContentHash only returns the same value for the same sequence of bytes, not characters.
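To make that concrete, here is a minimal standalone sketch (illustrative only, not Roslyn's actual implementation, which uses a different hash algorithm) showing how hashing a char buffer through its raw byte view ties the result to the host's byte order:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Security.Cryptography;

class EndiannessDemo
{
    static void Main()
    {
        ReadOnlySpan<char> text = "hello";

        // Reinterpret the UTF-16 code units as raw bytes. For 'h' (U+0068)
        // this yields 68 00 on little-endian but 00 68 on big-endian.
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(text);
        Console.WriteLine(Convert.ToHexString(bytes));

        // Hashing those bytes therefore produces a different value on
        // big-endian vs. little-endian hosts, even for identical characters.
        Console.WriteLine(Convert.ToHexString(SHA256.HashData(bytes)));
    }
}
```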
@uweigand I think it makes sense to disable these tests for now as any possible Roslyn change will take a while to come over - what CI pipelines do I need to run to verify? (is it "runtime-extra-platforms"?)
Is there a way to ensure hashes like this are platform-independent without introducing a big byte swap performance overhead on big endian?
@uweigand I think it makes sense to disable these tests for now as any possible Roslyn change will take a while to come over - what CI pipelines do I need to run to verify? (is it "runtime-extra-platforms"?)
The runtime-community pipeline runs on s390x, however this is unfortunately not operational at the moment. @directhex is working on bringing this back up again. For the time being, I'm happy to test any suggested fixes on our machines.
@uweigand I put up a PR to disable these tests on https://github.com/dotnet/runtime/pull/105485. Can you give it a try to make sure all of the tests pass now? Thanks
@uweigand I put up a PR to disable these tests on #105485. Can you give it a try to make sure all of the tests pass now? Thanks
Yes, this makes all tests pass again. Thanks!
It should be based entirely on the character content of the source files. It would be good if both were dumped out (literally every character and its integral value), and the two were compared.
If they have the same characters, then GetContentHash should be the same.
@CyrusNajmabadi my take is that Roslyn may want to calculate the hash on character content, not byte content. Also, if that is done, and we take a perf hit, does it make sense at that time to normalize line endings so that files saved under Windows/Linux/OSX all have the same hash?
At this point we are comfortable with the line-endings behavior as it currently stands: the file content is different, so the hash is different.
The endianness issue may be a real bug from our POV though. The content is the same character-for-character, it is just laid out differently in memory due to an architectural difference. It would be good if we could adjust the way the content hash is calculated on big-endian, so that it matches little-endian, without sacrificing too much perf.
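One possible shape for such a fix, as a rough sketch (the PortableContentHash name and the use of SHA256 here are illustrative assumptions, not Roslyn's actual code): write each UTF-16 code unit out in explicit little-endian order before hashing, keeping the direct reinterpretation as a fast path on little-endian hosts:

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;
using System.Security.Cryptography;

static class PortableContentHash
{
    // Sketch only: hash the text as if its chars were laid out little-endian,
    // so big- and little-endian hosts agree on the result.
    public static byte[] Compute(ReadOnlySpan<char> text)
    {
        using var hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);

        if (BitConverter.IsLittleEndian)
        {
            // Fast path: the in-memory layout already matches, so the chars
            // can be reinterpreted directly with no copying or swapping.
            hash.AppendData(MemoryMarshal.AsBytes(text));
        }
        else
        {
            // Big-endian path: swap each UTF-16 code unit into a reused
            // stack buffer before feeding it to the hash.
            Span<byte> buffer = stackalloc byte[512];
            while (!text.IsEmpty)
            {
                int chunk = Math.Min(text.Length, buffer.Length / 2);
                for (int i = 0; i < chunk; i++)
                    BinaryPrimitives.WriteUInt16LittleEndian(buffer.Slice(i * 2, 2), text[i]);
                hash.AppendData(buffer.Slice(0, chunk * 2));
                text = text.Slice(chunk);
            }
        }

        return hash.GetHashAndReset();
    }
}
```

With this shape, little-endian hosts keep their current behavior and hash values unchanged, and only big-endian hosts pay the per-char swap cost.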
I will also say that while the intended design is to allow [InterceptsLocation] attributes to be written out to disk and consumed on another machine etc., it is not actually a mainline case for us, so it is prone to hit bugs like these.
It might make sense to adjust the creation+consumption of these baselines to basically be insensitive to exactly what attribute arguments are used, and to add (or rely on pre-existing) functional verification that the compilation is successful and the semantics match what you expect.
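As a rough sketch of that idea (the helper and regex below are hypothetical, not something the test suite currently contains), the baseline comparison could scrub the opaque attribute arguments before diffing:

```csharp
using System.Text.RegularExpressions;

static class BaselineScrubber
{
    // Matches the [InterceptsLocation(...)] attribute with whatever
    // version/data arguments the compiler emitted.
    private static readonly Regex InterceptsLocationArgs = new(
        @"\[global::System\.Runtime\.CompilerServices\.InterceptsLocation\([^)]*\)\]",
        RegexOptions.Compiled);

    // Replace the endianness/version-sensitive arguments with a placeholder
    // so baselines compare equal across platforms; functional tests would
    // still verify that the compilation succeeds and behaves as expected.
    public static string Scrub(string generatedSource) =>
        InterceptsLocationArgs.Replace(
            generatedSource,
            "[global::System.Runtime.CompilerServices.InterceptsLocation(/* scrubbed */)]");
}
```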
@jaredpar for visibility.
does it make sense at that time to normalize line endings so that files saved under Windows/Linux/OSX all have the same hash?
Under no circumstances would we ever do this. Line endings are content and affect the runtime behavior of a program. Two programs with different line endings should not have the same hash, as that would indicate they were identical.
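A small illustration of why: a multi-line verbatim string embeds the source file's line endings directly in the program's runtime data, so a CRLF file and an LF file compile to observably different programs.

```csharp
using System;

class LineEndingDemo
{
    static void Main()
    {
        // The line break inside this verbatim literal is whatever byte
        // sequence the source file was saved with (CRLF vs. LF).
        string s = @"first
second";
        Console.WriteLine(s.Contains('\r') ? "saved with CRLF" : "saved with LF");
    }
}
```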
@CyrusNajmabadi @RikkiGibson is there a tracking Roslyn issue?
Do you expect this to be fixed in the v9.0 timeframe or should I move this to 10.0?
No, feel free to file an issue on Roslyn for the endianness behavior. I am guessing a fix in 9.0 is unlikely at this point.
Roslyn issue: https://github.com/dotnet/roslyn/issues/74682
I am guessing a fix in 9.0 is unlikely at this point.
I am moving this to v10.
Since https://github.com/dotnet/runtime/pull/104180 was merged, the following test cases are failing on s390x:
The symptom in all cases is of the following type:
It looks like the difference is in the InterceptableLocationData, which is provided by Roslyn's Microsoft.CodeAnalysis.CSharp.InterceptableLocation1.Data. Specifically, the differences are in the checksum field, which is computed via the SourceText.GetContentHash() routine. From looking at this routine, it seems to simply re-interpret a string (char sequence) as a byte array, see https://github.com/dotnet/roslyn/blob/aea9e82da403c397265f7fd0fefee5ebbb886179/src/Compilers/Core/Portable/Text/SourceText.cs#L645

This seems to imply that the hash value of the same source code will be different if computed on a big-endian vs. a little-endian machine (because the byte representation of each char will have its two bytes swapped).
Now my question would be: is the content hash supposed to be the same on big- and little-endian machines? If yes, that code in Roslyn needs to be fixed. If no, then the ConfigurationBindingGeneratorTests will have to take that endian difference into account one way or another.
CC @steveharter @tarekgh @directhex @steveisok FYI @giritrivedi @saitama951 @omajid @tmds