Inconsistent file version generation between big and little endian CPU architectures

Techn0core commented 3 years ago

It looks like the git commit hash is having it's digits 1&2 swapped with 3&4, and then converted to base 10.

A commit starting with a1b2 becomes x.y.z.45729, which converted back to base 16 is b2a1. Do I have that right?

Personally I think it would be very convenient to just convert it without the swap, so we can use a simple base 16 to 10 conversion to get the first 4 digits of the source commit hash, and advertise this option in the readme.md. Was there a reason for this swapping? Could it be added as a setting to version.json?

Thanks,

AArnott commented 3 years ago

That's an interesting idea. First of all, you know you can use the nbgv tool to convert from a version number to a commit ID, yes? nbgv get-commits 1.2.3.45729 will do that for you.

The other slight complication is that this commit ID in the 4th integer position is not exactly 16-bits -- because 0xffff is not an allowed value. So there is always the ambiguity when you see 65534 as the last integer that it may mean the original commit ID started with fffe or ffff, since we force an ffff value to be represented the same as fffe given the WinPE file version limitation. nbgv get-commits already takes this into account when matching the commit id.

When converting the commit ID to a decimal number, we take the first 16 bits, subtract one if its value is 0xffff, and then represent it as a ushort value directly. On little-endian systems, a 16-bit value is stored in memory backwards, so if the commit ID starts with 0xabcd, a ushort with that exact same memory will interpret the number as 0xcdab (52,651 in decimal). On a big-endian system, I suspect the same 16-bit lead to the commit ID would end up creating a decimal value of 43,981.

That difference between endian systems is a problem, since NB.GV strives for deterministic versioning. So for the sake of a consistent version generation no matter which CPU is running NB.GV, it does seem like a standard should be adopted, and for the use case you mention, choosing big-endian makes sense. That is to say, we should swap the two bytes around if on a little-endian system before interpreting as a ushort.

AArnott commented 3 years ago

One complication is that this will be a breaking change for version=commit matching, since the nbgv tool will now be interpreting the 4th integer in the version as a big endian representation of the first two bytes of the commit ID instead of little endian, which means no version will match that was produced with an older version of NB.GV.

I may have to adjust the tool to recognize both for sake of backward compatibility.

dotnet / Nerdbank.GitVersioning

Inconsistent file version generation between big and little endian CPU architectures #637