cr-marcstevens / sha1collisiondetection

Library and command line tool to detect SHA-1 collision in a file
Other
1.3k stars 179 forks source link

commit 33a694a9ee1b broke all little endian platforms on *BSD #34

Closed n-soda closed 7 years ago

n-soda commented 7 years ago

Little endian platforms on all *BSD variants are broken since commit 33a694a9ee1b, because _BIG_ENDIAN is always defined even on little endian platforms.

problem found by nonaka at NetBSD.org

shumow commented 7 years ago

@n-soda Note the Macro: SHA1DC_FORCE_LITTLEENDIAN -- I would prefer if we move to use this, as opposed to add cases to these checks, which we should probably simplify greatly (or remove.) Is there anyway that you can simply modify the Makefile that you use, or add to the Makefile we use, to define this macro instead?

cr-marcstevens commented 7 years ago

I just committed (a24eef5) a different fix: If any of _BYTE_ORDER, BYTE_ORDER or __BYTE_ORDER is defined then it will check its value against to corresponding _BIG_ENDIAN, otherwise it checks for the existance of any known macro that indicates a big endian platform.

I hope this makes endianness selection more correct on the platforms with issues. Can you please confirm if this fixes your issues on *BSD ?

me-and commented 7 years ago

Cygwin has exactly the same problem (which I discovered via Git including this library), and I can confirm that a24eef5 fixes things there.

avar commented 7 years ago

@n-soda, @me-and : Could you please test my PR #36. It hopefully fixes this issue & more by mainly relying on GCC/Clang detection rather than more fragile macros that may differ between platforms like Solaris & NetBSD.

avar commented 7 years ago

My PR #36 has been merged, so this shouldn't be an issue anymore, we don't check _BIG_ENDIAN at all now. I don't have permission to close this PR though, @cr-marcstevens ?

n-soda commented 7 years ago

I'm sorry for the delay of the repsonse. unfortunately, latest master (3f14d1bb) doesn't work correctly on some big endian architectures with modern BSDs (e.g. m68k, some of power/powerpc machines, Super-H, HP PA). So, could you re-add the "defined(_BYTE_ORDER) && defined(_BIG_ENDIAN)" test again? modern BSDs need this.

historically, 4BSD originally defined BYTE_ORDER/BIG_ENDIAN/LITTLE_ENDIAN macros (these symbols are still defined in some cases on modern BSDs for backward compatibility, but they are deprecated), and then modern BSDs defined the _BYTE_ORDER/_BIG_ENDIAN/_LITTLE_ENDIAN macros (one underscore) to avoid namespace pollution, Linux followed similar convention, but used two underscores (i.e. BYTE_ORDER/__BIG_ENDIAN/LITTLE_ENDIAN) instead. So, both the one underscore version and the two underscores version are needed at least. It seems Cygwin uses same convention with modern *BSDs, accourding to https://cygwin.com/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/include/machine/endian.h;h=34a5726e6871a6c06b420790e9305c3236ee1cb8;hb=HEAD

avar commented 7 years ago

@n-soda: Makes sense, but just for my own sanity, what compiler/version are you using on those BSDs? I had someone test it on Cygwin and according to them it worked, which I think is because they're using a GCC newer than 4.6.0.

avar commented 7 years ago

@n-soda: Also, can you tell me if 56ab30c works for you?

n-soda commented 7 years ago

@avar 56ab30c doesn't work, because it treats all *BSD and newlib as big endian even if the architecutre is little endian. About gcc version, latest NetBSD (NetBSD-7 and later) is OK because it is using gcc-4.8.5, but NetBSD-6 is NOT OK, because NetBSD-6 is using gcc-4.5.3 and it's still supported, so we cannot reply on the gcc specific macros. Also, I guess some embeded systems which are using newlib (newlib is used by Cygwin, but also by some Linux-based embeded systems) may be using older gcc too.

FWIW, ade6b5d8 in my repo works for me. Does this work on Solaris? (I can test it on Solaris 9/sparc with very old gcc, but not on latest Solaris)

n-soda commented 7 years ago

@avar According to http://man.openbsd.org/gcc-local.1 , OpenBSD is using gcc-3.3.6 or 4.2.1, because they won't accept GPL version 3. So, OpenBSD is NOT OK either about the gcc specific macros.

avar commented 7 years ago

@n-soda: Makes sense, b.t.w. that commit is pretty unreadable due to various merging etc. and being originally written against the old version. I took the liberty of squashing a version of that and pushing it to my repo: https://github.com/avar/sha1collisiondetection/commit/fd166aacc1fd4c6af2fa745e10f2cf877d0992b8

That makes it easy to see what the net change is.

This change as it stands will break Solaris. The reason for this is that on bendian systems they just have:

 #define _BIG_ENDIAN

See http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/sys/isa_defs.h#397

They also define _BYTE_ORDER, thus this will error out in the same way this would:

#define _BYTE_ORDER 123
#define _BIG_ENDIAN
#if defined(_BYTE_ORDER) && defined(_BIG_ENDIAN) && _BYTE_ORDER == _BIG_ENDIAN
#endif

But obviously the current thing sucks since it breaks various BSDs. Ways out of this that I can see, in no particular order:

n-soda commented 7 years ago

@avar

I took the liberty of squashing a version of that and pushing it to my repo: avar/sha1collisiondetection@fd166aa

Thanks!

This change as it stands will break Solaris.

Hmm, how about 0b0e12e8 then?

n-soda commented 7 years ago

@avar BTW, the following symbols were available on Solaris at the age of SUN Works Pro SC-3.0, some of them may be still available with newer Solaris toolchain:

avar commented 7 years ago

@n-soda : Looks good, I just pushed an alternate approach of doing this, not saying it's better, just something I was hacking on, what do you think about that way of doing it?

WRT the sun macros: I just have testing others did on SunOS 5.11 11.3 sun4v sparc sun4v. There grepping /usr/include for # *define *_*sun didn't yield anything interesting, but maybe we just missed something.

n-soda commented 7 years ago

@avar

what do you think about that way of doing it?

re: 94af1a61 , I'm not sure whether the _BIG_ENDIAN macro on Solaris/sparc will be defined as null string in the distant future as well or not. I guess checking defined(_BIG_ENDIAN) && defined(_LITTLE_ENDIAN) is more safer.

about the "sun" macro, grepping under /usr/include cannot find the symbol, because the symbol is defined by the cc command itself instead of the headers.

avar commented 7 years ago

Yeah I think your change is simpler & better. I was just hacking on mine concurrently and thought I'd note it.

I think it makes sense though to move the logic you added to the "We do nothing more here for now" condition. I.e. now we do:

gcc? -> glibc? -> becpu -> !lecpu -> nothing

Your change makes it:

gcc? -> glibc? -> bsd/cygwin _BIG_ENDIAN -> becpu -> !lecpu -> nothing

I think it makes more sense to make it:

gcc? -> glibc? ->  becpu -> !lecpu -> bsd/cygwin _BIG_ENDIAN -> nothing

Any version of Solaris ends up in the "becpu" test since it checks for __sparc, so by putting your _BIG_ENDIAN logic after that you avoid any issues with them doing something weird in the future (or on an older version we've missed).

I also think it makes sense to put this after the CPU whitelist/blacklist, that's probably more reliable than what are likely to be more platform-specific macros.

n-soda commented 7 years ago

@avar the gcc?, glibc? and bsd/newlib? tests are essentially similar tests, so I think it's better to put them together. So, how about this order?

becpu -> !lecpu -> gcc? -> glibc? -> bsd/newlib -> nothing?

avar commented 7 years ago

@n-soda: I think it makes more sense to check gcc & glibc first. E.g. SPARC v9 apparently (I'm told by a guy at Oracle) can be configured to run in LE mode, but sparc will still be true, now that's not a practical issue because with a newer GCC we'll return the right thing.

In general it seems like a better idea to trust the compiler/libc first and then start trying to make sense of CPU values. As the Debian page I linked to shows for ppc64 it'll also be easy to introduce subtle regressions on some archs that are mixed BE/LE with future patches if we use this pattern.

cr-marcstevens commented 7 years ago

I'm very glad that we have so many endian cases covered now. After this has settled and merged, it is probably worthwhile to put it in its own header, so it can also easily be used by other projects. Or if you guys want to maintain such an preprocessor endianness detection header somewhere then we're also happy to include that.

Also, I noticed that the following detects big endian for a situation, but still falls through to the else-case when it could detect little endian:

#if defined(_BYTE_ORDER) && defined(_BIG_ENDIAN) && defined(_LITTLE_ENDIAN) &&  \
    _BYTE_ORDER == _BIG_ENDIAN
#define SHA1DC_BIGENDIAN
#else

Should the final check be inside the if-block, or is the above the intended logic?:

#if defined(_BYTE_ORDER) && defined(_BIG_ENDIAN) && defined(_LITTLE_ENDIAN)
#if _BYTE_ORDER == _BIG_ENDIAN
#define SHA1DC_BIGENDIAN
#endif
#else
n-soda commented 7 years ago

@cr-marcstevens

Should the final check be inside the if-block, or is the above the intended logic?:

Yeah, that's better.

n-soda commented 7 years ago

@avar

think it makes more sense to check gcc & glibc first. E.g. SPARC v9 apparently (I'm told by a guy at Oracle) can be configured to run in LE mode, but sparc will still be true, now that's not a practical issue because with a newer GCC we'll return the right thing.

If this is the issue, the *BSD/newlib check has to be placed before the becpu check and the !lecpu check as well. Otherwise a theoretical little-endian sparc port of OpenBSD won't work.

If the defined(_BIG_ENDIAN) && defined(_LITTLE_ENDIAN) check doesn't work with SPARC/solaris, we have no choice, but if this works, we can safely place the *BSD/newlib check before the becpu check and the !lecpu check.

avar commented 7 years ago

On Wed, Jun 28 2017, n-soda jotted:

@avar

think it makes more sense to check gcc & glibc first. E.g. SPARC v9 apparently (I'm told by a guy at Oracle) can be configured to run in LE mode, but sparc will still be true, now that's not a practical issue because with a newer GCC we'll return the right thing.

If this is the issue, the *BSD/newlib check has to be placed before the becpu check and the !lecpu check as well. Otherwise a theoretical little-endian sparc port of OpenBSD won't work.

Indeed, I think it makes sense to do that, i.e.:

gcc? -> glibc? -> bsd/cygwin _BIG_ENDIAN -> becpu -> !lecpu -> nothing

Provided that the bsd/cygwin check won't falsely fire on Solaris due to checking _LITTLE_ENDIAN.

If the defined(_BIG_ENDIAN) && defined(_LITTLE_ENDIAN) check doesn't work with SPARC/solaris, we have no choice, but if this works, we can safely place the *BSD/newlib check before the becpu check and the !lecpu check.

From what I've seen of Solaris the check you suggested for _LITTLE_ENDIAN should definitely work, and seems liket he simplest thing to do.

avar commented 7 years ago

@n-soda : Just so we don't duplicate work / nobody does anything, I understand from the discussion above that you'll work on some patch for this, if you're not willing to do that / don't have time I can try to put something together per the discussion above.

shumow commented 7 years ago

Thank you guys for your attention to this matter. Marc and I are heads down on a camera-ready version of a paper (about this algorithm actually (-: ). When you guys settle on a final fix @avar pls test as you did for the last patch and @n-soda please confirm this fixes the platforms/compilers you were looking at, I'll happily merge.

n-soda commented 7 years ago

@cr-marcstevens I applied your comment in cfd84f8 . BTW, I prefer 37f78d0 to cfd84f8 because the former's nesting level is shallower. How do you think about this?

@avar Is this OK for you?

cr-marcstevens commented 7 years ago

Thanks! BTW I also think that using #elif instead of nested #if really improves readability.

avar commented 7 years ago

@n-soda Looks good to me.

I didn't use #elif because I misremembered that there were portability issues with it, but I was just mixing it up with make. That's indeed a lot more readable!

n-soda commented 7 years ago

@cr-marcstevens thanks. I merged the #elif version in 37f78d0 .

@avar your memory is correct. #elif didn't exist before ANSI-C (C89), but that's more than 20 years ago, and we can safely ignore such old compilers.

avar commented 7 years ago

@n-soda @cr-marcstevens @shumow : As noted above, I think the end result of this merge request is good, but the history was a mess. Let's merge my rebased https://github.com/cr-marcstevens/sha1collisiondetection/pull/37 instead, and close this one.

n-soda commented 7 years ago

@cr-marcstevens @shumow

Let's merge my rebased #37 instead, and close this one.

Yeah, please do so.

@avar Thanks for cleaning up the history.

shumow commented 7 years ago

Closing as this was rebased and cleaned up in the now merged: https://github.com/cr-marcstevens/sha1collisiondetection/pull/37