NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
17.3k stars 13.54k forks source link

john: Missing AVX2 support #328226

Open jtojnar opened 1 month ago

jtojnar commented 1 month ago

It was brought to my attention we do not compile john with AVX2 support, which might have significant negative performance effects.

Additional context

Apparently, john uses auto-detection, which we disabled for compatibility and reproducibility.

Arch builds the project multiple times with different levels of CPU features so that user can choose between performance and compatibility: https://gitlab.archlinux.org/archlinux/packaging/packages/john/-/blob/66e5e2c28aaaed440029c73190802cd26e7440ad/PKGBUILD#L75-80

Notify maintainers

cc @offlinehacker @matthewbauer @CherryKitten @CyberShadow


Add a :+1: reaction to issues you find important.

solardiz commented 1 month ago

Arch builds the project multiple times with different levels of CPU features

Yes, but upon a closer look at the commit you referenced, apparently they only build up to AVX/XOP at best, not including AVX2 and AVX512BW. Something they could want to fix. cc: @anthraxx

so that user can choose between performance and compatibility

No, so that our automatic CPU fallback feature works (progressively invoking a less capable binary until the CPU requirement check passes, which is transparent to the user).

CyberShadow commented 1 month ago

Hi!

our automatic CPU fallback feature

Who is "our" and where is that feature, please? Can we (nixpkgs) use it too?

It was brought to my attention we do not compile john with AVX2 support, which might have significant negative performance effects.

How does nixpkgs usually deal with this? Can we expose it via something like pkgs.pkgs-x86-64-v4-Linux or (import <nixpkgs> { system = "x86_64-v4-linux" }).john?

solardiz commented 1 month ago

our automatic CPU fallback feature

Who is "our" and where is that feature, please? Can we (nixpkgs) use it too?

Sorry I was unclear. By "our" I meant John the Ripper upstream project. The feature is in the upstream source tree and intended for use by distros. Unfortunately, it is currently tricky to use as we do not provide a build script of our own that would perform the multiple builds and renames. Other than that, sure, you (nixpkgs) are welcome to use it, like Arch does (but complete the fallback chain to include AVX2 and start with non-fallback AVX512BW main binary, please).

solardiz commented 1 month ago

Some more examples are referenced from https://github.com/openwall/john/issues/5233 and some are in https://github.com/search?q=repo%3Aopenwall%2Fjohn-packages%20CPU_FALLBACK&type=code

One aspect we're inconsistent on (between different packaged builds that use this feature) is whether to call the fallback binaries "positively" (e.g., john-avx) or "negatively" (e.g., john-non-avx). The hard-coded names are currently negative, because the invoking binary only knows what feature the CPU was found lacking (which is why it triggers the fallback) - it doesn't reliably know what exactly the next fallback binary requires. However, as you can see some specific build scripts override that to use positive naming (since those scripts do know exactly what they build). Such overriding of the names is optional, you don't have to.

CyberShadow commented 1 month ago

Thanks! I also found README-DISTROS with some packaging advice.

solardiz commented 1 month ago

I also found README-DISTROS with some packaging advice.

Right. One thing to note from there is the --enable-simd= option. It's preferable over manually enabling SIMD extensions in CFLAGS (which Arch does), in part because those extensions must not be enabled for a handful of object files with code running prior to CPU test and fallback occurring (we take care of that).

CyberShadow commented 1 month ago

Does nixpkgs need fallbacks on the OMP / non-OMP axis as well?

solardiz commented 1 month ago

Does nixpkgs need fallbacks on the OMP / non-OMP axis as well?

This is optional, but nice to have. Non-OpenMP builds are usually very slightly faster in the special case of running 1 thread, so that's when they're used. Running 1 thread per process can happen in a VM or when using --fork instead of OpenMP. When OpenMP fallback is not enabled yet the thread count is 1, a warning is printed instead:

Warning: OpenMP is disabled; a non-OpenMP build may be faster
CyberShadow commented 1 month ago

OK, thanks! Here is what I have so far:

diff --git a/pkgs/tools/security/john/default.nix b/pkgs/tools/security/john/default.nix
index aeefcaa0bbef..6834594c79d0 100644
--- a/pkgs/tools/security/john/default.nix
+++ b/pkgs/tools/security/john/default.nix
@@ -19,8 +19,34 @@
   ocl-icd,
   substituteAll,
   makeWrapper,
-}:
+  simdChain ?
+    if stdenv.buildPlatform.isx86 then
+      [
+        "avx512bw"
+        "avx512f"
+        "avx2"
+        "xop"
+        "avx"
+        "sse2"
+      ]
+    else
+      [ ],
+  withOpenMP ? true,
+  callPackage,
+}@args:

+# john has a "fallback chain" mechanism; whenever the john binary
+# encounters that it is built for a SIMD target that is not supported
+# by the current CPU, it can fall back to another binary that is not
+# built to expect that feature, continuing until it eventually reaches
+# a compatible binary. See:
+# https://github.com/openwall/john/blob/bleeding-jumbo/src/packaging/build.sh
+# https://github.com/openwall/john/blob/bleeding-jumbo/doc/README-DISTROS
+# https://github.com/NixOS/nixpkgs/issues/328226
+let
+  simdFallback = (callPackage ./default.nix (args // { simdChain = lib.tail simdChain; })).john;
+  ompFallback = (callPackage ./default.nix (args // { withOpenMP = false; })).john;
+in
 stdenv.mkDerivation rec {
   pname = "john";
   version = "rolling-2404";
@@ -61,10 +87,30 @@ stdenv.mkDerivation rec {
     + lib.optionalString withOpenCL ''
       python ./opencl_generate_dynamic_loader.py  # Update opencl_dynamic_loader.c
     '';
-  configureFlags = [
-    "--disable-native-tests"
-    "--with-systemwide"
-  ];
+  __structuredAttrs = true;
+  configureFlags =
+    [
+      "--disable-native-tests"
+      "--with-systemwide"
+    ]
+    ++ (lib.optionals (simdChain != [ ]) [
+      "--enable-simd=${lib.head simdChain}"
+      "CPPFLAGS=${
+        builtins.concatStringsSep " " [
+          "-DCPU_FALLBACK"
+          "-DCPU_FALLBACK_BINARY=${lib.escapeShellArg "\"${simdFallback}/bin/john\""}"
+        ]
+      }"
+    ])
+    ++ (lib.optionals (!withOpenMP) [ "--disable-openmp" ])
+    ++ (lib.optionals withOpenMP [
+      "CPPFLAGS=${
+        builtins.concatStringsSep " " [
+          "-DOMP_FALLBACK"
+          "-DOMP_FALLBACK_BINARY=${lib.escapeShellArg "\"${ompFallback}/bin/john\""}"
+        ]
+      }"
+    ]);

   buildInputs =
     [
@@ -106,9 +152,10 @@ stdenv.mkDerivation rec {
     ]);
   # TODO: Get dependencies for radius2john.pl and lion2john-alt.pl

-  # gcc -DAC_BUILT -Wall vncpcap2john.o memdbg.o -g    -lpcap -fopenmp -o ../run/vncpcap2john
-  # gcc: error: memdbg.o: No such file or directory
-  enableParallelBuilding = false;
+  outputs = [
+    "out" # full package
+    "john" # just the binary - for the SIMD fallback chain
+  ];

   postInstall = ''
     mkdir -p "$out/bin" "$out/etc/john" "$out/share/john" "$out/share/doc/john" "$out/share/john/rules" "$out/${perlPackages.perl.libPrefix}"
@@ -119,6 +166,9 @@ stdenv.mkDerivation rec {
     cp -vt "$out/share/john/rules" ../run/rules/*.rule
     cp -vrt "$out/share/doc/john" ../doc/*
     cp -vt "$out/${perlPackages.perl.libPrefix}" ../run/lib/*
+
+    mkdir -p "$john/bin"
+    cp -vt "$john/bin" "$out/bin/john"
   '';

   postFixup = ''

I'll find out if it works tomorrow, because this does look like it'll take a while to build...

On that note... this will require building John 14 times (7 x 2). There isn't really a short-cut around that, is there? Each build takes like 10 minutes on my threadripper even if I re-enable parallel building (which looks like it works fine now?). I have no idea if the NixOS org wants to take on all of this compute.

solardiz commented 1 month ago

this will require building John 14 times (7 x 2). There isn't really a short-cut around that, is there?

No shortcut other than deciding not to build some of these. For example, you could omit avx512f, which is basically only needed for Xeon Phi 2nd gen - I guess anyone who still has those (or newly acquired as they're dirt cheap now) would do their own build. On typical CPUs, it's sufficient to have avx512bw with fallback to avx2. OTOH, you could maybe add sse4.1 and ssse3.

Each build takes like 10 minutes on my threadripper even if I re-enable parallel building (which looks like it works fine now?).

That's weird. It takes under 1 minute on a quad-core laptop for me. Of course, parallel building. I'm not aware of it ever having been broken - we've been using it all the time.

CyberShadow commented 1 month ago

Thanks. Quick question, is there anything in the package that invokes the john binary other than john itself for the fallback mechanism?

It was nice and easy to make the nixpkgs package recurse into itself to build the fallback chain, though that has the slight downside that we're also packaging all the other files for each link in the chain. That shouldn't be too much of an issue in practice with store optimization. We could improve it by splitting the two (into subpackages or multiple outputs), however if we were to do that, we need to be careful so that 1) all john binaries look in the correct place for things like dictionaries, and 2) any other programs or scripts that invoke john should know where to look for it.

That's weird. It takes under 1 minute on a quad-core laptop for me. Of course, parallel building.

Ah, thanks for that - it was my bad, looks like I need to explicitly enable parallel building for autoconf in nixpkgs.

I'm not aware of it ever having been broken - we've been using it all the time.

Here's the commit that added it, has a link to a log of a failed build - that was a while ago though: https://github.com/nixos/nixpkgs/commit/e7ce27f9ce47bd87f8580f27583f98d29be9e4bf

solardiz commented 1 month ago

is there anything in the package that invokes the john binary other than john itself for the fallback mechanism?

We have some built-in programs that are part of john and are accessed via symlinks to it - e.g., unshadow. Those use the fallback mechanism, too: https://github.com/openwall/john/blob/367d6438e6bd5cfd20f3290aac479ab4f1e5fea2/src/john.c#L2040

It's not that SIMD matters much for their performance (they do not perform password hashing), but rather that they use shared object files in john and thus could end up running occasional compiler-generated SIMD instructions beyond the current CPU's capabilities if we did not use the fallbacks for them.

Normally, you would not need to handle these specially - they would just invoke a john-* fallback binary while passing their original argv to it (including argv[0]), so it'd look equivalent to being invoked via a symlink to the fallback binary. But please test.

be careful so that 1) all john binaries look in the correct place for things like dictionaries, and 2) any other programs or scripts that invoke john should know where to look for it.

I don't see why you'd be running into such problems. Normally, it's just one package that has multiple john-* binaries and one start-of-chain john binary. All of them know the same "correct place for things like dictionaries", just because they were built with the same settings (other than --enable-simd differences), and "other programs or scripts that invoke john" simply invoke the start-of-chain john (or other symlinks such as unshadow), which then performs the necessary fallbacks. But of course you need to test and make sure things work.

link to a log of a failed build - that was a while ago though: https://github.com/NixOS/nixpkgs/commit/e7ce27f9ce47bd87f8580f27583f98d29be9e4bf

Oh, right. I see we did have this bug in 1.8.0-jumbo-1 released in December 2014. We fixed it in git in December 2015. This build log you reference is from 2017, but it's reasonable you were still building our latest release of the time. Not everyone ran into this bug because it'd only be reached when libpcap development package was installed, which was/is an optional dependency - but it's reasonable that in a package you try and include the complete functionality.