Multi-threaded run changes line order

Cynosureprime / rling

RLI Next Gen (Rling), a faster multi-threaded, feature-rich alternative to rli found in hashcat utilities.

MIT License


Input file generation:

./john -w=all.lst -ru -stdo > all.lst-rules-with-dupes

all.lst is from https://download.openwall.net/pub/wordlists/all.gz (MD5 f7b3b76d15bbb95fcb267ea6be108cce), john is current bleeding-jumbo with its default john.conf. The resulting all.lst-rules-with-dupes is 173188126 lines, 2037345891 bytes (MD5 4c221f4df353aae89bdcd6888e92887a).

These commands produce the same unique lines, but in a different order:

./rling -t 1 all.lst-rules-with-dupes /dev/shm/t1
./rling -t 2 all.lst-rules-with-dupes /dev/shm/t2

$ md5sum /dev/shm/t?
59b8b432957640387ba2b83d2583c792  /dev/shm/t1
625f25208a5ea41f4fb03fc51626c68b  /dev/shm/t2
$ wc -l /dev/shm/t?
 164074000 /dev/shm/t1
 164074000 /dev/shm/t2

t1 is identical to the output of JtR's unique program; t2 is not.
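The two properties being compared here ("same unique lines" versus "same as unique", i.e. the first copy of each line is kept in file order) can be stated precisely with a small Python sketch. The function names are hypothetical helpers, not part of rling or JtR, and the sketch works on in-memory lists, so it is only practical for small samples, not the full 2 GB file.

```python
from collections import Counter


def same_lines(a, b):
    """True if a and b contain the same lines with the same multiplicities,
    regardless of order (what md5sum of sorted output would confirm)."""
    return Counter(a) == Counter(b)


def is_first_occurrence_order(original, deduped):
    """True if deduped keeps the first copy of each line, in file order.
    dict.fromkeys preserves insertion order, so it yields exactly that."""
    return deduped == list(dict.fromkeys(original))


original = ["a", "svn7", "b", "c", "svn7", "d"]
t1 = ["a", "svn7", "b", "c", "d"]  # first copy of "svn7" kept
t2 = ["a", "b", "c", "svn7", "d"]  # later copy of "svn7" kept
print(same_lines(t1, t2))                       # True
print(is_first_occurrence_order(original, t1))  # True
print(is_first_occurrence_order(original, t2))  # False
```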

Edit, more detail: t2 differs between invocations of the same command. This is on Scientific Linux 6.10 (so an old glibc; I had to add -lrt for clock_gettime to be found). I tried two gcc versions (the system default gcc 4.4.7 and devtoolset-8 gcc 8.2.1), with the same behavior.

I have identified and replicated the issue. The core of the problem is that rling splits the file into large "chunks" and processes them on multiple cores at the same time. For example, in your test file, the word "svn7" appears at lines 62541, 43312836, 71731224, 71733302 and 71749022. Depending on the number of cores (threads) in use, a later occurrence of "svn7" may be processed before an earlier one. The file is not actually being re-ordered; rather, when a line has duplicates, the copies that get dropped are not necessarily the later ones in the file, so the surviving copy can sit at a later position. I was able to see this behaviour on several different systems, and in all cases the correct number of lines was output, with no duplicates.
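A simplified Python model of the mechanism described above may help show why the line count stays correct while the order changes. It is an illustration under stated assumptions, not rling's actual code: the hypothetical `dedup_in_completion_order` splits the input into fixed-size chunks, processes whole chunks in a given completion order (standing in for whichever thread finishes first), keeps the first copy of a line seen in *processing* order, and emits survivors in file order.

```python
def dedup_in_completion_order(lines, chunk_size, completion_order):
    """Hypothetical model of chunked dedup (not rling's implementation).

    completion_order is a permutation of chunk indices, modelling the
    order in which worker threads happen to finish their chunks.
    """
    chunks = [range(i, min(i + chunk_size, len(lines)))
              for i in range(0, len(lines), chunk_size)]
    if sorted(completion_order) != list(range(len(chunks))):
        raise ValueError("completion_order must be a permutation of chunk indices")

    seen = set()
    keep = [False] * len(lines)
    for c in completion_order:
        for idx in chunks[c]:
            # The first copy encountered in processing order survives,
            # which is not necessarily the first copy in the file.
            if lines[idx] not in seen:
                seen.add(lines[idx])
                keep[idx] = True
    # Survivors are written out at their original positions.
    return [line for line, k in zip(lines, keep) if k]


lines = ["a", "svn7", "b", "c", "svn7", "d"]
print(dedup_in_completion_order(lines, 3, [0, 1]))  # ['a', 'svn7', 'b', 'c', 'd']
print(dedup_in_completion_order(lines, 3, [1, 0]))  # ['a', 'b', 'c', 'svn7', 'd']
```

Both runs output five unique lines; only the position of "svn7" depends on which chunk was processed first, which matches the symptom in the report (same `wc -l`, different `md5sum`).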

All of that said, the principle of least astonishment implies that "first in file wins", and the code will be changed to implement this (though I may offer a switch, as it is significantly faster to process the file as cores become available than to wait for the previous block to complete before starting the next one).
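For comparison, here is one way to get "first in file wins" without making each chunk wait for the previous one. This is a swapped-in technique for illustration, not necessarily what rling adopted: each chunk independently records the first absolute index of every line it contains, and the merge keeps the minimum index per line. Because taking a minimum does not depend on merge order, chunks can be merged in whatever order they finish. The function names are hypothetical, and Python threads (because of the GIL) only demonstrate the merge logic, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def local_first_index(lines, start, stop):
    """Map each distinct line in lines[start:stop] to its first absolute index."""
    first = {}
    for idx in range(start, stop):
        first.setdefault(lines[idx], idx)  # ascending scan keeps the earliest
    return first


def dedup_first_wins(lines, chunk_size, workers=4):
    """Hypothetical order-independent dedup: first copy in the file wins."""
    bounds = [(i, min(i + chunk_size, len(lines)))
              for i in range(0, len(lines), chunk_size)]
    global_first = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(local_first_index, lines, a, b) for a, b in bounds]
        # Merge in completion order; keeping the minimum index per line
        # gives the same result no matter which chunk finishes first.
        for fut in as_completed(futures):
            for line, idx in fut.result().items():
                if line not in global_first or idx < global_first[line]:
                    global_first[line] = idx
    # Emit surviving lines in file order.
    return [lines[i] for i in sorted(global_first.values())]


lines = ["a", "svn7", "b", "c", "svn7", "d"]
print(dedup_first_wins(lines, 3))  # ['a', 'svn7', 'b', 'c', 'd']
```

The trade-off differs from strictly serializing blocks: workers never wait on each other, but the merge step has to compare indices, and the per-chunk maps cost extra memory.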


Multi-threaded run changes line order #16