0xVavaldi / gramify

Create n-grams of wordlists based on words, characters, or charsets to use in offline password attacks and data analysis
Apache License 2.0

Improved Recommended Command #4

Closed by superevr 1 year ago

superevr commented 1 year ago

https://github.com/0xVavaldi/gramify/blob/64ef8cfb9edb86419051a4b4b819ed0fa079f107/gramify.py#L792-L794

I suggest updating this recommendation so that it is more likely to work as expected. cut -c9- keeps everything from the 9th character onward, which assumes the count prefix is always exactly 8 characters wide. But the width of that prefix can vary: uniq -c pads the count, and the count itself may be large depending on how large the gramify output is.

% echo "password" | uniq -c | sort -rn | awk '($1 >=1)' | cut -c9-
sword
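The truncation above can be reproduced on any system by simulating the count prefix with printf. This is only a sketch: the two field widths (4 and 7) are assumptions standing in for a narrow and a wide padding style, and the variable names are made up for illustration.

```shell
# Simulated `uniq -c` output lines with two different count widths.
# Width 4 and width 7 are assumed values, not read from any real uniq.
narrow_line="$(printf '%4d %s' 1 password)"   # "   1 password"
wide_line="$(printf '%7d %s' 1 password)"     # "      1 password"

# cut -c9- only lines up with the n-gram when the prefix is 8 characters.
printf '%s\n' "$narrow_line" | cut -c9-   # prints: sword
printf '%s\n' "$wide_line" | cut -c9-     # prints: password
```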

A simple way to do this in awk would be to use {print $2}, but that would truncate any n-gram containing a space to its first word.

So the solution I found is a slightly more complex awk command:

awk '{if ($1 >=1) {$1=""; print substr($0, index($0, $2))}}'

You can try it out against different inputs with this test command:

% printf -- 'pass word\n%.0s' {1..99999} | uniq -c | sort -rn | awk '{if ($1 >=1) {$1=""; print substr($0, index($0, $2))}}'
pass word
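
One caveat with assigning to $1 is that awk rebuilds the record, which collapses runs of whitespace inside an n-gram into single spaces. A possible alternative, sketched below, strips the count with sub() instead so the rest of the line is left untouched. The function name strip_counts and its optional minimum-count argument are hypothetical and are not part of gramify.

```shell
# Hypothetical helper: keep lines whose count is >= the given minimum
# (default 1), then remove the leading padding, the count, and one space.
# It does not depend on how wide the count column is.
strip_counts() {
  min="${1:-1}"
  awk -v min="$min" '$1 + 0 >= min { sub(/^ *[0-9]+ /, ""); print }'
}
```

It would be used in place of the awk and cut pair, for example: uniq -c | sort -rn | strip_counts 1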

superevr commented 1 year ago

It looks like my problem was actually caused by the version of uniq that ships with macOS. GNU uniq uses a more consistent padding pattern.

So as a workaround, make sure to use GNU uniq from coreutils.
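
A minimal sketch of that workaround, assuming GNU coreutils was installed through a package manager that exposes GNU uniq under the name guniq (Homebrew does this by default). The wrapper name count_unique is hypothetical.

```shell
# Hypothetical wrapper: prefer GNU uniq when it is available as `guniq`,
# otherwise fall back to whatever `uniq` is first on PATH.
count_unique() {
  if command -v guniq >/dev/null 2>&1; then
    guniq -c
  else
    uniq -c
  fi
}
```

The wrapper reads from standard input, so it can stand in for uniq -c in the recommended pipeline.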

0xVavaldi commented 1 year ago

https://github.com/0xVavaldi/gramify/commit/99daaf10647ff4c541be5743b2d45e6985b0de01