davidmoten / big-sorter

Java library that sorts very large files of records by splitting into smaller sorted files and merging
Apache License 2.0

Feature request: combining of wordlists (4+) #6

Open deathmorlock opened 3 years ago

deathmorlock commented 3 years ago

Hello again! Maybe it would be possible to build this on top of BigSorter... I can't find any existing solution on the web for 4 or more wordlists.

For example, I have 4 or more wordlists: 1) cat, dog, horse, etc. 2) red, black, white, etc. 3) 1971, 2020, 2001, etc. 4) razor, fire, blade, etc.

I need to make all possible combinations. A combo should use only 1 word from each wordlist, like:
1, 2, 3, 4, 5 (example: cat, dog, 2001, fire)
1+2, 1+3, 1+4, 2+1, 2+3 etc. (example: catdog, catred, dograzor, 2020blade)
1+2+3, 1+2+4, 1+3+4, 2+3+4, 2+4+3 etc. (example: dogblackrazor, 1971firehorse...)
1+2+3+4, 2+1+3+4, 1+3+4+2 etc. (example: whiteblade2020cat, 2001horseblackfire...)

davidmoten commented 3 years ago

Some sort of password cracker, eh? If you have N items in each word list then there are N^4 possibilities for a single ordering that uses all four lists. That can be a lot, eh. How big is your N? How many combinations can you process per second (assuming you have no delay in generating the combinations)?

deathmorlock commented 3 years ago

Yep, and BigSorter is the fastest tool in the world for working with huge wordlists! For this task only one solution exists: https://github.com/hashcat/princeprocessor
BUT this algorithm combines words without any control over repeats, so I get 1+1+1+1+1, 2+2+2+1+3 and the like. I have 100 lines in total to combine (25 per wordlist). With the PRINCE algorithm I get a wordlist of around 500 GB, and it has too many useless combos. I can process 1 million per second.

davidmoten commented 3 years ago

Just guessing but I assume you want permutations not just combinations?

For example you want catred as well as redcat?

Just look up code for generating permutations or combinations in Java and generate the combinations on the fly. You can use recursion, or there are non-recursive methods as well. There's no need to save the entire list and then use it.

If you are worried about the total number of permutations being very large, just use a streaming generator of permutations.
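To get a feel for how large the total actually is, here is a rough counting sketch. It assumes the figures mentioned earlier in the thread (4 wordlists of 25 words each), that a sequence uses at most one word from each list, that at least one word is used, and that order matters. The class and method names are made up for illustration.

```java
import java.math.BigInteger;

/** Counts ordered sequences that use at most one word from each wordlist. */
class WordlistCount {

    /**
     * Sums, over j = 1..lists, the number of sequences that use exactly j lists:
     * (ways to pick and order j of the lists) * wordsPerList^j.
     */
    static BigInteger count(int lists, int wordsPerList) {
        BigInteger total = BigInteger.ZERO;
        // lists * (lists - 1) * ... * (lists - j + 1): ordered choices of j lists
        BigInteger arrangements = BigInteger.ONE;
        // wordsPerList^j: one word picked from each chosen list
        BigInteger words = BigInteger.ONE;
        BigInteger n = BigInteger.valueOf(wordsPerList);
        for (int j = 1; j <= lists; j++) {
            arrangements = arrangements.multiply(BigInteger.valueOf(lists - j + 1));
            words = words.multiply(n);
            total = total.add(arrangements.multiply(words));
        }
        return total;
    }

    public static void main(String[] args) {
        // 100 + 7,500 + 375,000 + 9,375,000
        System.out.println(count(4, 25)); // prints 9757600
    }
}
```

Under those assumptions the total is a little under 10 million sequences, which at 1 million per second is roughly 10 seconds of work.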

You can write this generator yourself by imagining that you are counting in base 25, where the index you choose from each list acts like a digit in a decimal number. You could also call it base 26, with the extra digit value signifying that the item from a list is skipped. The result looks like this:

0,0,0,0
0,0,0,1
...
0,0,0,25
0,0,1,0
0,0,1,1
...
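The counting idea above can be sketched as a small odometer-style loop. This is only an illustration, not the code from the repository: `IndexCounter` and its methods are hypothetical names, each position gets one extra digit value meaning "skip this list", and the words are always emitted in list order (no reordering yet).

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Odometer-style counter: one digit per wordlist, plus an extra "skip" value. */
class IndexCounter {

    /** Advances the counter by one; returns false once it wraps back to all zeros. */
    static boolean increment(int[] digits, List<List<String>> lists) {
        for (int i = digits.length - 1; i >= 0; i--) {
            // the value lists.get(i).size() is the extra digit meaning "skip list i"
            if (digits[i] < lists.get(i).size()) {
                digits[i]++;
                return true;
            }
            digits[i] = 0; // carry into the next position to the left
        }
        return false;
    }

    /** Streams every non-empty concatenation to the sink without storing the full list. */
    static void forEach(List<List<String>> lists, Consumer<String> sink) {
        int[] digits = new int[lists.size()];
        do {
            StringBuilder sb = new StringBuilder();
            boolean any = false;
            for (int i = 0; i < digits.length; i++) {
                if (digits[i] < lists.get(i).size()) {
                    sb.append(lists.get(i).get(digits[i]));
                    any = true;
                }
            }
            if (any) {
                sink.accept(sb.toString()); // the all-skipped state is dropped
            }
        } while (increment(digits, lists));
    }

    public static void main(String[] args) {
        List<List<String>> lists = Arrays.asList(
                Arrays.asList("cat", "dog"),
                Arrays.asList("red"));
        // prints catred, cat, dogred, dog, red (one per line)
        forEach(lists, System.out::println);
    }
}
```

Because each result is handed to a `Consumer` as soon as it is built, memory use stays constant no matter how many combinations there are.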

Layered on top of that you could use a permutations generator for a list of 4 elements to move the order around. See https://davidmoten.github.io/kool/apidocs/org/davidmoten/kool/Stream.html#permutations-int- for that.
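As a rough alternative to layering a separate permutations generator on the counter, the ordering can be folded into a single recursive generator. The sketch below does not use the kool library; it is plain Java with hypothetical names (`OrderedCombos`, `extend`), and it assumes each list may contribute at most one word to a sequence. Choosing the next list from those not yet used produces every ordering exactly once, which avoids the duplicates that naive reordering of skipped positions would create.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Streams every ordered sequence that uses at most one word from each wordlist. */
class OrderedCombos {

    static void forEach(List<List<String>> lists, Consumer<String> sink) {
        extend(lists, new boolean[lists.size()], new StringBuilder(), sink);
    }

    private static void extend(List<List<String>> lists, boolean[] used,
            StringBuilder prefix, Consumer<String> sink) {
        for (int i = 0; i < lists.size(); i++) {
            if (used[i]) {
                continue; // this list already contributed a word
            }
            used[i] = true;
            for (String word : lists.get(i)) {
                int mark = prefix.length();
                prefix.append(word);
                sink.accept(prefix.toString());    // emit the sequence built so far
                extend(lists, used, prefix, sink); // then extend it with other lists
                prefix.setLength(mark);            // backtrack
            }
            used[i] = false;
        }
    }

    public static void main(String[] args) {
        List<List<String>> lists = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"));
        // prints a, ac, b, bc, c, ca, cb (one per line)
        forEach(lists, System.out::println);
    }
}
```

The recursion depth is bounded by the number of wordlists, so with 4 lists it never goes deeper than 4 calls.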

davidmoten commented 3 years ago

Here's standalone code that does what I'm talking about:

https://github.com/davidmoten/big-sorter/blob/master/src/test/java/com/github/davidmoten/bigsorter/Permutations.java

deathmorlock commented 3 years ago

Yep, I want permutations 👍
Thanks very much, David. I will try to figure it out with your links.