berzerk0 / Probable-Wordlists

Version 2 is live! Wordlists sorted by probability originally created for password generation and testing - make sure your passwords aren't popular!
Creative Commons Attribution Share Alike 4.0 International

Provide password occurrences #20

Open mdPlusPlus opened 7 years ago

mdPlusPlus commented 7 years ago

Could you please provide how often the passwords occur?

This way one could build adequately weighted probable password masks for hashcat.

berzerk0 commented 7 years ago

Yes, but for Revision 2. Set to release that by mid-July.

Also, it might not cover the biggest files, since adding a count to every line will increase file size a lot. However, I plan on only including .tar.gz and .7z archives in Rev 2, so there will be more room for different types of files.

I'll put this on the official Revision 2 Task List.

berzerk0 commented 7 years ago

In the meantime, if anyone is looking for a jury-rigged way to approximate this, try something like

grep -nE '.+' WORDLIST

to get a ranking with line numbers.

Also note that the files themselves are sized by popularity. The smallest files contain lines that appeared at least 75 times, followed by files for lines that appeared at least 50, 25, 10, 5, and 2 times.

One could theoretically remove the lines of, say, the 75+ file from the 50+ file to get all the passwords that appeared 50-74 times, which could then be given a different weight than the 75+ ones.

For example:

Remove the top 196 from the top 3575 to get the 197th-3575th most popular lines

Give the lines in the top 196 a heavier weight than the lines ranked 197-3575

Not as accurate as a true appearance count, but it might provide some use.
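The subtraction described above can be sketched with standard grep. This is only an illustration: the file names `top75.txt`, `top50.txt`, and `band50to74.txt` and their contents are made up for the example and are not the names used in the repository.

```shell
# Hypothetical stand-ins for the "75+" and "50+" lists (not the real file names).
printf '%s\n' 123456 password dragon > top75.txt
printf '%s\n' 123456 password dragon sunshine monkey > top50.txt

# -F: treat patterns as fixed strings, -x: match whole lines only,
# -v: invert the match, -f: read the patterns from a file.
# Result: lines of top50.txt that do not appear in top75.txt, in original order.
grep -vxFf top75.txt top50.txt > band50to74.txt

cat band50to74.txt
```

Because the order of the larger file is preserved, the remaining lines are still sorted by popularity within their band.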

Real deal looking like Mid/Late July, I'm nearing the end of the human work stage and more into the script work stage.

berzerk0 commented 6 years ago

I'll include masks/rules in Rev 2.

"Real deal looking like Mid/Late July, I'm nearing the end of the human work stage and more into the script work stage."

hahahaha

cMadan commented 6 years ago

Are the occurrence/frequency counts stored somewhere? I don't see them in the Real Passwords lists (https://github.com/berzerk0/Probable-Wordlists/tree/master/Real-Passwords), but maybe I'm looking in the wrong place?

berzerk0 commented 6 years ago

I have not included a file where the number of appearances for a line is paired with the line itself.

I have, however, included masks and rules based on that information in Analysis-Files.

Did you want to create a mask or some other kind of analysis?

cMadan commented 6 years ago

I am interested in looking at the characteristics of the passwords people use, so frequency counts, rather than just rank information, would be useful. I don't think the rules included in that folder can help with this analysis.

berzerk0 commented 6 years ago

The masks include character type, number and order. ProbWL-mask-probable-v2-counts.txt contains appearance counts for the masks it found.
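As a rough illustration of what "character type, number and order" means, the sketch below derives a hashcat-style mask for each line of a small made-up list and counts how many lines share each mask. It is not the script used to build the repository's files. The names `sample.txt` and `mask-counts.txt` and the `count mask` output layout are assumptions, and it counts each distinct line once instead of weighting it by how often it appeared.

```shell
# Hypothetical sample input; substitute a real wordlist.
printf '%s\n' dragon sunshine 123456 Password1 monkey abc123 > sample.txt

# Map each character to a hashcat built-in charset:
#   ?l lowercase, ?u uppercase, ?d digit, ?s anything else.
# Simplification: non-ASCII bytes also land in ?s, which hashcat's ?s does not cover.
LC_ALL=C awk '{
  mask = ""
  for (i = 1; i <= length($0); i++) {
    c = substr($0, i, 1)
    if      (c ~ /[a-z]/) mask = mask "?l"
    else if (c ~ /[A-Z]/) mask = mask "?u"
    else if (c ~ /[0-9]/) mask = mask "?d"
    else                  mask = mask "?s"
  }
  count[mask]++
}
END { for (m in count) print count[m], m }' sample.txt |
  sort -k1,1nr -k2,2 > mask-counts.txt   # most common mask first

cat mask-counts.txt
```

Here "dragon" and "monkey" share the mask `?l?l?l?l?l?l`, so that mask sorts to the top with a count of 2.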

Are you looking for more depth than that?

cMadan commented 6 years ago

I think I'm looking for something simpler than that, e.g., in Top1575-probable-v2.txt, what are the frequencies for "dragon" and "sunshine"?

berzerk0 commented 6 years ago

I suppose I could throw together those for the smaller files, sure.

cMadan commented 6 years ago

Hi! Any plans for implementing this--or suggestions on how I could do it myself? Thanks!

berzerk0 commented 6 years ago

Apologies, this project has taken a bit of a backseat. I’ve got the files that I can make these lists with - just gotta find the time to get them prepped.

This week is unlikely, but this weekend perhaps.

berzerk0 commented 6 years ago

Uploaded for 1575, 12Thousand and 304Thousand with appearance counts

https://github.com/berzerk0/Probable-Wordlists/tree/master/Analysis-Files

cMadan commented 6 years ago

Great, thank you!!

MauriRios commented 3 years ago

Can you make a dictionary with the numbers 10.000.000 to 80.000.000? That's the range of the DNI (national ID number) in Argentina; most people use their DNI number as their Wi-Fi password.