berzerk0 / Probable-Wordlists

Version 2 is live! Wordlists sorted by probability originally created for password generation and testing - make sure your passwords aren't popular!
Creative Commons Attribution Share Alike 4.0 International

Wordlists don't contain Non-ASCII Characters #9

Closed: berzerk0 closed this 6 years ago

berzerk0 commented 7 years ago

Americans aren't the only ones with passwords, so why not have special wordlists that include non-ASCII characters?

I'm glad you asked.

As my knowledge grows, so does my ability to sort lines. I have two methods that I will put to use for Rev 2.0:

1. Grep out passwords containing characters from different alphabets.

If an alphabet is published in Unicode on Wikipedia, I plan to grep for it.

2. Make subset lists based on source name.

In actuality, I'm awful at darts.

I welcome any suggestions, except about my darts game. I mean suggestions about the wordlists.

iancnorden commented 7 years ago

Hey again,

Not sure if this has had much thought or any updates, but I believe unicode-table.com keeps the 'official' lists of characters from other alphabets, the ones that can be rendered or used in conversions such as Punycode to Unicode. Good example: https://unicode-table.com/en/#cyrillic

I believe these are sourced from https://github.com/unicode-table/unicode-table-data, which may have good per-language or per-character-set data to base an initial push on.
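If downloading that data is inconvenient, Python's standard `unicodedata` module offers a rough stand-in: every assigned code point has a name and a general category, so one alphabet's letters can be listed by scanning its block. This is a sketch under assumptions: `block_letters` is a hypothetical helper, the ranges are the basic blocks only, and the module reflects whatever Unicode version ships with the Python interpreter, which may not match unicode-table-data.

```python
import unicodedata


def block_letters(start, end, prefix):
    """Return letters in [start, end] whose Unicode name begins with prefix.

    Unassigned code points have no name, so the "" default skips them.
    Combining marks (category Mn/Me) are excluded by the letter check.
    """
    letters = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")
        if unicodedata.category(ch).startswith("L") and name.startswith(prefix):
            letters.append(ch)
    return letters


# Basic blocks only; the prefix also drops the Coptic letters that share
# the Greek block.
cyrillic = block_letters(0x0400, 0x04FF, "CYRILLIC")
greek = block_letters(0x0370, 0x03FF, "GREEK")
```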

berzerk0 commented 7 years ago

Great find! I still plan on implementing this.

As a status update on this and Rev 2 generally, I have found plenty of sources and need to do a bit of sifting before repeating the process. I'd say mid-July is a generous estimate for Rev 2, meaning it may be sooner than that.

berzerk0 commented 6 years ago

"Mid-July," haha.

The lists now contain non-ASCII characters.
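To confirm that a given list really contains non-ASCII entries, a short scan is enough. This is a minimal sketch assuming the list is a UTF-8 text file with one password per line; `non_ascii_lines` and `count_non_ascii` are hypothetical helpers, not tools shipped with the repository. `str.isascii()` needs Python 3.7 or newer.

```python
def non_ascii_lines(lines):
    """Yield (line_number, line) for every line containing a non-ASCII character."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.isascii():
            yield number, line


def count_non_ascii(path):
    """Count non-ASCII lines in a wordlist file.

    Assumes UTF-8; undecodable bytes become U+FFFD, which is itself
    non-ASCII, so such lines are still counted.
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        return sum(1 for _ in non_ascii_lines(f))
```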