CameronLonsdale / lantern

Cryptanalysis library for breaking classical ciphers
MIT License

Things to think about because I'm disorganised #10

Closed CameronLonsdale closed 7 years ago

CameronLonsdale commented 7 years ago

LanguageFrequency and LanguageNgrams are the same thing, so why do I need two? I need to refactor them into a single piece of code, with the clearest possible name so it's obvious how it can be used.
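One way to merge the two is to treat letter frequencies as the n = 1 case of n-gram counting. A minimal sketch, assuming a single hypothetical helper `ngram_frequencies` (not an existing lantern function) that counts overlapping n-grams:

```python
from collections import Counter


def ngram_frequencies(text, n=1):
    """Count overlapping n-grams in text.

    Hypothetical helper: n=1 gives plain letter frequencies, so one
    function covers both the "frequency" and "ngrams" use cases.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of width n across the text; yields nothing if len(text) < n
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))
```

For example, `ngram_frequencies("HELLO")` counts `L` twice, while `ngram_frequencies("HELLO", 2)` counts `HE`, `EL`, `LL` and `LO` once each.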

Need to think about the range of values a fitness function can produce, specifically in relation to the corpus problem.
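To make the range question concrete, assume the fitness function is a chi-squared statistic, like the chi_squared function discussed in this issue. Write N for the number of scored symbols, O_i and E_i for observed and expected counts, and q_i and p_i for the observed and expected proportions over the same alphabet (these symbols are introduced here for illustration). Then:

```latex
\begin{aligned}
\chi^2 &= \sum_i \frac{(O_i - E_i)^2}{E_i}, \qquad O_i = N q_i,\quad E_i = N p_i \\
\chi^2 &= \sum_i \frac{(N q_i - N p_i)^2}{N p_i}
        = N \sum_i \frac{(q_i - p_i)^2}{p_i}
        = N \left( \sum_i \frac{q_i^2}{p_i} - 1 \right) \\
0 &\le \chi^2 \le N \left( \frac{1}{p_{\min}} - 1 \right)
\end{aligned}
```

So under these assumptions the score is 0 for a perfect match, grows linearly with the text length N, and per symbol is bounded by 1/p_min - 1, where p_min is the rarest expected symbol's proportion. Dividing by N is one possible way to compare scores across texts of different lengths.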

CameronLonsdale commented 7 years ago

There is an issue with chi_squared whereby you can easily be penalised for silly mistakes. For instance, trying to score the text "abc" against english.unigrams won't work because the unigrams are capital letters. I need to find a way for users not to be penalised for this; currently they are. So RIP.
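One option is to normalise text before scoring so that case never matters. A minimal sketch, assuming the frequency tables use upper-case ASCII letters; `normalise` is a hypothetical helper, not part of lantern:

```python
import string


def normalise(text, alphabet=string.ascii_uppercase):
    """Upper-case text and drop characters outside the scoring alphabet.

    Hypothetical helper: after this, scoring "abc" against an upper-case
    unigram table behaves the same as scoring "ABC".
    """
    return "".join(ch for ch in text.upper() if ch in alphabet)
```

With this, `normalise("abc")` gives `"ABC"` and `normalise("Hello, World!")` gives `"HELLOWORLD"`. The design choice here is to silently drop characters the table can't score; keeping them and letting them count against the score is the alternative if punctuation should matter.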

A similar issue is the ZeroDivisionError in chi_squared when source_len is 0, which can happen when your source frequency map has no characters in common with the target. In that case, an error should probably be raised.
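A minimal sketch of raising a clear error instead of dividing by zero. The signature `chi_squared(source_frequency, target_frequency)` is a hypothetical one modelled on the wording above, and it assumes the target map is non-empty with positive counts:

```python
from collections import Counter


def chi_squared(source_frequency, target_frequency):
    """Chi-squared score of source counts against target counts.

    Hypothetical signature. Only symbols present in target_frequency are
    compared. Assumes every target count is positive. Raises ValueError
    rather than ZeroDivisionError when the source shares no symbols with
    the target.
    """
    target_len = sum(target_frequency.values())
    # source_len counts only the symbols the target knows about
    source_len = sum(source_frequency.get(s, 0) for s in target_frequency)
    if source_len == 0:
        raise ValueError("source shares no symbols with the target frequency map")

    score = 0.0
    for symbol, target_count in target_frequency.items():
        # Scale the target proportion up to the size of the source
        expected = source_len * target_count / target_len
        observed = source_frequency.get(symbol, 0)
        score += (observed - expected) ** 2 / expected
    return score
```

For example, `chi_squared(Counter("AAAB"), {"A": 1, "B": 1})` expects 2 of each letter and returns 1.0, while `chi_squared(Counter("abc"), {"A": 1, "B": 1})` raises `ValueError`, which also surfaces the lower-case mismatch described above instead of returning a misleading score.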