Closed CameronLonsdale closed 7 years ago
There is an issue with chi_squared whereby you can easily be penalised for silly mistakes. For instance, trying to score the text "abc" against english.unigrams won't work because the unigrams are capital letters. I need to find a way to make sure users aren't penalised for this; currently they are. So RIP.
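One possible fix, sketched here as an assumption rather than what the library does: fold the source frequency map to the same case as the target before scoring. `fold_case` is a hypothetical helper, and the `target` dict is a made-up stand-in for english.unigrams with illustrative values only.

```python
from collections import Counter


def fold_case(freq_map):
    """Return a copy of freq_map with every key upper-cased.

    Hypothetical helper: counts for keys that collide after folding
    (e.g. "a" and "A") are added together.
    """
    folded = Counter()
    for symbol, count in freq_map.items():
        folded[symbol.upper()] += count
    return folded


# Made-up stand-in for an upper-case unigram map (values are illustrative).
target = {"A": 8.2, "B": 1.5, "C": 2.8}

raw_source = Counter("abc")          # keys are lower-case, so nothing matches
folded_source = fold_case(raw_source)  # keys now match the target's case

raw_overlap = set(raw_source) & set(target)
folded_overlap = set(folded_source) & set(target)
```

Without folding, the source and target share no keys at all; after folding, all three letters line up. Whether folding should happen silently or be opt-in is still an open design question.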
A similar issue is the zero division error in chi squared when source_len is 0, which can happen when your source frequency map has no characters in common with the target. In that case, an error should probably be raised explicitly.
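A minimal sketch of what an explicit error could look like. This is not the current implementation: it assumes the target map holds relative frequencies (probabilities), that source_len counts only the source symbols that also appear in the target, and that ValueError is the right exception type.

```python
def chi_squared(source_counts, target_probs):
    """Chi-squared statistic of observed counts against expected probabilities.

    Assumptions (not taken from the library): target_probs maps symbols to
    probabilities, and symbols missing from target_probs are ignored.
    """
    # Only source symbols that the target knows about contribute to the length.
    source_len = sum(
        count for symbol, count in source_counts.items() if symbol in target_probs
    )
    if source_len == 0:
        # Fail loudly instead of hitting a division by zero further down.
        raise ValueError("source and target share no symbols; cannot score")

    total = 0.0
    for symbol, prob in target_probs.items():
        if prob <= 0:
            continue  # an expected count of zero would also divide by zero
        expected = prob * source_len
        observed = source_counts.get(symbol, 0)
        total += (observed - expected) ** 2 / expected
    return total
```

With this guard, scoring `{"a": 1}` against an upper-case target raises a ValueError with a readable message rather than a bare division error.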
LanguageFrequency and LanguageNgrams are the same thing, so why do I need two? They need to be refactored into a single piece of code, with the nicest name I can find that makes it understandable how it can be used.
I need to think about the range of values that can result from a fitness function, specifically in relation to the corpus problem.