Open shreyanid opened 9 months ago
We expect this functionality to change significantly with the introduction of langdetect for document language detection.
We can close this now since `contains_english_word` is no longer used. Open #3007 to remove the unused code path.
Describe the bug
The function `contains_english_word(text)` in `text-type.py` checks the input text against a list of English words to determine whether the text contains an English word. However, English words that also exist in other languages (e.g. "no" in Spanish) are matched by this function too, so checks like those in `is_possible_narrative_text` are failing when they should be entering this case.

To Reproduce
Example:
Expected behavior
Only English words in text written in English should match this function, not the presence of any English word (even in other languages, where the words are unrelated).
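
For illustration only, here is a minimal sketch of the kind of false positive described above. It is not the library's implementation: `ENGLISH_WORDS`, `contains_english_word_sketch`, and `contains_english_word_gated` are hypothetical names, the word list is a tiny stand-in, and the language code is assumed to be supplied by the caller (for example from a language detector).

```python
import re

# Hypothetical stand-in for an English word list; a real list would be far larger.
ENGLISH_WORDS = {"no", "the", "is", "there", "parking", "here"}


def contains_english_word_sketch(text: str) -> bool:
    """Assumed approximation of a plain word-list lookup (not the real function)."""
    for token in re.split(r"\s+", text.lower()):
        # Drop punctuation so "dinero." is compared as "dinero".
        word = "".join(ch for ch in token if ch.isalpha())
        if len(word) > 1 and word in ENGLISH_WORDS:
            return True
    return False


def contains_english_word_gated(text: str, language: str) -> bool:
    """Hypothetical variant: only consult the word list for English text.

    `language` is assumed to be an ISO 639-1 code provided by the caller.
    """
    if language != "en":
        return False
    return contains_english_word_sketch(text)


spanish = "No tengo dinero."
english = "There is no parking here."

# The plain lookup matches "no" in both sentences, which is the reported problem.
print(contains_english_word_sketch(spanish), contains_english_word_sketch(english))

# The gated variant matches only when the text is declared to be English.
print(contains_english_word_gated(spanish, "es"), contains_english_word_gated(english, "en"))
```

Gating on a detected language is only one possible way to get the expected behavior. It moves the hard part into language detection, which can be unreliable on very short strings.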