common-voice / cv-sentence-extractor

Scraping Wikipedia for fair use sentences

Fix: stem_separator_regex does not work on real data #194

Closed: HarikalarKutusu closed this 1 year ago

HarikalarKutusu commented 1 year ago

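stem_separator_regex fails on real data because the words in rules.disallowed_words are lowercase there, while the stems extracted from sentences keep their original casing. The tests passed without any case conversion, so they did not catch this. This fix covers both the main code and the tests by converting the words to lowercase.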
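To illustrate the idea, here is a minimal sketch of lowercasing a stem before looking it up in a lowercase disallowed list. It is not the code from this PR: the function name stem_is_disallowed, the plain character separators (standing in for stem_separator_regex), and the sample words are all assumptions made for the example.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not from the repository): take the part of `word`
/// before the first separator character as its stem and check whether the
/// lowercased stem appears in the disallowed list.
fn stem_is_disallowed(word: &str, separators: &[char], disallowed: &HashSet<String>) -> bool {
    // `split` always yields at least one item, so the fallback is only a safeguard.
    let stem = word
        .split(|c: char| separators.contains(&c))
        .next()
        .unwrap_or(word);
    // The disallowed list is assumed to hold lowercase entries,
    // so the stem is lowercased before the lookup.
    disallowed.contains(&stem.to_lowercase())
}

fn main() {
    // Assumed sample data: a lowercase disallowed word and an apostrophe separator.
    let disallowed: HashSet<String> = ["ankara"].iter().map(|w| w.to_string()).collect();
    let separators = ['\''];

    // Without the lowercase conversion, "Ankara" would not match "ankara".
    println!("{}", stem_is_disallowed("Ankara'da", &separators, &disallowed));
}
```

With the conversion in place, a capitalized stem such as "Ankara" matches the lowercase entry "ankara", which is the case the original tests did not exercise.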