common-voice / cv-sentence-extractor

Scraping Wikipedia for fair use sentences

PLS. DISREGARD THIS Fix: stem_separator_regex does not work on real data #193

Closed: HarikalarKutusu closed this 1 year ago

HarikalarKutusu commented 1 year ago

PLEASE DISREGARD THIS - ACCIDENTALLY BRANCHED FROM TR BRANCH.

The cause is that, with real data, the entries in `rules.disallowed_words` are lowercase, whereas the tests passed without any case conversion. This fix lowercases the words in both the main code and the tests.
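A minimal Rust sketch of the idea, under stated assumptions: it assumes the stem separator splits a word such as `Ankara'da` into the stem `Ankara` and a suffix, and that the stem is then looked up in the lowercase disallowed list. The function `is_disallowed` and its `separators` parameter are hypothetical. A plain character list stands in for the real `stem_separator_regex` so the example needs only the standard library. It is not the project's actual code.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not the project's API): takes the part of `word`
/// before the first separator character (standing in for
/// `stem_separator_regex`), lowercases it, and checks it against a
/// disallowed list whose entries are assumed to be lowercase.
fn is_disallowed(word: &str, disallowed: &HashSet<String>, separators: &[char]) -> bool {
    // `split` always yields at least one piece, so `next()` returns the stem.
    let stem = word
        .split(|c: char| separators.contains(&c))
        .next()
        .unwrap_or(word);
    // Note: `to_lowercase` is locale-independent (e.g. 'I' becomes 'i', not Turkish 'ı').
    disallowed.contains(&stem.to_lowercase())
}

fn main() {
    // Assumed real-world data: disallowed words stored in lowercase.
    let disallowed: HashSet<String> = ["ankara"].iter().map(|s| s.to_string()).collect();
    // Assumed separators: ASCII apostrophe and right single quotation mark.
    let seps = ['\'', '\u{2019}'];

    // Without lowercasing, the capitalized stem misses the lowercase entry.
    println!("raw lookup: {}", disallowed.contains("Ankara")); // prints "raw lookup: false"
    // With lowercasing, the stem matches.
    println!("fixed lookup: {}", is_disallowed("Ankara'da", &disallowed, &seps)); // prints "fixed lookup: true"
}
```

Lowercasing only the looked-up stem keeps the check correct whatever casing the input sentence uses. Test fixtures should hold lowercase entries too, so that they match the shape of real data.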