common-voice / cv-sentence-extractor

Scraping Wikipedia for fair use sentences

Replacements should be done before segmentation #161

MichaelKohler closed this issue 2 years ago

MichaelKohler commented 2 years ago

For some reason I thought this was already done. It turns out it wasn't, and we really should do the replacements before we split the text into sentences, to get the most out of the broken rust-punkt segmentation.
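Why the order matters can be sketched with a toy example. This is not the extractor's code. The replacement pairs, `apply_replacements`, and `naive_segment` are all hypothetical, and the regex splitter is a deliberately naive stand-in for rust-punkt. It assumes a replacement that expands an abbreviation containing a period. If segmentation runs first, the splitter has already cut the text at that period. If replacements run first, the splitter never sees it.

```python
import re

# Hypothetical replacement pairs (illustrative only, not the project's rules):
# expanding an abbreviation removes a period that a segmenter might split on.
REPLACEMENTS = [("Dr.", "Doctor"), ("e.g.", "for example")]


def apply_replacements(text, replacements=REPLACEMENTS):
    """Apply every (old, new) pair to the text, in order."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text


def naive_segment(text):
    """Naive stand-in for a sentence segmenter such as rust-punkt:
    split wherever '.', '!' or '?' is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


text = "Dr. Smith arrived. He left."

# Replacements after segmentation: "Dr." has already become its own "sentence".
after = [apply_replacements(s) for s in naive_segment(text)]

# Replacements before segmentation: the splitter never sees "Dr.".
before = naive_segment(apply_replacements(text))

print(after)   # ['Doctor', 'Smith arrived.', 'He left.']
print(before)  # ['Doctor Smith arrived.', 'He left.']
```

In this toy case, doing the replacements on the whole text first avoids a bogus split. The same reasoning is presumably why the issue asks for this ordering with the real segmenter.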