common-voice / cv-sentence-extractor

Scraping Wikipedia for fair use sentences

Improvement for best practices for EN rules file (fixes #156) #159

Closed: MichaelKohler closed this 3 years ago

MichaelKohler commented 3 years ago

This applies best practices learned over the past two years or so. Because the English rules file often gets copied, we want it, for example, to use allowed_symbols.

See #156 for the discussions around this.

MichaelKohler commented 3 years ago

Closing this; it seems that a rebase broke the sample extraction.

MichaelKohler commented 3 years ago

New PR: https://github.com/common-voice/cv-sentence-extractor/pull/162