common-voice
/
cv-sentence-extractor
Scraping Wikipedia for fair use sentences
52
stars
52
forks
issues
Adding latvian rules
#204
raivisdejus
opened
10 months ago
0
Will update instructions to use latest wikiextractor
#203
raivisdejus
opened
1 year ago
1
Adding regex replacement feature
#202
raivisdejus
opened
1 year ago
1
Fix command position of arguments and make title-filter-list optional
#201
MichaelKohler
closed
1 year ago
0
Bad behavior in wikiextractor
#200
HarikalarKutusu
opened
1 year ago
3
Add rule to remove content between brackets
#199
HarikalarKutusu
closed
1 year ago
5
Feature Request - Removal of parentheses.
#198
HarikalarKutusu
closed
1 year ago
1
Upgrade clap
#197
MichaelKohler
closed
1 year ago
0
Update python-inline to 0.11.0
#196
MichaelKohler
closed
1 year ago
0
Fix toolchain to nightly-2023-07-28
#195
MichaelKohler
closed
1 year ago
0
Fix: stem_separator_regex does not work on real data
#194
HarikalarKutusu
closed
1 year ago
0
PLS. DISREGARD THIS Fix: stem_separator_regex does not work on real data
#193
HarikalarKutusu
closed
1 year ago
0
Update dependencies except major updates
#192
MichaelKohler
closed
1 year ago
0
Fix warnings
#191
MichaelKohler
closed
1 year ago
0
Update Python and Node.js in pipeline
#190
MichaelKohler
closed
1 year ago
0
Update used actions in pipeline
#189
MichaelKohler
closed
1 year ago
0
Only run clippy once in the pipeline
#188
MichaelKohler
closed
1 year ago
0
Add a new rule - stem_separator_regex
#187
HarikalarKutusu
closed
1 year ago
2
Replacements not considering whitespace
#186
HarikalarKutusu
closed
1 year ago
6
Add Turkish support (2023-08 finalized)
#185
HarikalarKutusu
closed
1 year ago
1
What is the limit for "replacements"?
#184
HarikalarKutusu
closed
3 months ago
2
Add max_characters to rules
#183
HarikalarKutusu
closed
1 year ago
0
max_characters in rules
#182
HarikalarKutusu
closed
1 year ago
3
Is there a limit to the audio duration?
#181
JJun-Guo
opened
1 year ago
22
Add initial rules and blocklist for Sakha language
#180
gaydmi
opened
1 year ago
0
Update de.toml
#179
BrunoFischerGermany
closed
1 year ago
3
Question: How is the result quality?
#178
HarikalarKutusu
closed
2 years ago
1
Add more abbreviations for DE detected when running a re-export
#177
MichaelKohler
closed
2 years ago
2
Bump regex from 1.5.4 to 1.5.5
#176
dependabot[bot]
closed
2 years ago
0
WikiExtractor doesnt extract text for bn, hi
#175
arijitx
closed
2 years ago
1
Unable to filter out single letter and dot using abbreviation_patterns
#174
comodoro
closed
2 years ago
3
Bump node-fetch from 3.0.0 to 3.1.1 in /scripts
#173
dependabot[bot]
closed
2 years ago
0
Shuffle sample extract to not include sentences in article order (fixes #171)
#172
MichaelKohler
closed
2 years ago
0
Sample Extraction Pipeline: only use shuffled file
#171
MichaelKohler
closed
2 years ago
0
Additional DE rules improvements
#170
MichaelKohler
closed
2 years ago
2
Add Bengali bn
#169
arijitx
closed
2 years ago
7
Initial Polish language rules and blocklist
#168
J-Wrobel
opened
3 years ago
3
Add explanation on Wiki dumps and how the input files will look like
#167
guerda
closed
3 years ago
2
Test only: remove abbreviation rules to test rust-punkt behavior
#166
MichaelKohler
closed
3 years ago
0
Use Python NLTK tokenizer for German
#165
guerda
closed
3 years ago
9
Test commit to test actions
#164
MichaelKohler
closed
3 years ago
0
Actions: Rebase seems to break sample extraction
#163
MichaelKohler
closed
3 years ago
1
Improvement for best practices for EN rules file (fixes #156)
#162
MichaelKohler
closed
2 years ago
4
Replacements should be done before segmentation
#161
MichaelKohler
closed
3 years ago
0
Add Initial rules and blocklist for Igbo (ig)
#160
chrisemezue
closed
3 years ago
6
Improvement for best practices for EN rules file (fixes #156)
#159
MichaelKohler
closed
3 years ago
2
Added allowed_symbols_regex, removed disallowed_symbols EO rules
#158
stefangrotz
closed
3 years ago
6
Improve DE rules file
#157
stefangrotz
closed
2 years ago
32
Use best practice in English rules file
#156
MichaelKohler
closed
2 years ago
14
Implement filtering by title
#155
MichaelKohler
closed
3 years ago
0