common-voice/cv-sentence-extractor
Scraping Wikipedia for fair use sentences
52 stars
52 forks
issues
Some wikicode ignored in sentences derived from Wikipedia
#154
fabricebg
closed
3 years ago
2
Add Initial rules and blocklist for Swahili
#153
OmondiKevin
closed
3 years ago
0
Add initial rules and blocklist for Ukrainian
#152
somerandomguyontheweb
opened
3 years ago
2
Fix Sample Extract
#151
MichaelKohler
closed
3 years ago
1
Add option to use custom Python tokenizer to split sentences if needed
#150
MichaelKohler
closed
3 years ago
0
Adjust all cargo run commands to use release build
#149
MichaelKohler
closed
3 years ago
0
Add a note on performance
#148
somerandomguyontheweb
closed
3 years ago
0
Technical analysis: re-running Wikipedia exports
#147
MichaelKohler
closed
3 years ago
16
Test only
#146
MichaelKohler
closed
3 years ago
0
Simplify and generalize CI scripts
#145
MichaelKohler
closed
3 years ago
5
Punkt dependency and related issues
#144
dodomorandi
closed
3 years ago
6
Clean incomplete sentences
#143
Mte90
closed
3 years ago
3
Add WikiSource as target
#142
MichaelKohler
closed
3 years ago
4
Changes for italian
#141
Mte90
closed
3 years ago
13
Note on order of rules: replacements runs first
#140
bact
closed
3 years ago
1
extract-file failed: 'attempt to subtract with overflow'
#139
bact
closed
3 years ago
7
Documenting the order of execution for language rules
#138
bact
closed
3 years ago
5
Adding Thai rules for CV Sentence Extractor
#137
bact
opened
3 years ago
8
added Azerbaijan language toml and wiki sample
#136
damirci
closed
3 years ago
3
fixed typo in readme
#135
navalnica
closed
3 years ago
0
Initial Galician rules and blocklist
#134
xosecalvo
closed
3 years ago
10
Add Thai language
#133
wannaphong
closed
3 years ago
1
Copy cs to sk for prototyping
#132
Adrijaned
opened
3 years ago
3
Fix regex rules in hu.toml
#131
xTibor
closed
4 years ago
0
Improve Hungarian export to not allow abbreviations (fixes #129)
#130
MichaelKohler
closed
4 years ago
0
Improve Hungarian Wiki Export
#129
MichaelKohler
closed
4 years ago
3
Fix readme guide at step 2 from wikiextractor.py to something else
#128
Oymate
closed
4 years ago
2
Add flag to change punctuation end from . to something else
#127
Oymate
closed
4 years ago
1
Fix Lithuanian Export
#126
MichaelKohler
closed
4 years ago
5
Fix Extraction for Belarusian (and possibly others)
#125
MichaelKohler
closed
4 years ago
1
Add rules and blockwords for Lithuanian language --full-wiki-extraction=lt
#124
mjurkus
closed
4 years ago
4
Implement black list symbol & load symbols from file
#123
yanganto
closed
4 years ago
0
Update from hk
#122
yanganto
closed
4 years ago
0
Add Hungarian language
#121
djlancelot
closed
4 years ago
5
updata from hk (random select & chopping end)
#120
yanganto
closed
4 years ago
0
Fix instructions in README.md
#119
somerandomguyontheweb
closed
4 years ago
0
Add initial rules and blacklist for Belarusian
#118
somerandomguyontheweb
closed
3 years ago
33
Sentence-level lookbehind rule
#117
somerandomguyontheweb
closed
4 years ago
3
Ignore symbols
#116
yanganto
closed
4 years ago
0
WIP: Add rules for Swedish
#115
andersjohansson
opened
4 years ago
16
Add Manual Trigger to GitHub Actions for triggering full exports and …
#114
MichaelKohler
closed
4 years ago
0
Update dirname following uppdate of repo-name in README.md
#113
andersjohansson
closed
4 years ago
0
Fix typo in README.md
#112
andersjohansson
closed
4 years ago
0
Update repo-url in README.md
#111
andersjohansson
closed
4 years ago
2
Punkt Issue for Indian languages
#110
arijitx
closed
4 years ago
3
fix longest length option and add bondary test
#109
yanganto
closed
4 years ago
0
Create Github Actions to run full export on PR / create blocklist
#108
MichaelKohler
closed
4 years ago
4
created or.toml
#107
psubhashish
opened
4 years ago
2
Update mandarin branch with more features
#106
yanganto
closed
4 years ago
3
add translate option
#105
yanganto
closed
4 years ago
5