attardi/wikiextractor
A tool for extracting plain text from Wikipedia dumps
GNU Affero General Public License v3.0
3.74k
stars
965
forks
issues
How to get mentions/anchors with wikiextractor
#286
lshowway
opened
2 years ago
0
TagRE Causes Loss of Large Portion of Page Text
#285
lorr1
closed
2 years ago
0
File output does not match stdout output in v3.0.6
#284
adrianeboyd
opened
2 years ago
0
Unwanted pdb tracing
#283
shangw-nvidia
opened
2 years ago
3
Question: Cirrus Extractor vs. "normal" Extractor - who creates cleaner texts?
#282
PhilipMay
opened
2 years ago
1
Codec encoding errors in OutputSplitter
#281
cBog
opened
2 years ago
0
KeyError for producing HTML output with `--html`
#280
cyk1337
opened
2 years ago
1
Extraction of French Wikipedia has gaps and syntax errors
#279
Matthieu-Tinycoaching
opened
2 years ago
1
Found a possible security concern
#278
zidingz
opened
2 years ago
0
Article summary extraction returns no data
#277
chen-better-and-better
opened
2 years ago
4
add encoding parameter in load_templates
#276
gcaillaut
opened
2 years ago
1
templates are not extracted correctly
#275
vrnmthr
opened
2 years ago
0
--json flag is unrecognised
#274
odebroqueville
opened
2 years ago
3
Status of release 3.0.4 / 3.0.5?
#273
PA1212
opened
2 years ago
0
Fixed fileWrapper bug
#272
Rotendahl
closed
2 years ago
0
cannot serialize/pickle '_io.TextIOWrapper' object
#271
kwon0408
opened
3 years ago
5
Would be nice to continue after interrupt
#270
timbmg
opened
3 years ago
0
Template not correctly expanded
#269
dnk8n
opened
3 years ago
2
[Feature Request]: Capture Paragraph Heading information
#268
dnk8n
opened
3 years ago
1
"revid" is incorrectly the page.revision.contributor.id, when it should be the page.revision.id
#267
dnk8n
opened
3 years ago
2
'{{snd}}' should resolve to '–' or '-'
#266
dnk8n
opened
3 years ago
0
Specify python 3.6 version to be the required version in the README
#265
jmorenobl
opened
3 years ago
2
How to preserve sections?
#264
Activeyixiao
opened
3 years ago
1
Fix "EOFError: Ran out of input" in Windows
#263
dreamingjudith
opened
3 years ago
1
OSError: [WinError 87] The parameter is incorrect.
#262
fengyunzaidushi
opened
3 years ago
0
Corrected the logic to avoid redirect pages.
#261
Kapilhk
opened
3 years ago
1
How to get only text summary of entity ?
#260
namlh16
closed
3 years ago
0
Fix extractPage version
#259
jmorenobl
opened
3 years ago
1
Execution of "extractPage -v" returns this error => NameError: name 'version' is not defined
#258
jmorenobl
opened
3 years ago
1
Extract the articles that include a colon in the title
#257
ujiuji1259
opened
3 years ago
0
fix missing articles in wikiextractor
#256
ujiuji1259
closed
3 years ago
0
section arg not recognized
#255
tgalery
opened
3 years ago
1
Missing the articles that include a colon in the title
#254
ujiuji1259
opened
3 years ago
1
Extremely Slow on Large Files
#253
IlterOnatKorkmaz
opened
3 years ago
0
Problems with the import extract
#252
FeixLiu
opened
3 years ago
2
Working of Wikiextractor
#251
QazQazaq
opened
3 years ago
1
module error correction
#250
Joonkkyo
opened
3 years ago
0
Divided up the text into summary and content for NLP processing
#249
ertosns
opened
3 years ago
3
the function dropNested
#248
jcyk
opened
3 years ago
0
Help! Do dump files contain the wikitables in Wikipedia?
#247
HamLaertes
opened
3 years ago
1
TypeError: cannot pickle '_io.TextIOWrapper' object
#246
wengefan
opened
3 years ago
7
Missing bullets content
#245
shovalsa
opened
3 years ago
1
Updated import in WikiExtractor.py
#244
vakhokoto
closed
3 years ago
2
Wrong import format on python 3
#243
vakhokoto
closed
3 years ago
2
EOFError: Ran out of input
#242
shidaide2019
opened
3 years ago
6
Add minor fix to OutputSplitter.write()
#241
JonasTriki
closed
3 years ago
0
TypeError: __init__() missing 2 required positional arguments: 'title' and 'page'
#240
WarrenMihail
opened
3 years ago
0
TypeError: a bytes-like object is required, not 'str'
#239
JonasTriki
closed
3 years ago
1
Open output files after forking
#238
prokotg
opened
3 years ago
2
Guidance on extracting titles and URLs
#237
onassar
closed
3 years ago
2