attardi/wikiextractor
A tool for extracting plain text from Wikipedia dumps
GNU Affero General Public License v3.0
3.69k stars
959 forks
issues
Is this project abandoned?
#335
johann-petrak
opened
1 week ago
0
OSS-Fuzz Integration
#334
ennamarie19
opened
1 month ago
0
bug fix in OutputSplitter regarding file handling for bz2 type
#333
DurgaiVS
opened
1 month ago
0
Get all revisions content
#332
abrahami
opened
1 month ago
0
ipynb file to extract wiki articles generated in google colab
#331
DreamRunnerMoshi
opened
1 month ago
0
pypi not updated with latest version (3.0.7)
#330
JordanHanley
opened
2 months ago
0
ValueError: cannot find context for 'fork' & cannot pickle '_io.TextIOWrapper' object
#329
Harry1035
opened
3 months ago
2
How to store a document in a separate txt file instead of a single txt file containing multiple documents
#328
hxy-62
opened
3 months ago
0
Better formatting in text mode
#327
ProtD
opened
5 months ago
0
fix reference
#326
kato8966
opened
7 months ago
0
Wikidata Extraction
#325
vishwa27yvs
opened
7 months ago
0
Parsing seems to exclude some part of the page
#324
franluca
opened
7 months ago
0
does not extract all wiki
#323
Aeon-Transformer
opened
9 months ago
0
docs: change the tagRE and docs case key="10", key="828"
#322
pphuc25
closed
9 months ago
0
Bullet points are missing in the final extracted text
#321
miguelwon
opened
10 months ago
0
[Request for Help] Should I support a template file like `templates.txt` followed the arg `--templates`?
#320
jacklanda
opened
10 months ago
0
fixing the re.error: global flags not at the start of the expression
#319
miromannino
closed
10 months ago
1
Updating clean_markup function to be compatible with Extractor.__init…
#318
miromannino
opened
10 months ago
0
Add feature to extractPage to also dump the extracted page to json/csv/txt
#317
BwandoWando
opened
10 months ago
0
Add options for a bare text format & removing empty documents
#316
AngledLuffa
opened
11 months ago
0
Patch support for Windows
#315
rgryta
opened
1 year ago
1
Template errors in article
#314
etoilestar
opened
1 year ago
2
Make the regex python 3.11 compatible
#313
santhoshtr
opened
1 year ago
5
Is Windows 10 supported?
#312
nissansz
closed
1 year ago
28
Is Windows supported
#311
nissansz
closed
1 year ago
0
Warning: Template Errors
#310
fzweclipse
opened
1 year ago
1
Never finishes and even debug gets stuck in a loop
#309
number435398
opened
1 year ago
0
Why was --keep_tables removed?
#308
micimize
opened
1 year ago
0
Add argument to preserve unicode characters in json output.
#307
wayneworkman
opened
1 year ago
1
wikiextractor 3.0.6 not extracting
#306
wayneworkman
closed
1 year ago
3
ptwiki-latest error
#305
iwmo
opened
1 year ago
2
Issues on newer (2023) and older (2019) dumps
#304
JohnTailor
closed
1 year ago
0
Option to remove blank pages?
#303
AngledLuffa
opened
1 year ago
1
How to extract lists pages?
#302
katzurik
opened
1 year ago
0
Non-textual elements score and mapframe are not filtered out
#301
adno
opened
1 year ago
0
Various tags such as q, br, ins, del are not filtered out
#300
adno
opened
1 year ago
1
Cannot turn off --html-safe command line option (true by default)
#299
adno
opened
1 year ago
0
Tables are not entirely filtered out
#298
adno
opened
1 year ago
0
remove 1 redundant line in wikiextractor/extractPage.py, although it doesn't affect the function overall
#297
Kelvinthedrugger
opened
1 year ago
0
Dev
#296
tuxiaohui001
opened
1 year ago
0
KeyError in 'page.append(listItem[n] % line)'
#295
audreycs
opened
1 year ago
0
FIX issue 283
#294
hndgzkn
opened
1 year ago
0
Option to drop section titles/headers
#293
Matthieu-Tinycoaching
opened
1 year ago
1
fails on the first file
#292
vsraptor
opened
1 year ago
2
ModuleNotFoundError: No module named '__main__.extract'; '__main__' is not a package
#291
KangChou
opened
1 year ago
0
about "raise BdbQuit" problem
#290
zhenjia2017
opened
1 year ago
10
error_replacement
#289
Woojin718
opened
2 years ago
0
Warning: Template Errors
#288
maulidaannisa
closed
2 years ago
5
Question ValueError: cannot find context for 'fork'
#287
yaoysyao
closed
1 year ago
4
how to get mention/anchor by wikiextractor.
#286
lshowway
opened
2 years ago
0