attardi/wikiextractor
A tool for extracting plain text from Wikipedia dumps
GNU Affero General Public License v3.0
3.76k stars, 968 forks
issues
fix bugs when it upgrades to python 3.13(include Python2 -> Python3 a…
#337
Kevin-O-Hsu
closed
3 weeks ago
0
error with "global flags not at the start of the expression at position 4" help~~~
#336
JoeyHuhuu
opened
2 months ago
5
Is this project abandoned?
#335
johann-petrak
opened
4 months ago
0
OSS-Fuzz Integration
#334
ennamarie19
opened
5 months ago
0
bug fix in OutputSplitter regarding file handling for bz2 type
#333
DurgaiVS
opened
6 months ago
0
Get all revisions content
#332
abrahami
opened
6 months ago
0
ipynb file to extract wiki articles generated in google colab
#331
DreamRunnerMoshi
opened
6 months ago
0
pypi not updated with latest version (3.0.7)
#330
JordanHanley
opened
7 months ago
0
ValueError: cannot find context for 'fork' & cannot pickle '_io.TextIOWrapper' object
#329
Harry1035
opened
8 months ago
2
How to store a document in a separate txt file instead of a single txt file containing multiple documents
#328
hxy-62
opened
8 months ago
1
Better formatting in text mode
#327
ProtD
opened
10 months ago
0
fix reference
#326
kato8966
opened
11 months ago
0
Wikidata Extraction
#325
vishwa27yvs
opened
12 months ago
0
Parsing seems to exclude some part of the page
#324
franluca
opened
1 year ago
0
does not extract all wiki
#323
Aeon-Transformer
opened
1 year ago
0
docs: change the tagRE and docs case key="10", key="828"
#322
pphuc25
closed
1 year ago
0
Bullet points are missing in the final extracted text
#321
miguelwon
opened
1 year ago
0
[Request for Help] Should I support a template file like `templates.txt` followed the arg `--templates`?
#320
jacklanda
opened
1 year ago
0
fixing the re.error: global flags not at the start of the expression
#319
miromannino
closed
1 year ago
1
Updating clean_markup function to be compatible with Extractor.__init…
#318
miromannino
opened
1 year ago
0
Add feature to extractPage to also dump the extracted page to json/csv/txt
#317
BwandoWando
opened
1 year ago
0
Add options for a bare text format & removing empty documents
#316
AngledLuffa
opened
1 year ago
0
Patch support for Windows
#315
rgryta
opened
1 year ago
1
Template errors in article
#314
etoilestar
opened
1 year ago
2
Make the regex python 3.11 compatible
#313
santhoshtr
opened
1 year ago
5
Is Windows 10 supported?
#312
nissansz
closed
1 year ago
28
Is Windows supported
#311
nissansz
closed
1 year ago
0
Warning: Template Errors
#310
fzweclipse
opened
1 year ago
1
Never finishes and even debug gets stuck in a loop
#309
number435398
opened
1 year ago
0
Why was --keep_tables removed?
#308
micimize
opened
1 year ago
0
Add argument to preserve unicode characters in json output.
#307
wayneworkman
opened
1 year ago
1
wikiextractor 3.0.6 not extracting
#306
wayneworkman
closed
1 year ago
3
ptwiki-latest error
#305
iwmo
opened
1 year ago
2
Issues on newer (2023) and older (2019) dumps
#304
JohnTailor
closed
1 year ago
0
Option to remove blank pages?
#303
AngledLuffa
opened
1 year ago
1
How to extract lists pages?
#302
katzurik
opened
1 year ago
0
Non-textual elements score and mapframe are not filtered out
#301
adno
opened
1 year ago
0
Various tags such as q, br, ins, del are not filtered out
#300
adno
opened
1 year ago
1
Cannot turn off --html-safe command line option (true by default)
#299
adno
opened
1 year ago
0
Tables are not entirely filtered out
#298
adno
opened
1 year ago
0
remove 1 redundant line in wikiextractor/extractPage.py, although it doesn't affect the function overall
#297
Kelvinthedrugger
opened
2 years ago
0
Dev
#296
tuxiaohui001
opened
2 years ago
0
KeyError in 'page.append(listItem[n] % line)'
#295
audreycs
opened
2 years ago
0
FIX issue 283
#294
hndgzkn
opened
2 years ago
0
Option to drop section titles/headers
#293
Matthieu-Tinycoaching
opened
2 years ago
1
fails on the first file
#292
vsraptor
opened
2 years ago
2
ModuleNotFoundError: No module named '__main__.extract'; '__main__' is not a package
#291
KangChou
opened
2 years ago
0
about "raise BdbQuit" problem
#290
zhenjia2017
opened
2 years ago
10
error_replacement
#289
Woojin718
opened
2 years ago
0
Warning: Template Errors
#288
maulidaannisa
closed
2 years ago
5