attardi / wikiextractor

A tool for extracting plain text from Wikipedia dumps
GNU Affero General Public License v3.0

ptwiki-latest error #305

Open iwmo opened 1 year ago

iwmo commented 1 year ago

While trying to extract ptwiki-latest-pages-articles.xml.bz2, I'm getting the following error:

python -m wikiextractor.WikiExtractor ptwiki-latest-pages-articles.xml.bz2

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/oriebirj/Desktop/lufz/wikiextractor/WikiExtractor.py", line 66, in <module>
    from .extract import Extractor, ignoreTag, define_template, acceptedNamespaces
  File "/home/oriebirj/Desktop/lufz/wikiextractor/extract.py", line 382, in <module>
    ExtLinkBracketedRegex = re.compile(
                            ^^^^^^^^^^^
  File "/usr/lib/python3.11/re/__init__.py", line 227, in compile
    return _compile(pattern, flags)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/re/__init__.py", line 294, in _compile
    p = _compiler.compile(pattern, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/re/_compiler.py", line 743, in compile
    p = _parser.parse(p, flags)
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/re/_parser.py", line 980, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/re/_parser.py", line 455, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/re/_parser.py", line 863, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/re/_parser.py", line 455, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/re/_parser.py", line 863, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/re/_parser.py", line 455, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/re/_parser.py", line 841, in _parse
    raise source.error('global flags not at the start '
re.error: global flags not at the start of the expression at position 4

Not sure why this happens. Any clue??

Thanks

kevinwallace commented 1 year ago

I ran into this today on Python 3.11.2, and applying #182 locally seems to fix it.

xtexChooser commented 1 year ago

I get the same error on eowiki's latest dump. I solved it by merging #313.
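
For anyone landing here with the same traceback: the message points at an inline global flag such as `(?i)` appearing somewhere other than the very start of a pattern. Python 3.11 turned that into a hard `re.error` (earlier versions only emitted a DeprecationWarning). Below is a minimal sketch of the failure and two common ways around it. The pattern is a hypothetical stand-in, not the actual `ExtLinkBracketedRegex` from extract.py, and this is not a claim about what #182 or #313 change.

```python
import re
import sys

# Hypothetical pattern (NOT the real wikiextractor regex): the inline
# global flag (?i) sits at position 4, inside nested groups.
BAD_PATTERN = r"\[(((?i)https?|ftp)://[^\]\s]+)\]"

# Option 1: drop the inline flag and pass it to re.compile instead.
# The whole pattern becomes case-insensitive, as a global flag would make it.
GOOD_FLAG_ARG = re.compile(r"\[((https?|ftp)://[^\]\s]+)\]", re.IGNORECASE)

# Option 2: scope the flag to one group with (?i:...), available since
# Python 3.6. Only the URL scheme is matched case-insensitively here.
GOOD_SCOPED = re.compile(r"\[((?i:https?|ftp)://[^\]\s]+)\]")


def compiles(pattern):
    """Return True if `pattern` compiles on the running interpreter."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(sys.version_info[:2], "bad pattern compiles:", compiles(BAD_PATTERN))
    print(GOOD_FLAG_ARG.search("[HTTP://example.org]").group(1))
    print(GOOD_SCOPED.search("[HTTP://example.org]").group(1))
```

On Python 3.11 or later, `compiles(BAD_PATTERN)` should return False, while both rewritten patterns compile and match. Which option is appropriate depends on whether the rest of the pattern is meant to be case-insensitive as well.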