ArchiveTeam / wpull

Wget-compatible web downloader and crawler.
GNU General Public License v3.0

ftp crash: sre_constants.error: bad character range #274

Open espes opened 9 years ago

espes commented 9 years ago
INFO Fetched ‘ftp://ftp.maltedmedia.com/coco/SOFTWARE/COLLECTIONS/Briza's%20Collection/Ark%20Royal/’: 226 Listing completed.. Length: 1308.
ERROR Fatal exception.
Traceback (most recent call last):
  File "/root/wpull/wpull/app.py", line 128, in run
    yield From(self._builder.factory['Engine']())
  File "/usr/local/lib/python3.4/dist-packages/trollius/tasks.py", line 251, in _step
    result = coro.throw(exc)
  File "/root/wpull/wpull/engine.py", line 278, in __call__
    yield From(self._run_workers())
  File "/usr/local/lib/python3.4/dist-packages/trollius/tasks.py", line 253, in _step
    result = coro.send(value)
  File "/root/wpull/wpull/engine.py", line 67, in _run_workers
    task.result()
  File "/usr/local/lib/python3.4/dist-packages/trollius/futures.py", line 287, in result
    raise self._exception
  File "/usr/local/lib/python3.4/dist-packages/trollius/tasks.py", line 251, in _step
    result = coro.throw(exc)
  File "/root/wpull/wpull/engine.py", line 146, in _run_worker
    yield From(self._process_item(item))
  File "/usr/local/lib/python3.4/dist-packages/trollius/tasks.py", line 251, in _step
    result = coro.throw(exc)
  File "/root/wpull/wpull/engine.py", line 327, in _process_item
    yield From(self._process_url_item(url_record))
  File "/usr/local/lib/python3.4/dist-packages/trollius/tasks.py", line 251, in _step
    result = coro.throw(exc)
  File "/root/wpull/wpull/engine.py", line 384, in _process_url_item
    yield From(self._processor.process(url_item))
  File "/usr/local/lib/python3.4/dist-packages/trollius/tasks.py", line 251, in _step
    result = coro.throw(exc)
  File "/root/wpull/wpull/processor/delegate.py", line 29, in process
    raise Return((yield From(self.ftp_processor.process(url_item))))
  File "/usr/local/lib/python3.4/dist-packages/trollius/tasks.py", line 251, in _step
    result = coro.throw(exc)
  File "/root/wpull/wpull/processor/ftp.py", line 133, in process
    raise Return((yield From(session.process())))
  File "/usr/local/lib/python3.4/dist-packages/trollius/tasks.py", line 251, in _step
    result = coro.throw(exc)
  File "/root/wpull/wpull/processor/ftp.py", line 188, in process
    wait_time = yield From(self._fetch(request, is_file))
  File "/usr/local/lib/python3.4/dist-packages/trollius/tasks.py", line 253, in _step
    result = coro.send(value)
  File "/root/wpull/wpull/processor/ftp.py", line 334, in _fetch
    self._handle_response(request, response)
  File "/root/wpull/wpull/processor/ftp.py", line 420, in _handle_response
    self._add_listing_links(response)
  File "/root/wpull/wpull/processor/ftp.py", line 362, in _add_listing_links
    not fnmatch.fnmatchcase(file_entry.name, self._glob_pattern):
  File "/usr/lib/python3.4/fnmatch.py", line 70, in fnmatchcase
    match = _compile_pattern(pat)
  File "/usr/lib/python3.4/functools.py", line 452, in wrapper
    result = user_function(*args, **kwds)
  File "/usr/lib/python3.4/fnmatch.py", line 46, in _compile_pattern
    return re.compile(res).match
  File "/usr/lib/python3.4/re.py", line 219, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.4/re.py", line 288, in _compile
    p = sre_compile.compile(pattern, flags) 
  File "/usr/lib/python3.4/sre_compile.py", line 465, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.4/sre_parse.py", line 746, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.4/sre_parse.py", line 358, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.4/sre_parse.py", line 504, in _parse
    raise error("bad character range")
sre_constants.error: bad character range
CRITICAL Sorry, Wpull unexpectedly crashed.
CRITICAL Please report this problem to the authors at Wpull's issue tracker so it may be fixed. If you know how to program, maybe help us fix it? Thank you for helping us help you help us all.
chfoo commented 9 years ago

What URL did you pass to Wpull? It appears you passed an invalid glob pattern. If you didn't mean to use a glob pattern, add --no-glob.

It appears that glob patterns are not validated at startup.
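
chfoo's point about validation can be illustrated with a minimal Python sketch. `glob_pattern_error` is a hypothetical helper, not part of Wpull. It compiles the pattern once, the same way `fnmatch.fnmatchcase` does in the tracebacks, so a bad pattern would be reported before crawling starts instead of crashing mid-crawl. Newer Python releases may translate a reversed range such as `o-T` without raising, in which case the check simply passes for that pattern.

```python
import fnmatch
import re
from typing import Optional


def glob_pattern_error(pattern: str) -> Optional[str]:
    """Hypothetical startup check (not Wpull's API).

    Returns an error message if `pattern` cannot be compiled the way
    fnmatch compiles it, or None if the pattern is usable.
    """
    try:
        # fnmatch.fnmatchcase() compiles the translated pattern internally;
        # doing the same once up front surfaces a bad pattern before any
        # directory listing is fetched.
        re.compile(fnmatch.translate(pattern))
    except re.error as error:
        return "invalid glob pattern {!r}: {} (pass --no-glob to disable globbing)".format(
            pattern, error)
    return None


if __name__ == "__main__":
    for pattern in ["*.mp3", "Tool [T-Two-Tool]"]:
        # The result for the second pattern depends on the Python version.
        print(pattern, "->", glob_pattern_error(pattern) or "ok")
```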

espes commented 9 years ago

I believe it was just ftp://ftp.maltedmedia.com with --recursive, but I don't remember exactly.

JustAnotherArchivist commented 1 year ago

Looks like ArchiveBot job cwyoxtvdmgkleq62knllx515g just reproduced this:

226 OK ftp://therealone78.ddns.net/html/music.git/MARETU%20ft.%20Hatsune%20Miku%20-%20Tool%20[T-Two-Tool]%20-%20EXTENDED.mp3
ERROR Fatal exception.
Traceback (most recent call last):
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20220601/lib/python3.6/site-packages/wpull/application/app.py", line 157, in run
    yield from pipeline.process()
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20220601/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 194, in process
    yield from self._process_one_worker()
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20220601/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 215, in _process_one_worker
    task.result()
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20220601/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 119, in process
    item = yield from self.process_one(_worker_id=worker_id)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20220601/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 103, in process_one
    yield from task.process(item)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20220601/lib/python3.6/site-packages/wpull/application/tasks/download.py", line 492, in process
    yield from session.app_session.factory['Processor'].process(session)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20220601/lib/python3.6/site-packages/wpull/processor/delegate.py", line 29, in process
    return (yield from processor.process(item_session))
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20220601/lib/python3.6/site-packages/wpull/processor/ftp.py", line 100, in process
    return (yield from session.process())
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20220601/lib/python3.6/site-packages/wpull/processor/ftp.py", line 151, in process
    wait_time = yield from self._fetch(request, is_file)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20220601/lib/python3.6/site-packages/wpull/processor/ftp.py", line 313, in _fetch
    self._handle_response(request, response)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20220601/lib/python3.6/site-packages/wpull/processor/ftp.py", line 391, in _handle_response
    self._add_listing_links(response)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20220601/lib/python3.6/site-packages/wpull/processor/ftp.py", line 339, in _add_listing_links
    not fnmatch.fnmatchcase(file_entry.name, self._glob_pattern):
  File "/home/archivebot/.pyenv/versions/3.6.15/lib/python3.6/fnmatch.py", line 70, in fnmatchcase
    match = _compile_pattern(pat)
  File "/home/archivebot/.pyenv/versions/3.6.15/lib/python3.6/fnmatch.py", line 46, in _compile_pattern
    return re.compile(res).match
  File "/home/archivebot/.pyenv/versions/3.6.15/lib/python3.6/re.py", line 233, in compile
    return _compile(pattern, flags)
  File "/home/archivebot/.pyenv/versions/3.6.15/lib/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/home/archivebot/.pyenv/versions/3.6.15/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/home/archivebot/.pyenv/versions/3.6.15/lib/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/home/archivebot/.pyenv/versions/3.6.15/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/home/archivebot/.pyenv/versions/3.6.15/lib/python3.6/sre_parse.py", line 765, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/home/archivebot/.pyenv/versions/3.6.15/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/home/archivebot/.pyenv/versions/3.6.15/lib/python3.6/sre_parse.py", line 553, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
sre_constants.error: bad character range o-T at position 48

It looks like the remote filename gets interpreted as a glob pattern, and therefore (via fnmatch) as a regex...‽ What could possibly go wrong? :-)
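
To make the failure concrete, here is a minimal sketch using only Python's standard library. `matches_as_glob` and `matches_literally` are hypothetical helper names, not Wpull code. The second helper assumes the desired behaviour for a filename taken from a listing is an exact match, which `glob.escape()` (available since Python 3.4) provides by neutralising `*`, `?` and `[`.

```python
import fnmatch
import glob
import re

# Filename from the ArchiveBot report above.
NAME = "MARETU ft. Hatsune Miku - Tool [T-Two-Tool] - EXTENDED.mp3"


def matches_as_glob(name: str, pattern: str) -> bool:
    """Match `name` against `pattern` as a glob, like the FTP listing filter.

    Returns False instead of crashing when the pattern cannot be compiled
    (Python 3.4 and 3.6 raise re.error for the range "o-T").
    """
    try:
        return fnmatch.fnmatchcase(name, pattern)
    except re.error:
        return False


def matches_literally(name: str, literal: str) -> bool:
    """Hypothetical helper: compare against `literal` as a plain filename."""
    # glob.escape() wraps '*', '?' and '[' in brackets ("[" becomes "[[]"),
    # so fnmatch treats them as ordinary characters.
    return fnmatch.fnmatchcase(name, glob.escape(literal))


if __name__ == "__main__":
    # Inside "[T-Two-Tool]", "o-T" is a descending range: 'o' sorts after 'T'.
    print(ord("o"), ord("T"))             # 111 84
    # False: "[T-Two-Tool]" is a character class (or fails to compile), not literal text.
    print(matches_as_glob(NAME, NAME))
    # True: the brackets are matched verbatim.
    print(matches_literally(NAME, NAME))
```

The same escaping would apply to any listed name containing `*`, `?` or `[`.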

JustAnotherArchivist commented 1 year ago

That was from the ArchiveBot dashboard, which didn't actually show the last fetch:

2023-09-08 07:58:38,909 - wpull.processor.ftp - INFO - Fetching ‘ftp://therealone78.ddns.net/html/music.git/MARETU%20ft.%20Hatsune%20Miku%20-%20Magical%20Doctor%20(%E3%83%9E%E3%82%B8%E3%82%AB%E3%83%AB%E3%83%89%E3%82%AF%E3%82%BF%E3%83%BC).mp3’.
2023-09-08 07:58:40,518 - wpull.processor.ftp - INFO - Fetched ‘ftp://therealone78.ddns.net/html/music.git/MARETU%20ft.%20Hatsune%20Miku%20-%20Magical%20Doctor%20(%E3%83%9E%E3%82%B8%E3%82%AB%E3%83%AB%E3%83%89%E3%82%AF%E3%82%BF%E3%83%BC).mp3’: 226 Transfer complete.. Length: 6828032.
2023-09-08 07:58:40,878 - wpull.processor.ftp - INFO - Fetching ‘ftp://therealone78.ddns.net/html/music.git/’.
2023-09-08 07:58:41,469 - wpull.processor.ftp - INFO - Fetched ‘ftp://therealone78.ddns.net/html/music.git/’: 226 Directory send OK.. Length: 50982.
2023-09-08 07:58:41,475 - wpull.application.app - ERROR - Fatal exception.
Traceback (most recent call last):
  <same as above>