TeamHG-Memex / autopager

Detect and classify pagination links
99 stars 25 forks source link

ValueError: Invalid IPv6 URL #6

Open lopuhin opened 4 years ago

lopuhin commented 4 years ago

Here is the traceback

  File "/usr/local/lib/python3.6/dist-packages/autopager/autopager.py", line 51, in extract
    return list(get_shared_autopager().extract(page, direct, prev, next))
  File "/usr/local/lib/python3.6/dist-packages/autopager/autopager.py", line 112, in extract
    xseq = page_to_features(links)
  File "/usr/local/lib/python3.6/dist-packages/autopager/model.py", line 129, in page_to_features
    features = [link_to_features(a) for a in xseq]
  File "/usr/local/lib/python3.6/dist-packages/autopager/model.py", line 129, in <listcomp>
    features = [link_to_features(a) for a in xseq]
  File "/usr/local/lib/python3.6/dist-packages/autopager/model.py", line 55, in link_to_features
    p = urlsplit(href)
  File "/usr/lib/python3.6/urllib/parse.py", line 436, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL

Example which triggers the issue (I don't have the one which actually happened in the wild):

autopager.extract('<a href="http://[">Error</a>')