barseghyanartur / tld

Extracts the top level domain (TLD) from the URL given.
https://pypi.python.org/pypi/tld

process_url raises an exception in some situations #114

LASER-Yi closed this issue 1 year ago

LASER-Yi commented 2 years ago

I ran into this issue while processing subtitles.

To reproduce:

>>> from tld.utils import process_url
>>> process_url(':{\\rgit}https://github.com', fail_silently=True, fix_protocol=True)

The following exception is raised, even though fail_silently=True is set:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/lib/python3.9/site-packages/tld/utils.py", line 313, in process_url
    parsed_url = urlsplit(url)
  File "/opt/homebrew/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py", line 489, in urlsplit
    _checknetloc(netloc)
  File "/opt/homebrew/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/parse.py", line 434, in _checknetloc
    raise ValueError("netloc '" + netloc + "' contains invalid " +
ValueError: netloc ':{\rgit}https:' contains invalid characters under NFKC normalization
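
The traceback ends inside the standard library, not in tld itself: `urlsplit` rejects a netloc when NFKC normalization introduces one of the characters `/?#@:`. A netloc made only of ASCII characters passes that check, so the sketch below assumes the subtitle text actually began with a fullwidth colon (U+FF1A) that was flattened to `:` when the report was copied. That assumption is not confirmed by the report. The `https://` prefix is likewise inferred from the netloc in the error message, as what `fix_protocol=True` appears to prepend.

```python
import unicodedata
from urllib.parse import urlsplit

# Assumption: the leading character was a fullwidth colon (U+FF1A),
# not the ASCII colon shown in the report.
raw = "\uff1a{\\rgit}https://github.com"

# Assumption: fix_protocol=True prepends a scheme when none is recognised.
candidate = "https://" + raw

# NFKC folds the fullwidth colon into an ASCII colon, which urlsplit
# treats as a forbidden character appearing inside the netloc.
folds_to_ascii_colon = unicodedata.normalize("NFKC", "\uff1a") == ":"
print(folds_to_ascii_colon)

error = None
try:
    urlsplit(candidate)
except ValueError as exc:
    error = str(exc)
    print(error)
```

Because the `ValueError` comes from `urlsplit`, a `fail_silently` flag only helps if the library catches the exception around that call.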

morpheus65535 commented 1 year ago

Does #119 fix this?

barseghyanartur commented 1 year ago

Merged into master. I'm working on updating the GitHub CI and will release soon.

barseghyanartur commented 1 year ago

Released in 0.12.7.
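
For anyone still pinned to a release older than 0.12.7, a caller-side guard is one possible workaround. This is a minimal sketch using only the standard library. `safe_urlsplit` is a hypothetical helper name, not part of tld, and it does not claim to reproduce what the merged fix does.

```python
from typing import Optional
from urllib.parse import SplitResult, urlsplit


def safe_urlsplit(url: str) -> Optional[SplitResult]:
    """Hypothetical helper: return None instead of raising on a malformed URL."""
    try:
        return urlsplit(url)
    except ValueError:
        # urlsplit raises ValueError for netlocs it considers invalid,
        # including the NFKC case from this issue.
        return None


# The fullwidth colon (U+FF1A) in the first input is an assumption about the original text.
print(safe_urlsplit("https://\uff1a{\\rgit}https://github.com"))
print(safe_urlsplit("https://github.com"))
```

The same try/except can be wrapped around a direct `process_url` call in application code until the dependency is upgraded.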