barseghyanartur / tld

Extracts the top level domain (TLD) from the URL given.
https://pypi.python.org/pypi/tld

Invalid tuple for second-level TLD as input #95

Open elceef opened 3 years ago

elceef commented 3 years ago
>>> import tld
>>> from tld import parse_tld
>>> tld.__version__
'0.12.3'
>>> parse_tld('co.uk', fix_protocol=True)
('co.uk', 'co.uk', '')
# desired: ('uk', 'co', '')
>>> parse_tld('gov.pl', fix_protocol=True)
('gov.pl', 'gov.pl', '')
# desired: ('pl', 'gov', '')
barseghyanartur commented 3 years ago

That behaviour is correct.

elceef commented 3 years ago

Please note that:

barseghyanartur commented 3 years ago

@elceef:

I think you don't understand the concept behind the TLD list. Since co.uk is in the list, it is considered a TLD. The only thing we separate here is public and private TLDs.

from tld import parse_tld

parse_tld('co.uk', fix_protocol=True)
# ('co.uk', 'co.uk', '')

parse_tld('hey.uk', fix_protocol=True)
# ('uk', 'hey', '')
elceef commented 3 years ago

Respectfully, I think you are missing the point: both uk and co.uk are TLDs, both are present in the list, and uk precedes co.uk.
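
To make the two positions concrete, here is a minimal sketch of the two matching policies under discussion. It is not the tld library's implementation and does not reproduce its exact output tuple. The suffix set and both function names are hypothetical, chosen only for illustration, and the sketch returns a simplified (suffix, remainder) pair.

```python
# Toy suffix list (an assumption): only the four entries named in this thread.
SUFFIXES = {"uk", "co.uk", "pl", "gov.pl"}


def split_longest(host):
    """Longest matching suffix wins, even if it consumes the whole host."""
    labels = host.split(".")
    # Start at i = 0, so the full input may itself be treated as a suffix.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SUFFIXES:
            return candidate, ".".join(labels[:i])
    return None


def split_keep_domain(host):
    """Longest matching suffix that still leaves at least one label over."""
    labels = host.split(".")
    # Start at i = 1, so the leftmost label is never part of the suffix.
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SUFFIXES:
            return candidate, ".".join(labels[:i])
    return None


print(split_longest("co.uk"))          # ('co.uk', '')
print(split_keep_domain("co.uk"))      # ('uk', 'co')
print(split_keep_domain("hey.uk"))     # ('uk', 'hey')
print(split_longest("example.co.uk"))  # ('co.uk', 'example')
```

The two policies only disagree when the input is itself an entry in the list. For inputs such as example.co.uk or hey.uk they return the same split.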

barseghyanartur commented 3 years ago

@elceef:

Your views slightly clash with an issue reported (and fixed) earlier: https://github.com/barseghyanartur/tld/issues/51

tld version 0.9.6 did behave as you now wish it would.

Let's discuss this further.

elceef commented 3 years ago

Input is inconsistent with the output:

>>> '.'.join(parse_tld('co.uk', fix_protocol=True))
'co.uk.co.uk.'

In this and similar cases, uk should take priority over co.uk and other second-level domains.
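
The consistency argument can also be checked as a round trip. The sketch below assumes the tuple order is (tld, domain, subdomain), as the outputs quoted in this thread suggest. The helper name rebuild is hypothetical and not part of the tld library. Unlike the plain join above, it reverses the tuple into hostname order and skips empty parts.

```python
def rebuild(parts):
    """Rejoin a (tld, domain, subdomain) tuple in hostname order."""
    # Hostname order is subdomain.domain.tld, so reverse the tuple
    # and drop empty parts to avoid stray dots.
    return ".".join(p for p in reversed(parts) if p)


reported = ("co.uk", "co.uk", "")  # output quoted in this issue
desired = ("uk", "co", "")         # output requested in this issue

print(rebuild(reported))  # co.uk.co.uk
print(rebuild(desired))   # co.uk
```

Under that assumption, only the requested tuple rebuilds to the original input co.uk.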