Closed danielendresz closed 12 months ago
Hi.
The list of available TLDs changes regularly, so it would be a challenge to ensure the library has an up-to-date list on each call.
One option could be to perform an NS DNS query and check whether there are name servers associated with that particular TLD. This would always be up-to-date, but wouldn't work for revoked TLDs.
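To make the NS idea concrete, here is a rough sketch. It assumes the third-party dnspython package for the real lookup; the function names and the injectable `resolve_ns` parameter are made up for illustration and are not part of this library.

```python
from typing import Callable, List, Optional


def _resolve_ns_with_dnspython(fqdn: str) -> List[str]:
    """Look up NS records with dnspython (assumed to be installed)."""
    import dns.resolver  # third-party: pip install dnspython

    try:
        answer = dns.resolver.resolve(fqdn, "NS")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        # The name does not exist, or it exists without NS records.
        return []
    return [str(record.target) for record in answer]


def tld_has_nameservers(
    tld: str,
    resolve_ns: Optional[Callable[[str], List[str]]] = None,
) -> bool:
    """Return True if an NS lookup for the TLD yields at least one name server.

    `resolve_ns` is a hypothetical hook so the lookup can be swapped out
    (for example in tests); by default dnspython is used.
    """
    if resolve_ns is None:
        resolve_ns = _resolve_ns_with_dnspython
    # Normalise ".COM" or "com" to the fully qualified form "com."
    fqdn = tld.strip(".").lower() + "."
    return len(resolve_ns(fqdn)) > 0
```

Timeouts and other resolver errors are deliberately left to propagate, since "lookup failed" is not the same as "TLD does not exist".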
Another option could be to save the TLDs from https://www.iana.org/domains/root/db into a list shipped with your package, which could then be used to check even revoked TLDs. If I understand it correctly, that list would only need to be updated when a new TLD gets added. The last time a new TLD was added was in April 2022; see https://www.iana.org/reports and search for "Delegation", or see https://newgtlds.icann.org/en/program-status/delegated-strings.
An NS check is an interesting idea that could speed up bulk validations (instead of full DNS checks), but I'm not sure how helpful it really is, since it will miss a lot of invalid domains below the TLD level. And as you said, it doesn't solve the original problem. It's hard for me to see a meaningful use case for it.
The second option makes sense, but keeping a file up to date doesn't sound fun for me as a maintainer, and it would be complex to keep it 100% accurate all the time.
I'd be more open to adding an option for the caller to supply a list of valid TLDs and a utility function for retrieving the current list.
I know this isn't very helpful but it might make sense to just do a TLD check on your side. Just lowercase the address and check the ending.
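A caller-side check along those lines could look roughly like this. The function name and the tiny `ALLOWED_TLDS` set are placeholders for illustration only; a real list would have to come from somewhere like IANA.

```python
# Placeholder sample, NOT a complete or current list of TLDs.
ALLOWED_TLDS = frozenset({"com", "org", "net"})


def has_allowed_tld(address: str, allowed_tlds=ALLOWED_TLDS) -> bool:
    """Lowercase the address's last domain label and look it up in the set."""
    # Everything after the last "@" is treated as the domain part.
    domain = address.rsplit("@", 1)[-1]
    # The TLD is the last dot-separated label; ignore a trailing root dot.
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return tld in allowed_tlds
```

For example, `has_allowed_tld("User@Example.COM")` is true with the sample set, while `has_allowed_tld("someone@mail.random")` is not. This only looks at the ending; it does no syntax validation, so it would run after the library's own check.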
> The list of available TLDs changes regularly, so it would be a challenge to ensure the library has an up-to-date list on each call.
Consider using tldextract, which uses Mozilla's Public Suffix List. Although it does require network access on the first call (at least if you haven't manually cached the list), it does avoid the need for DNS lookups on every request.
That's a great idea. Since it would be easy to do outside of this library, I'm inclined to not try to put this into this library. The main validate_email method returns an object that holds the domain portion of the email address, so it could be passed to tldextract easily enough.
It feels like it should be implemented as part of the globally_deliverable argument (aside: this argument is missing from the README) - link to source.
I get what you're saying but I don't think it's a good fit for a syntax check.
I have a list of old email addresses which I would like to validate. However, since many of the domains used for those email addresses are no longer in use and no longer have DNS records, I would get an error if I kept check_deliverability set to True.
However, after I disabled the deliverability check, a lot of emails came through which could never have been valid. The only problem here is the top-level domains.
.random has never been an official top-level domain recognized by ICANN, so I should never have been able to send an email to that address. However, since I can create my own mail server in my internal network with a custom top-level domain, this check should not be mandatory. So in addition to check_deliverability, a check_tld option would be great, set to True by default. As a source, I think the file from IANA (https://data.iana.org/TLD/tlds-alpha-by-domain.txt) is great. However, I am not 100% sure whether every TLD that has ever been active is present in this file.
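For what it's worth, turning that IANA file into a lookup set takes only a few lines. This is a sketch under the assumption that the file has one TLD per line in upper case, with comment lines starting with "#"; the function names are hypothetical and nothing here exists in the library.

```python
from urllib.request import urlopen

IANA_TLD_URL = "https://data.iana.org/TLD/tlds-alpha-by-domain.txt"


def parse_iana_tld_file(text: str) -> frozenset:
    """Parse the IANA TLD list into a set of lowercase TLDs.

    Assumed format: one TLD per line, "#" starts a comment line.
    """
    tlds = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the version/comment header
        tlds.add(line.lower())
    return frozenset(tlds)


def fetch_iana_tlds(url: str = IANA_TLD_URL, timeout: float = 10.0) -> frozenset:
    """Download and parse the list (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        # Internationalised TLDs appear in punycode (XN--...), so ASCII suffices.
        return parse_iana_tld_file(response.read().decode("ascii"))
```

The result of `fetch_iana_tlds()` could be cached by the caller and used for a membership test on the lowercased last label of the domain.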