Open troyhunt opened 9 months ago
A trailing period in a domain is valid. It represents the root hierarchy, and is usually assumed and not printed.
Is that the case for either a publicly addressable domain accessed over HTTP or the domain part of an email address? Happy to be proven wrong, but I don't think there's a single website address that has a period immediately after the TLD, is there?
It is one of those things that may have been required in the early days of the internet when subdomains were used without the TLD being specified, but has become so common to use the full domain and omit the root hierarchy that it is now the default. It may be a better approach to remove it if present before reporting.
Yeah, thought that might be the case. The decision we keep running into with this project is strict adherence to RFCs versus filtering out junk. If I was to look at the occurrences of periods at the end of domains in everything from domain searches to breach parsing to people signing up to notifications, I bet 100% of them will be parsing or user input errors rather than compliance with an RFC edge case! I'm also of the view that if you could legitimately put a period at the end of say, an address, you'd be blocked in so many places you'd quickly stop doing that!
That is a valid view, and related to the other issues where emails may start with { or /, but in real life the users impacted by a data breach will never have that. I would be in favour of stripping them per how we use them on the internet today rather than strict RFC compliance.
@troyhunt is this still an issue? I just created a dummy file with a period at the end - user@123.com. - and the extractor removed the period as it wasn't present in the output file.
Yep, still an issue. In fact, this is our only failing test right now: https://github.com/HaveIBeenPwned/EmailAddressExtractor/blob/7bf5312f5600cda5e4c746c6ece4c4168fbbbef3/test/LegacyTests.cs#L168
Interesting, OK I'll investigate further.
I just added a failing test for this (DomainEndingInPeriodIsInvalid) after finding some junk that had made its way through.