cbakkers closed 8 years ago
I noticed some regex matches for domains in the code yesterday and immediately thought that some of them don't cover the new TLDs. The way I see it, that can't be worked around easily, as there are over a thousand TLDs which might overlap with filename extensions that people use. Perhaps we could use a list of the top 100 or so TLDs (in terms of registered domain names) instead of using all of them to work around this...
Indeed, good point. Not sure if we could directly use the TLDs list from the warning-list https://github.com/MISP/misp-warninglists/blob/master/lists/tlds/list.json which includes the IANA ones.
We could indeed go with the warninglist!
But as I said, there would be overlap with common file extensions like ZIP, which is also a TLD.
Yup, I think we just need to offer both options to the user and let them decide.
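To make the "offer both options" idea concrete, here is a minimal sketch in Python (not MISP's actual PHP code). The function name `candidate_types` and the two small sets are hypothetical and hand-picked for illustration; a real implementation would presumably load the TLDs from the warninglist or from IANA.

```python
# Hypothetical, hand-picked samples for illustration only.
# A real TLD set would be loaded from the warninglist or from IANA.
TLDS = {"com", "org", "net", "top", "xyz", "zip"}
FILE_EXTENSIONS = {"zip", "exe", "pdf", "doc", "txt"}


def candidate_types(value):
    """Return every type the value's suffix could plausibly indicate.

    When the suffix is both a TLD and a common file extension,
    both candidates are returned so the user can pick one.
    """
    name, dot, suffix = value.strip().lower().rpartition(".")
    if not dot or not name or not suffix:
        # No usable "name.suffix" shape, so nothing to suggest.
        return []
    candidates = []
    if suffix in TLDS:
        candidates.append("domain")
    if suffix in FILE_EXTENSIONS:
        candidates.append("filename")
    return candidates
```

With these sample sets, `candidate_types("report.zip")` yields both `domain` and `filename`, while `candidate_types("example.xyz")` yields only `domain`. The ambiguous case is surfaced instead of being silently resolved one way.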
talking with @cvandeplas about this resulted in a couple of ideas:
any input on this would be highly appreciated.
Splitting it into smaller functions - yes that can be handy.
Reusing the validation methods - I disagree. The idea is not to pass each value through every (sometimes overly loose) validation function, but instead to use heuristics to run as few of the validation scripts as possible to get to the (group of) best fitting type(s). I would definitely treat the two separately.
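A rough sketch of what "heuristics first, validation second" could look like, in Python for brevity (this is not the MISP code, and `guess_types` plus the type labels are made up for the example). Cheap checks on the shape of the value narrow it down to a small group of candidate types, so only the validators for those types would need to run afterwards.

```python
import re

# Hex digest lengths mapped to the hash type they suggest.
HEX_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
HEX_RE = re.compile(r"[0-9a-fA-F]+")
IPV4_RE = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})")


def guess_types(value):
    """Cheap routing heuristics: return the best fitting candidate type(s).

    The result is only a shortlist; strict validation of each
    candidate is assumed to happen in a separate step.
    """
    value = value.strip()
    if "@" in value:
        return ["email"]
    if len(value) in HEX_LENGTHS and HEX_RE.fullmatch(value):
        return [HEX_LENGTHS[len(value)]]
    match = IPV4_RE.fullmatch(value)
    if match and all(int(part) <= 255 for part in match.groups()):
        return ["ip"]
    if "." in value:
        # Ambiguous between the two; TLD and extension lists decide later.
        return ["domain", "filename"]
    return ["text"]
```

For example, `guess_types("example.top")` returns `["domain", "filename"]`, so only those two validators would be consulted, and a 32-character hex string never touches the domain checks at all.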
Domains ending in newer TLDs such as .top or .xyz (see https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains) are recognized as filenames instead of domain/hostname. Seen in MISP 2.4.44.