MISP / MISP

MISP (core software) - Open Source Threat Intelligence and Sharing Platform
https://www.misp-project.org/
GNU Affero General Public License v3.0
Free text import does not recognize new international Top Level domains #1149

Closed: cbakkers closed this issue 8 years ago

cbakkers commented 8 years ago

Domains ending in newer TLDs such as .top or .xyz (see https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains) are recognized as filenames instead of domain/hostname. Seen in MISP 2.4.44.

rotanid commented 8 years ago

I noticed some regex matches for domains in the code yesterday and immediately thought that some of them don't cover the new TLDs. The way I see it, this can't be worked around easily, as there are over a thousand TLDs, some of which overlap with filename extensions that people use. Perhaps we could use a list of the top 100 or so TLDs (in terms of registered domain names) instead of all of them to work around this...

adulau commented 8 years ago

Indeed, good point. Not sure if we could directly use the TLDs list from the warning-list https://github.com/MISP/misp-warninglists/blob/master/lists/tlds/list.json which includes the IANA ones.

iglocska commented 8 years ago

We could indeed go with the warninglist!
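To make the warninglist idea concrete, here is a minimal Python sketch (not MISP's actual code, which is PHP). It assumes the warninglist file is a JSON object whose `list` key holds the TLD strings. The inline sample data and the helper names `load_tlds` and `has_known_tld` are hypothetical, and the entries are normalized because the sketch does not assume whether they carry a leading dot.

```python
import json

# Inline stand-in for the warninglist file. Assumption: the real list.json
# is a JSON object whose "list" key holds the TLD strings.
SAMPLE_WARNINGLIST = (
    '{"name": "TLDs (sample)", "list": ["com", "org", ".top", "XYZ", "zip"]}'
)


def load_tlds(raw_json):
    """Return a normalized set of TLDs: lowercase, without a leading dot."""
    data = json.loads(raw_json)
    return {
        entry.strip().lstrip(".").lower()
        for entry in data.get("list", [])
        if entry.strip()
    }


def has_known_tld(value, tlds):
    """True if value contains a dot and its last label is a known TLD."""
    trimmed = value.rstrip(".")
    if "." not in trimmed:
        return False
    last_label = trimmed.rsplit(".", 1)[-1].lower()
    return last_label in tlds


tlds = load_tlds(SAMPLE_WARNINGLIST)
print(has_known_tld("example.xyz", tlds))
print(has_known_tld("report.docx", tlds))
```

A set lookup like this replaces a hard-coded regex alternation of TLDs, so new TLDs are picked up whenever the list is updated.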

rotanid commented 8 years ago

But as I said, there would be overlap with common file extensions such as .zip, which is also a TLD.

iglocska commented 8 years ago

Yup, I think we just need to offer both options to the user and let them decide.
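A rough Python sketch of what "offer both options" could look like: instead of forcing a single type, the classifier returns every candidate and leaves the choice to the user. The two sets are small hypothetical samples and `candidate_types` is an illustrative name, not a MISP function.

```python
# Hypothetical sample data; a real deployment would load full lists.
TLDS = {"com", "org", "top", "xyz", "zip", "mov"}
FILE_EXTENSIONS = {"zip", "exe", "pdf", "docx", "mov"}


def candidate_types(value):
    """Return all plausible types for a dotted token, in a fixed order."""
    name, dot, suffix = value.lower().rpartition(".")
    if not dot or not name or not suffix:
        # No dot, or nothing on one side of the last dot: not handled here.
        return []
    candidates = []
    if suffix in TLDS:
        candidates.append("domain")
    if suffix in FILE_EXTENSIONS:
        candidates.append("filename")
    return candidates


for token in ("evil.top", "archive.zip", "setup.exe"):
    print(token, candidate_types(token))
```

When the returned list has more than one entry (as for a `.zip` suffix), the import UI could show a selector rather than guessing.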

cristianbell commented 8 years ago

Talking with @cvandeplas about this resulted in a couple of ideas:

Any input on this would be highly appreciated.

iglocska commented 8 years ago

Splitting it into smaller functions - yes that can be handy.

Reusing the validation methods - I disagree. The idea is not to pass each value through every (sometimes overly loose) validation function, but instead to use heuristics to run as few of the validation scripts as possible to get to the (group of) best fitting type(s). I would definitely treat the two separately.
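The heuristic approach described above could be sketched roughly as follows in Python. This is only an illustration of the idea, not how MISP implements it: cheap structural checks (does the value contain a dot or a colon, how long is it) decide which of the more expensive checks run at all. The function names, the chosen type labels, and the hash-length table are assumptions made for the example.

```python
import ipaddress
import re

# Assumed mapping from hex-string length to hash type, for illustration.
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}
HEX_RE = re.compile(r"^[0-9a-fA-F]+$")


def is_ip(value):
    """Check for a valid IPv4 or IPv6 address using the standard library."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def guess_types(value):
    """Narrow down candidate types while running as few checks as possible."""
    value = value.strip()
    if not value:
        return []
    # Without a dot or colon, this sketch only considers hashes,
    # so the IP and domain checks are skipped entirely.
    if "." not in value and ":" not in value:
        if len(value) in HASH_LENGTHS and HEX_RE.match(value):
            return [HASH_LENGTHS[len(value)]]
        return []
    if is_ip(value):
        return ["ip-dst"]
    # Anything else with a dot stays ambiguous: a group of fitting types.
    return ["domain", "filename"]


print(guess_types("192.0.2.1"))
print(guess_types("example.top"))
```

Note that the result can be a group of types rather than a single one, which matches the point about keeping type guessing separate from strict validation.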