InQuest / iocextract

Defanged Indicator of Compromise (IOC) Extractor.
https://inquest.readthedocs.io/projects/iocextract/
GNU General Public License v2.0

Add the function --extract-domains and --extract-subdomains #72

Closed: ZeroDot1 closed this issue 9 months ago

ZeroDot1 commented 1 year ago

Sometimes it is necessary to extract just the domains, or the domains together with their subdomains.

Also, a question: are the newer, longer domain extensions (TLDs) included?

giovino commented 1 year ago

See discussion in previous issue: Extract domain names without URI scheme

battleoverflow commented 9 months ago

I'd say the discussion in that issue (https://github.com/InQuest/iocextract/issues/25) still applies to this request. I think the custom regex route is the best way to achieve this for now. We do have URL extraction, which may catch some domains depending on how they're structured, but it would most likely miss a lot of valuable domains. I would recommend experimenting with different expressions to match the type of data you're extracting and plugging them into iocextract through the custom regex option: https://inquest.readthedocs.io/projects/iocextract/en/latest/#custom-regex
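As a rough starting point for that kind of experimentation, here is a minimal sketch using only Python's standard `re` module. The names `DOMAIN_RE` and `extract_domains` are made up for illustration and are not part of iocextract. The pattern has a single capture group, which (as far as I understand the documentation linked above) is the shape the custom regex option expects, so the same expression could probably be reused there after testing it on your own data.

```python
import re

# Hypothetical pattern, not taken from iocextract itself.
# One capture group: dot-separated labels followed by an alphabetic TLD.
# Dots may be plain (".") or defanged ("[.]").
DOMAIN_RE = re.compile(
    r"(?<![\w.@-])"  # do not start in the middle of a word or an email address
    r"((?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?(?:\[\.\]|\.))+[a-z]{2,63})"
    r"(?![\w-])",    # do not stop in the middle of a label
    re.IGNORECASE,
)


def extract_domains(text):
    """Return unique, refanged, lowercased domains in order of first appearance."""
    found = []
    for match in DOMAIN_RE.finditer(text):
        domain = match.group(1).replace("[.]", ".").lower()
        if domain not in found:
            found.append(domain)
    return found


sample = (
    "Beacon to evil[.]example[.]com and cdn.example.photography, "
    "see http://sub.test.org/path"
)
print(extract_domains(sample))
```

The TLD part accepts 2 to 63 letters, so longer extensions such as `.photography` match, but nothing is checked against a list of real TLDs. Expect false positives such as file names (`report.pdf`), which is where the trial and error comes in.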

There is actually an example in the documentation that shows a way to extract domains from ingested URLs. There will still be some trial and error, but it should be enough to get you started.
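If the URLs have already been extracted, pulling the host out of each one is another option. The sketch below is not the example from the documentation, just an illustration with the standard library. `hosts_from_urls` is a hypothetical helper name, and the input list stands in for refanged URLs from whatever extraction step you use.

```python
from urllib.parse import urlsplit


def hosts_from_urls(urls):
    """Return unique hostnames (subdomains included) from refanged URLs."""
    hosts = []
    for url in urls:
        # hostname is lowercased and excludes port and credentials;
        # it is None when the string has no scheme/netloc part.
        host = urlsplit(url).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts


# Assumed input: URLs that were already extracted and refanged elsewhere.
urls = [
    "http://sub.example.com:8080/a",
    "https://Example.org/x?y=1",
    "example.net/path",  # no scheme, so no hostname is found
]
print(hosts_from_urls(urls))
```

Note that this only works for URLs that carry a scheme, which is the same limitation discussed in the earlier issue.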