Closed by deadbits 5 years ago
Well, my regex above was close to working in Python, though not quite how I expected.
Unfortunately there are no plans to add domain support at the moment. We explicitly left out domain extraction because the false positives are extremely high.
My recommendation here would be to use the custom regex support to add any regexes you'd like to use. This is supported by both the CLI and the library. Let me know if you have any questions getting that set up.
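To illustrate the false-positive concern, here is a minimal sketch using only Python's standard `re` module. The pattern `NAIVE_DOMAIN` and the sample sentence are hypothetical, made up for this example, and are not part of iocextract. The point is that a pattern loose enough to catch scheme-less domains will also catch filenames and dotted code identifiers.

```python
import re

# Hypothetical, deliberately naive domain pattern (NOT taken from iocextract):
# one or more dot-terminated labels followed by a 2-63 letter "TLD".
NAIVE_DOMAIN = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,63}\b", re.IGNORECASE)

# Made-up sentence mixing one real-looking domain with two non-domains.
text = "Open report.pdf, call os.path.join, then visit xmrpool.net for details."

matches = NAIVE_DOMAIN.findall(text)
print(matches)
# ['report.pdf', 'os.path.join', 'xmrpool.net']
```

Only the last match is a domain. Telling it apart from the other two generally needs extra context, such as a list of valid TLDs, which a regex alone does not provide.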
Sounds good to me. I'll probably rely on the custom regex support for the time being. It's trickier than I thought.
Thanks for the quick response!
On Dec 6, 2018, 3:35 PM -0500, Ryan Shipp <notifications@github.com> wrote:
> Unfortunately there are no plans to add domain support at the moment. We explicitly left out domain extraction because the false positives are extremely high. My recommendation here would be to use the custom regex support to add any regexes you'd like to use. This is supported by both the CLI and the library. Let me know if you have any questions getting that set up.
I was trying to pull out a list of domains from a text file (sample input and expected output below), but I think iocextract doesn't recognize anything without a URI scheme.
Would it be possible to add an --extract-domains option, or have --extract-urls optionally ignore the scheme? Just random thoughts; I'm not sure of the best way to handle this given how complicated the regex is.
If it's any help, this pattern
([a-zA-Z0-9-_]+(\.)+)?([a-z0-9-_]+)*\.+[a-z]{2,63}
should match pretty much any domain name up to the TLD.
Sample Input
Was hoping to get output of:
supportXMR.com
xmrpool.net
monero.hashvault.pro
minergate.com
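For reference, a rough sketch of the extraction described above, using only Python's standard `re` module. The `SAMPLE` text is invented for illustration (the original sample input isn't reproduced here), and `DOMAIN_RE` is a simplified variant rather than the exact pattern quoted earlier: it uses a single capture group and `re.IGNORECASE` so that mixed-case names like supportXMR.com match in full. The `extract_domains` helper is hypothetical as well.

```python
import re

# Simplified, hypothetical domain pattern with a single capture group:
# dot-separated labels (letters, digits, inner hyphens) plus a 2-63 letter TLD.
DOMAIN_RE = re.compile(
    r"\b((?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,63})\b",
    re.IGNORECASE,
)

# Invented sample input; the real input file is an assumption.
SAMPLE = """\
pool: supportXMR.com:3333
url=xmrpool.net
monero.hashvault.pro:5555
minergate.com
minergate.com
"""

def extract_domains(text):
    """Return scheme-less domains in order of first appearance, without duplicates."""
    found = (m.group(1) for m in DOMAIN_RE.finditer(text))
    return list(dict.fromkeys(found))  # dict keys preserve insertion order

for domain in extract_domains(SAMPLE):
    print(domain)
# supportXMR.com
# xmrpool.net
# monero.hashvault.pro
# minergate.com
```

A pattern like this will still match things such as filenames, so filtering the results against a known TLD list is probably worth considering before trusting the output.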