InQuest / iocextract

Defanged Indicator of Compromise (IOC) Extractor.
https://inquest.readthedocs.io/projects/iocextract/
GNU General Public License v2.0
498 stars, 91 forks

Failed to extract the URLs from this tweet #42

Closed: shankaraman closed this issue 4 years ago

shankaraman commented 4 years ago

The URLs from this tweet were not extracted by the tool.

I tried all of the methods (extract_urls, extract_unencoded_urls, and extract_encoded_urls), but none of them worked. Is there a way to fix it?

cmmorrow commented 4 years ago

Hello @shankaraman, I think the problem is that the tweet defangs the URL by putting brackets around the colon ":" character, and that pattern is not covered by the regular expression iocextract uses to refang URLs. I can work on adding this pattern to the regular expression.
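To illustrate the kind of pattern being discussed, here is a minimal sketch using only Python's standard `re` module. It is not iocextract's actual regular expression or code: the names `URL_PATTERN`, `refang`, and `extract_defanged_urls` are hypothetical, the sample URL is made up, and the sketch assumes the defanged form looks like `hxxp[:]//example[.]com`.

```python
import re

# Hypothetical pattern, NOT the one shipped in iocextract.
# It accepts "http"/"hxxp" (optionally with "s"), followed by either a
# plain colon or a colon wrapped in square brackets, then "//".
URL_PATTERN = re.compile(r"h(?:tt|xx)ps?(?:\[:\]|:)//\S+", re.IGNORECASE)


def refang(url):
    """Undo a few common defanging tricks (assumed, not exhaustive)."""
    url = url.replace("[:]", ":")   # bracketed colon, the case in this issue
    url = url.replace("[.]", ".")   # bracketed dot
    # Restore a defanged scheme such as "hxxp" or "hxxps".
    url = re.sub(r"^hxxp", "http", url, flags=re.IGNORECASE)
    return url


def extract_defanged_urls(text):
    """Return refanged URLs found in free text."""
    return [refang(match.group(0)) for match in URL_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = "seen at hxxp[:]//example[.]com/a.php today"
    print(extract_defanged_urls(sample))
```

Without the `\[:\]` alternative in the pattern, a URL defanged this way would not match at all, which is consistent with the behavior reported above.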

cmmorrow commented 4 years ago

Hello @shankaraman, I have merged a fix to master. You can check whether it works by pulling the master branch and running iocextract.py. The change will be included in a future release.

shankaraman commented 4 years ago

Thank you very much, @cmmorrow. I will test the latest commit and get back to you with feedback. And thanks for your time too!

shankaraman commented 4 years ago

Hello @cmmorrow, please check the 1st column. I think line 590 of iocextract.py needs to be modified. I am attaching a screenshot for reference. Thanks!

[Image: Screenshot from 2020-05-25 16-23-30]

cmmorrow commented 4 years ago

Thanks for catching that @shankaraman. I've corrected the issue and pushed the change to master. Please review and let me know if you notice any problems.

shankaraman commented 4 years ago

Works like a charm!

Thanks!