Closed: battleoverflow closed this issue 1 year ago
Thanks for taking my comment into account! Hopefully this can be fixed (:
Hi, @luis261!
I finally got a second to look over the issue. Your comment was absolutely valuable, but time is unfortunately limited, so I wasn't able to really look into it until now. A solution is currently in testing and will be available in the next release. I've included a few examples with comments below.
You may notice a new parameter: defang_data. This way, if you extract a URL or IP address that isn't defanged, you can defang it immediately during extraction. I still have some things to prepare before this release is ready, but I'm planning for this week. I'll make another comment on this thread once it's available for download!
import iocextract
data = [
"1.1.1.1",
"1[.]1[.]1[.]1",
"domain.com",
"domain[.]com"
]
for d in data:
    # Everything should be refanged
    print(list(iocextract.extract_urls(d, refang=True, no_scheme=True)))
    # Half should be defanged, half should be normal (defang_data defaults to false)
    print(list(iocextract.extract_urls(d, refang=False, no_scheme=True)))
    # Everything should be defanged
    print(list(iocextract.extract_urls(d, refang=False, no_scheme=True, defang_data=True)))
@azazelm3dj3d Alright, thanks for keeping me updated! Once the new release is out, I will check out the new behavior of extract_urls.
The new version is now available: https://pypi.org/project/iocextract/1.14.1/
Alright, I verified the behavior you wrote about in your comment. However, the fundamental issue of extract_urls pulling in IPs still exists; it now even seems to be the universal behavior (as opposed to occurring only in certain edge cases). That is not what I'd expect after reading the documentation, considering that extract_ips exists as well and that extract_urls is described in the documentation as extracting URLs (IPs are not mentioned).
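Until the extraction behavior or the documentation changes, one possible workaround is to filter bare IP addresses out of the extract_urls results after the fact. The sketch below is not part of iocextract: is_bare_ip and drop_bare_ips are hypothetical helper names, the refang step only handles the "[.]" style of defanging, and IPv6 literals in brackets or addresses with a port are deliberately left alone.

```python
import ipaddress


def is_bare_ip(candidate):
    """Return True if candidate is only an IP address (optionally with a scheme)."""
    # Assumption: undo only the "[.]" defang style used in this thread.
    text = candidate.strip().replace("[.]", ".")
    # Drop an optional scheme prefix such as "http://".
    if "://" in text:
        text = text.split("://", 1)[1]
    # A trailing slash does not make a bare IP a meaningful URL.
    text = text.rstrip("/")
    try:
        ipaddress.ip_address(text)
    except ValueError:
        # Domains, and IPs followed by a path or port, end up here.
        return False
    return True


def drop_bare_ips(results):
    """Keep only the results that are not bare IP addresses."""
    return [r for r in results if not is_bare_ip(r)]
```

The output of a call such as iocextract.extract_urls(d, refang=True, no_scheme=True) could be passed through drop_bare_ips. A URL that has an IP as its host but also carries a path, such as http://1.1.1.1/login, is kept on purpose.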
Definitely a good note for the future. Due to the repository not having too many outstanding issues relative to other open-source initiatives, I haven't taken much time to review the actual documentation and how thorough (or accurate) it is. I do have it on my backlog, but no issue assignment, so I just took care of that. Thank you for bringing that to my attention.
Issue: https://github.com/InQuest/python-iocextract/issues/65
"while it seems like the bug originally referenced in this issue is fixed in the new version, the one I commented above still exists. Defanged IPs still get extracted by extract_urls while their non-defanged counterparts don't"
Issue comment: https://github.com/InQuest/python-iocextract/issues/34#issuecomment-1381856822