Ben-Steele closed this issue 1 year ago.
Strictly speaking, this is not a valid URL, but some applications will URL-encode it and follow the link.
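To show why the trailing ">" matters once a client URL-encodes the link, here is a minimal sketch using the standard library's urllib.parse.quote. The set of characters left unescaped (SAFE) and the helper name encode_like_client are assumptions for illustration; real applications differ in exactly what they percent-encode.

```python
from urllib.parse import quote

# Assumption: a lenient client leaves these URL delimiters unescaped and
# percent-encodes everything else that is not an unreserved character.
SAFE = ":/?=&"


def encode_like_client(url: str) -> str:
    """Hypothetical helper: percent-encode a raw URL roughly the way a
    lenient client might before following it."""
    return quote(url, safe=SAFE)


full = "https://www.mysite.com/endpoint?param=abc--~C<http://anothersite.com/myfile.zip>"
truncated = full[:-1]  # the extracted result, with the final ">" stripped

print(encode_like_client(full))
# https://www.mysite.com/endpoint?param=abc--~C%3Chttp://anothersite.com/myfile.zip%3E
print(encode_like_client(truncated))
# https://www.mysite.com/endpoint?param=abc--~C%3Chttp://anothersite.com/myfile.zip
```

The two encoded strings differ only in the final %3E, so a client following the full link requests a different resource than the one the truncated indicator describes.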
Hi, @Ben-Steele!
Support for controlling trailing punctuation should now be complete.
If you are using iocextract as a library, you can remove the punctuation restriction like this:
import iocextract

def rm_punctuation():
    # open_punc=True lifts the end-punctuation restriction, so the trailing ">" is kept
    data = "https://www.mysite.com/endpoint?param=abc--~C<http://anothersite.com/myfile.zip>"
    for url in iocextract.extract_urls(data, refang=True, open_punc=True):
        print(url)

rm_punctuation()
If you're using it as a CLI, this command will do the same thing:
iocextract --input urls.txt --extract-urls --open
A new version is not yet available on PyPI. I will post another comment here once it is available for download.
The new PyPI package is now available!
PyPI: https://pypi.org/project/iocextract/1.13.8/
GitHub Releases: https://github.com/InQuest/python-iocextract/releases/tag/v1.13.8
The URL is:
https://www.mysite.com/endpoint?param=abc--~C<http://anothersite.com/myfile.zip>
The trailing ">" is always stripped from the URL, even though it is part of it. When I call extract_iocs, I get:
https://www.mysite.com/endpoint?param=abc--~C<http://anothersite.com/myfile.zip
I can share the real URL where I discovered this issue, but it is malicious, so I didn't include it here.
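One way to avoid losing the ">" without disabling punctuation stripping entirely is a bracket-aware check: strip a trailing closing bracket only when it has no matching opener earlier in the URL. The sketch below is a hypothetical illustration of that heuristic, not iocextract's implementation (the maintainer's fix is the open_punc option); the function name and the TRAILING character set are assumptions.

```python
# Hypothetical heuristic, NOT iocextract's actual code: strip trailing
# punctuation, but keep a closing bracket that balances an opener in the URL.
PAIRS = {")": "(", "]": "[", "}": "{", ">": "<"}
TRAILING = ".,;:!?\"')]}>"  # assumed set of characters treated as trailing punctuation


def strip_trailing_punctuation(url: str) -> str:
    while url and url[-1] in TRAILING:
        closer = url[-1]
        opener = PAIRS.get(closer)
        # Keep the closer if the URL has at least as many openers as closers.
        if opener is not None and url.count(opener) >= url.count(closer):
            break
        url = url[:-1]
    return url


for candidate in [
    "https://www.mysite.com/endpoint?param=abc--~C<http://anothersite.com/myfile.zip>",
    "https://example.com/page.",  # sentence-ending period is stripped
    "https://example.com/a)",     # unmatched ")" is stripped
]:
    print(strip_trailing_punctuation(candidate))
```

With this rule, the URL from this issue keeps its final ">" because the "<" earlier in the URL balances it, while ordinary sentence punctuation is still removed.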