InQuest / iocextract

Defanged Indicator of Compromise (IOC) Extractor.
https://inquest.readthedocs.io/projects/iocextract/
GNU General Public License v2.0

refang_url converts unknown schemes (such as 'tcp') to 'http' #32

Closed jekyc closed 1 year ago

jekyc commented 5 years ago

It seems that refanging a URL whose scheme is not in the list on this line: https://github.com/InQuest/python-iocextract/blob/4da913206d8e94a6a3b137c011c89e9707cb3966/iocextract.py#L626 replaces the scheme with 'http': https://github.com/InQuest/python-iocextract/blob/4da913206d8e94a6a3b137c011c89e9707cb3966/iocextract.py#L631.

Maybe a hard-coded conversion mapping could be used, e.g.:

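# Fragment meant for the body of refang_url; `parsed` and `urlparse`
# come from the surrounding function.
# Maps each real scheme to its known defanged spellings.
refang_schemes = {
    'http': ['hxxp'],
    'https': ['hxxps'],
    'ftp': ['ftx', 'fxp'],
    'ftps': ['ftxs', 'fxps']
}
for scheme, fanged in refang_schemes.items():
    # Only rewrite schemes that are known defanged spellings.
    if parsed.scheme in fanged:
        parsed = parsed._replace(scheme=scheme)
        url = parsed.geturl().replace(scheme + ':///', scheme + '://')

        try:
            _ = urlparse(url)
        except ValueError:
            # Last resort on ipv6 fail.
            url = url.replace('[', '').replace(']', '')

        parsed = urlparse(url)

        break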

This is less of a catch-all than the current solution, but it leaves an indicator with an unrecognized scheme (such as 'tcp') unaltered.

Example:

In [1]: import iocextract

In [2]: content = """tcp://example[.]com:8989/bad"""

In [3]: list(iocextract.extract_urls(content))
Out[3]: ['tcp://example[.]com:8989/bad', 'tcp://example[.]com:8989/bad']

In [4]: list(iocextract.extract_urls(content, refang=True))
Out[4]: ['http://example.com:8989/bad', 'http://example.com:8989/bad']

Note: The README.rst shows this same behavior in the refang output examples of its 'Usage' section.
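
To try the mapping idea outside the library, here is a minimal standalone sketch. It is not iocextract's code: the name refang_scheme is hypothetical, it only touches the scheme, and it assumes the rest of the URL (dots, brackets) has already been refanged elsewhere.

```python
from urllib.parse import urlparse

# Real scheme -> known defanged spellings (same table as proposed above).
REFANG_SCHEMES = {
    "http": ["hxxp"],
    "https": ["hxxps"],
    "ftp": ["ftx", "fxp"],
    "ftps": ["ftxs", "fxps"],
}


def refang_scheme(url):
    """Hypothetical helper: restore a defanged scheme, leave others alone."""
    parsed = urlparse(url)
    for scheme, defanged in REFANG_SCHEMES.items():
        if parsed.scheme in defanged:
            # Swap in the real scheme and rebuild the URL.
            return parsed._replace(scheme=scheme).geturl()
    # Unknown schemes (e.g. 'tcp') are returned unchanged.
    return url


print(refang_scheme("hxxp://example.com/bad"))      # http://example.com/bad
print(refang_scheme("tcp://example.com:8989/bad"))  # tcp://example.com:8989/bad
```

With this approach 'tcp' survives as-is, at the cost of missing any defanged spelling that is not in the table.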

battleoverflow commented 1 year ago

Hi, @jekyc!

I believe this was resolved in this commit: https://github.com/InQuest/python-iocextract/commit/9abe5f25f989d5342a739f1733fb8fd4d91156d0

I've set it up so that the user decides whether a scheme check occurs at all during execution. If this does not fix your issue, feel free to let me know. If you have the time, feel free to submit a PR for any improvements you think could be useful.

This new release is not available on PyPI yet, but I'll be sure to make another comment here once it's available.

You can see an example of the change in this issue: https://github.com/InQuest/python-iocextract/issues/34

battleoverflow commented 1 year ago

New version is now available on PyPI: https://pypi.org/project/iocextract/1.14.0/