Charcoal-SE / SmokeDetector

Headless chatbot that detects spam and posts links to it in chatrooms for quick deletion.
https://metasmoke.erwaysoftware.com
Apache License 2.0
472 stars 182 forks source link

Watched/blacklisted IP should trigger on IP-only URLs #3722

Open tripleee opened 4 years ago

tripleee commented 4 years ago

What problem has occurred? What issues has it caused?

https://metasmoke.erwaysoftware.com/post/220864 should have triggered "watched IP" because 199.192.31.7 is currently watched; but it did not.

What would you like to happen/not happen?

IP addresses in URLs should be checked against the CIDR lists.

stale[bot] commented 4 years ago

This issue has been closed because it has had no recent activity. If this is still important, please add another comment and find someone with write permissions to reopen the issue. Thank you for your contributions.

user12986714 commented 4 years ago

This is copy pasted from my comment on PR 3789 with minor changes So the issue looks like that raw ip addresses in posts are not currently subject to ip or asn list checks. The nature of those two problems appears to be the same. Take ip watchlist for example. As we can see, in findspam.py, in function watched_ip_for_url_hostname(), it calls ip_for_url_host():

return ip_for_url_host(s, site, GlobalVars.watched_cidrs)

Then in ip_for_url_host(), it calls dns_query():

        a = dns_query(hostname, 'a')
        if a is not None:
            # Do something

The problem is, if hostname is a raw ip, dns_query() will return None (cannot resolve).

    try:
        starttime = datetime.utcnow()
        answer = dns.resolver.query(label, qtype)
    except dns.exception.DNSException as exc:
        # Some logging
        return None

Hence all raw ip won't be inspected by that rule.
A way to fix this is to detect raw ip address in dns_query() (with a regex like [1-2]?[0-9]?[0-9]\.[1-2]?[0-9]?[0-9]\.[1-2]?[0-9]?[0-9]\.[1-2]?[0-9]?[0-9], and if detected, construct an answer with that ip, rather than treating it as a domain and hence fail to resolve.

tripleee commented 3 years ago

I have a fork with a basic implementation of this but it's a little bit tortured. I'm thinking it would make sense to reduce code duplication by (finally) creating a class for parsed URLs with features like whitelisted, IP only, etc.

https://github.com/tripleee/SmokeDetector/tree/ip-only-watch-check