Cisco-Talos / clamav

ClamAV - Documentation is here: https://docs.clamav.net
https://www.clamav.net/
GNU General Public License v2.0

WDB signatures to allow specific URLs regardless of display text #796

Open micahsnyder opened 1 year ago

micahsnyder commented 1 year ago

PDB signatures are a watch list that protects certain real-URL domains: they flag suspicious links whose display text claims to go to one of those domains but that actually go somewhere else. WDB signatures are an allow list that suppresses the phishing heuristics for trusted real-URL + display-URL combinations.
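
To make the split between the two lists concrete, here is a heavily simplified sketch. It assumes plain hostname strings in place of the regexes and hostname patterns that real PDB/WDB entries use, and the names is_suspicious_link, pdb_watch_list, wdb_allow_list and url_pair are hypothetical. It illustrates the watch-list/allow-list idea only and is not how ClamAV's phishing module is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical watch list (PDB-like): display domains we want to protect. */
static const char *pdb_watch_list[] = {"example.com", NULL};

/* Hypothetical allow list (WDB-like): trusted real + display combinations. */
struct url_pair {
    const char *real_host;
    const char *display_host;
};
static const struct url_pair wdb_allow_list[] = {
    {"safelinks.protection.outlook.com", "example.com"},
    {NULL, NULL},
};

static bool is_watched(const char *display_host)
{
    for (size_t i = 0; pdb_watch_list[i] != NULL; i++)
        if (strcmp(pdb_watch_list[i], display_host) == 0)
            return true;
    return false;
}

static bool is_allowed(const char *real_host, const char *display_host)
{
    for (size_t i = 0; wdb_allow_list[i].real_host != NULL; i++)
        if (strcmp(wdb_allow_list[i].real_host, real_host) == 0 &&
            strcmp(wdb_allow_list[i].display_host, display_host) == 0)
            return true;
    return false;
}

/* Returns true if the link should be reported as suspected phishing. */
bool is_suspicious_link(const char *real_host, const char *display_host)
{
    assert(real_host != NULL && display_host != NULL);
    /* In this sketch the allow list is consulted first and wins. */
    if (is_allowed(real_host, display_host))
        return false;
    /* Only display text claiming a watched domain is checked at all. */
    if (!is_watched(display_host))
        return false;
    /* Display text names a watched domain, but the link goes elsewhere. */
    return strcmp(real_host, display_host) != 0;
}
```

With these lists, a link showing "example.com" but pointing at "evil.example.net" is flagged, while the same display text pointing at the allow-listed safelinks host is not.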

It would be useful to have WDB signatures that allow URLs pointing to click-time protection domains, where we are unable to verify the final/actual domain that the URL will resolve to.

Our current option is to allow a bunch of TLDs in a regex for the display text, like this:

X:(.+\.)?safelinks.protection.outlook.com:(.+\.)?.*\.(at|be|ca|ch|co\.uk|de|es|fr|ie|in|it|nl|ph|pl|com|com\.(au|cn|hk|my|sg))([/?].*)?
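
A rough way to see why this X entry needs its TLD list is to evaluate it as one POSIX extended regex over a "real:display" string. Joining the two parts with ":" before matching, using bare hostnames, and using regcomp/regexec are all assumptions of this sketch; ClamAV's own matcher, regex flavor and URL normalization may differ. The helper name x_entry_matches is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* The X entry body from above ("RealURL:DisplayedURL" as a single regex).
 * Backslashes are doubled for the C string literal. */
static const char *x_pattern =
    "(.+\\.)?safelinks.protection.outlook.com:"
    "(.+\\.)?.*\\.(at|be|ca|ch|co\\.uk|de|es|fr|ie|in|it|nl|ph|pl|com|com\\.(au|cn|hk|my|sg))([/?].*)?";

/* Returns true if "real:display" matches the X pattern.
 * Assumption: the real and display parts are joined with ':' before matching. */
bool x_entry_matches(const char *real, const char *display)
{
    char joined[512];
    regex_t re;
    bool matched;

    assert(real != NULL && display != NULL);
    int n = snprintf(joined, sizeof joined, "%s:%s", real, display);
    if (n < 0 || (size_t)n >= sizeof joined)
        return false; /* input too long for this sketch */
    if (regcomp(&re, x_pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    matched = regexec(&re, joined, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

Display text on a listed TLD such as "www.example.de" matches, but the same link displayed as "www.example.se" does not, because ".se" is not in the alternation.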

It would be nice to be able to do something like this instead, so we don't have to guess at all of the possible display-domain TLDs:

Y:(.+\.)?safelinks.protection.outlook.com
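
A host-only entry like this could be checked by testing the regex against the real URL's host alone and ignoring the display text entirely. The sketch below is hypothetical (host_only_allowed and url_host are illustrative names, not ClamAV functions), and its host extraction is deliberately naive: no userinfo, IPv6 literals, or percent-decoding.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Copies the host of url (after "scheme://", before any '/', '?', '#' or ':')
 * into out. Simplified parsing for illustration only. */
static bool url_host(const char *url, char *out, size_t out_len)
{
    const char *start = strstr(url, "://");
    start = start ? start + 3 : url;
    size_t len = strcspn(start, "/?#:");
    if (len == 0 || len >= out_len)
        return false;
    memcpy(out, start, len);
    out[len] = '\0';
    return true;
}

/* Hypothetical host-only allow check: only the real URL's host is matched
 * against the regex; the display text is deliberately ignored. */
bool host_only_allowed(const char *y_regex, const char *real_url,
                       const char *display_text)
{
    char host[256];
    regex_t re;
    bool matched;

    (void)display_text; /* ignored by design */
    assert(y_regex != NULL && real_url != NULL);
    if (!url_host(real_url, host, sizeof host))
        return false;
    /* Hostnames are case-insensitive, so match with REG_ICASE. */
    if (regcomp(&re, y_regex, REG_EXTENDED | REG_NOSUB | REG_ICASE) != 0)
        return false;
    matched = regexec(&re, host, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

Whatever the display text says, only where the link actually goes decides the result.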

Ref: https://github.com/Cisco-Talos/clamav/issues/771#issuecomment-1325136188

rzvncj commented 1 year ago

But how would we decide what constitutes a TLD? Mozilla keeps a list of TLDs. Do we have one in ClamAV?

Otherwise, of course, guessing at anything other than a one-word suffix (like ".com" or ".net") could get us in trouble, because we can't very well say that safelinks.protection.outlook.com should be allowed to redirect to safelinks.protection.outlook.my-malicious-domain.com.

micahsnyder commented 1 year ago

The intent is to trust going to the safelinks.protection.outlook.com domain regardless of what the display text says. We would trust safelinks.protection.outlook.com to determine if the site it redirects to is safe.
If we can't figure out where the link will redirect to, we can't override it to say "well, it's not actually safe".

micahsnyder commented 1 year ago

We wouldn't want to trust safelinks.protection.outlook.com.my-malicious-domain.com, of course. So the whole FQDN would have to be a match.

So perhaps it needs the ^ and $ regex symbols, like: Y:^(.+\.)?safelinks.protection.outlook.com$

rzvncj commented 1 year ago

Ah, so the

X:(.+\.)?safelinks.protection.outlook.com:(.+\.)?.*\.(at|be|ca|ch|co\.uk|de|es|fr|ie|in|it|nl|ph|pl|com|com\.(au|cn|hk|my|sg))([/?].*)?

rule says "if the display text is safelinks.protection.outlook.de and it redirects to safelinks.protection.outlook.com it's fine", not the other way around. Sorry for the misunderstanding - yeah, that's a much easier problem to solve. 😄

Sanesecurity commented 1 year ago

How many TLDs...

https://github.com/danielmiessler/SecLists/blob/master/Discovery/DNS/tlds.txt

micahsnyder commented 1 year ago

Yup. That's the idea. I'd like an easy way to say that anything that redirects to safelinks.protection.outlook.com is fine. It should be a fairly easy problem to solve.

rzvncj commented 1 year ago

Y appears to be synonymous with X here. Do we now want it to support only the case without ":" (breaking the previous use case, where it could have been used instead of X)?

rzvncj commented 1 year ago

I'm guessing we want to allow both the string1:string2 case, and the string1 case for Y?
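
Supporting both forms could be as simple as splitting on the first ":" after the type prefix. The sketch below is hypothetical (y_entry and parse_y_line are illustrative names), assumes the real-URL regex itself contains no ":", and ignores any additional trailing fields that real signature lines might carry.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of a "Y" allow-list line. */
struct y_entry {
    char real[256];    /* regex for the real URL / host */
    char display[256]; /* regex for the display text; empty if absent */
    bool host_only;    /* true when no display part was given */
};

/* Parses "Y:string1" or "Y:string1:string2" into out.
 * Assumes string1 contains no ':' (a simplification for this sketch). */
bool parse_y_line(const char *line, struct y_entry *out)
{
    assert(line != NULL && out != NULL);
    if (strncmp(line, "Y:", 2) != 0)
        return false;
    const char *body = line + 2;
    const char *sep = strchr(body, ':');
    size_t real_len = sep ? (size_t)(sep - body) : strlen(body);
    if (real_len == 0 || real_len >= sizeof out->real)
        return false;
    memcpy(out->real, body, real_len);
    out->real[real_len] = '\0';
    if (sep) {
        /* string1:string2 form: keep the display regex as well. */
        size_t disp_len = strlen(sep + 1);
        if (disp_len >= sizeof out->display)
            return false;
        memcpy(out->display, sep + 1, disp_len + 1);
        out->host_only = false;
    } else {
        /* string1 form: match on the real URL only. */
        out->display[0] = '\0';
        out->host_only = true;
    }
    return true;
}
```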

micahsnyder commented 1 year ago

Uh... I had no idea Y already existed. The coincidence that I suggested "Y" in an earlier comment has me chuckling.

It seems like it may do exactly what I was hoping for??? It's not in our documentation and we don't have any examples of it in the daily.wdb.

I found the commit where it was introduced and am just reading what the changes do: https://github.com/Cisco-Talos/clamav/commit/6e3332cfd97244ee358a81c8758fe49997f2c927

I will have to do some testing. If it really does what I had been hoping for, we may be able to just update the documentation, add a couple of feature tests, and close this ticket.

rzvncj commented 1 year ago

I'd be surprised if it actually works as-is, since this line from the original commit you've linked:

if(( rc = add_pattern(matcher,(const unsigned char*)pattern,flags, buffer[0] == 'Y') ))

no longer exists in the current code. That is, X and Y don't appear to be handled differently.

rzvncj commented 1 year ago

grep -IR root_regex_hostonly also returns nothing in the clamav source code directory.

rzvncj commented 10 months ago

Hi, any updates on this? Do you want to somehow revert to the old code that used to handle Y, or do we reimplement the functionality?

rzvncj commented 6 months ago

Ping? :)

micahsnyder commented 6 months ago

I'm sorry, I don't have anything else to report. We're down a team member and I haven't been able to work on lower-priority issues.

rzvncj commented 6 months ago

I understand. Sorry to hear about that. Update the issue here if and when you have time.