VirusTotal / yara

The pattern matching swiss knife
https://virustotal.github.io/yara/
BSD 3-Clause "New" or "Revised" License

Base64 possible problem with special characters #1899

Closed: djlukic closed this 1 year ago

djlukic commented 1 year ago

Hi,

I was experimenting with base64 detection on some malware samples and ran into a problem. The EML (email) samples are available here:

https://isc.sans.edu/diary/obama224+distribution+Qakbot+tries+vhd+virtual+hard+disk+images/29294

If I recall correctly, email attachments are Base64-encoded by default.

I wanted to do this kind of detection:

$ = "text/javascript" wide ascii base64 base64wide

The problem is that it matches only 6 of the 13 emails, even though all of them contain that string once the Base64 is decoded.

I tried some variations and this would detect them all:

$ = "text/" wide ascii base64 base64wide

but these wouldn't:

$ = "/javascript" wide ascii base64 base64wide
$ = "javascript" wide ascii base64 base64wide

What could be the problem here? Is it the newlines, special characters, or some limitation of the base64 matching?

Thanks!

plusvic commented 1 year ago

Invoking our base64 expert @wxsBSD.

wxsBSD commented 1 year ago

Sorry, meant to look at this yesterday when I saw it come in but got sidetracked on something else. I took a look at one of the files and this is indeed a newline issue.

This is the rule I was testing:

rule a {
  strings:
    $a = "text/javascript" base64
  condition:
    $a
}

This is the file I was testing:

9973f90d76dc3b449b676ead6ac5c2a7acd00b0d8fdab1c2ea91961ae71607c5  2022-12-01-obama224-Qakbot-193636-UTC.eml

The file does not match the rule, as you stated.

These are the base64 encoded versions of the string it is searching for:

dGV4dC9qYXZhc2NyaXB0
RleHQvamF2YXNjcmlwd
0ZXh0L2phdmFzY3JpcH

(For those that are into debugging, you can get this by uncommenting https://github.com/VirusTotal/yara/blob/master/libyara/base64.c#L471)
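For anyone who would rather reproduce those three strings without rebuilding YARA, here is a rough Python sketch. It is not YARA's code, and the function name `base64_search_variants` is made up for illustration. It assumes the standard base64 alphabet and simply drops the leading and trailing characters whose value depends on neighbouring bytes, once for each of the three possible byte alignments.

```python
import base64


def base64_search_variants(plain: bytes) -> list[str]:
    """Return the base64 substrings that must appear for `plain` to be
    present at byte alignment 0, 1 or 2 inside a larger encoded blob.

    Hypothetical helper for illustration only, not YARA's implementation.
    """
    variants = []
    # (number of filler bytes in front, leading base64 chars they contaminate)
    for offset, lead_trim in ((0, 0), (1, 2), (2, 3)):
        padded = b"\x00" * offset + plain
        encoded = base64.b64encode(padded).decode("ascii").rstrip("=")
        # If the input does not end on a 3-byte boundary, the last base64
        # character also carries bits of whatever byte follows, so drop it.
        if len(padded) % 3 != 0:
            encoded = encoded[:-1]
        variants.append(encoded[lead_trim:])
    return variants


variants = base64_search_variants(b"text/javascript")
```

Each variant has to appear contiguously in the scanned data, which is why a line break in the middle of one defeats the match.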

The string RleHQvamF2YXNjcmlw (notice the missing d) is in the file, but the d is on the next line, so the full pattern RleHQvamF2YXNjcmlwd never appears contiguously:

ICA8L2Rpdj4NCiAgICAgICA8L2Rpdj4NCiAgICA8c2NyaXB0IHR5cGU9InRleHQvamF2YXNjcmlw
dCI+DQoNCgkJdmFyIGExID0gJ1VFc0RCQlFBQUFBQUFER2xnVlVBQUFBQUFBQUFBQUFBQUFBSUFB

Sadly I don't think this can be worked around in YARA.
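To make the split concrete, here is a small Python check using the two lines quoted above. It is only a sketch: the `\n` line ending is an assumption (the real file may use `\r\n`), and the helper names `unwrap` and `decode_excerpt` are hypothetical.

```python
import base64

# The two wrapped lines quoted above; "\n" as the line ending is an assumption.
WRAPPED = (
    "ICA8L2Rpdj4NCiAgICAgICA8L2Rpdj4NCiAgICA8c2NyaXB0IHR5cGU9InRleHQvamF2YXNjcmlw\n"
    "dCI+DQoNCgkJdmFyIGExID0gJ1VFc0RCQlFBQUFBQUFER2xnVlVBQUFBQUFBQUFBQUFBQUFBSUFB\n"
)
# The alignment-1 pattern from the list of encoded strings.
NEEDLE = "RleHQvamF2YXNjcmlwd"


def unwrap(text: str) -> str:
    """Remove all whitespace, including the line breaks added by MIME wrapping."""
    return "".join(text.split())


def decode_excerpt(text: str) -> bytes:
    """Decode an unwrapped excerpt, ignoring any trailing partial 4-char group."""
    joined = unwrap(text)
    usable = len(joined) // 4 * 4
    return base64.b64decode(joined[:usable])


found_wrapped = NEEDLE in WRAPPED            # pattern is split by the newline
found_unwrapped = NEEDLE in unwrap(WRAPPED)  # pattern is contiguous again
decoded = decode_excerpt(WRAPPED)
```

With the newline in place the pattern is not found; once the whitespace is removed it is, and the decoded bytes contain the plain `text/javascript` string.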

djlukic commented 1 year ago

Thank you for answering. What is your suggestion for such rules, matching content that's not close to a newline?

wxsBSD commented 1 year ago

You can try different content, but you really have no control over where it ends up in the encoded output relative to the newlines. If you really must match these strings, pre-process the content first (say, with a Python module that understands MIME). That way the content should end up in the clear too.
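As a rough illustration of that pre-processing idea, the sketch below uses Python's standard `email` package to walk a message and yield each part with its transfer encoding already undone. The function name `decoded_parts` is hypothetical, and the YARA scanning step is left out. The decoded bytes could then be handed to YARA, for example through yara-python's `match(data=...)`, with a plain rule that no longer needs the base64 modifier.

```python
import email
from email import policy
from typing import Iterator, Tuple


def decoded_parts(raw: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (content_type, payload_bytes) for every leaf MIME part.

    Hypothetical helper: base64 and quoted-printable transfer encodings
    are undone, so line wrapping in the raw message no longer matters.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True)
        if payload is not None:
            yield part.get_content_type(), payload


# Example usage (the path is a placeholder):
# with open("sample.eml", "rb") as fh:
#     for content_type, payload in decoded_parts(fh.read()):
#         if b"text/javascript" in payload:
#             print("plain string found in", content_type)
```

This trades a single YARA pass over the raw file for an extra decoding step, but the rule then matches the cleartext regardless of where the encoder wrapped its lines.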