Considerations for base64 arguments

pooki3bear commented 2 years ago

Hi, I just saw this awesome project!

My first thought was: how will this handle the legitimate base64 arguments passed to programs like Chrome or Nvidia?

Suggestion: add a base64 decoding step for arguments before vectorization. Where the decoded value contains binary data, it could be represented as a string of only the valid ASCII bytes. This would most likely also improve model accuracy for base64-encoded commands.
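A minimal sketch of what such a pre-processing step could look like, in Python. The function names, the 16-character minimum, and the handling of "--flag=VALUE" arguments are assumptions for illustration, not part of the project: base64-looking tokens are decoded and replaced with only their printable ASCII bytes before the command would be vectorized.

```python
import base64
import binascii
import re

# Assumption: treat a token as base64 if it has 16+ base64-alphabet characters
# (optionally "=" padded) and a length that is a multiple of 4.
BASE64_TOKEN = re.compile(r"^[A-Za-z0-9+/]{16,}={0,2}$")


def printable_ascii(data: bytes) -> str:
    """Keep only printable ASCII bytes (0x20-0x7E), dropping binary values."""
    return "".join(chr(b) for b in data if 0x20 <= b <= 0x7E)


def try_decode(value: str):
    """Return the printable ASCII content of a base64 value, or None if it isn't base64."""
    if len(value) % 4 != 0 or not BASE64_TOKEN.match(value):
        return None
    try:
        return printable_ascii(base64.b64decode(value, validate=True))
    except binascii.Error:
        return None


def decode_base64_args(command: str) -> str:
    """Hypothetical pre-processing step: decode base64-looking arguments in place."""
    out = []
    for token in command.split():
        # For "--flag=VALUE" arguments, only VALUE is a decoding candidate.
        if token.startswith("-") and "=" in token:
            flag, _, value = token.partition("=")
            prefix = flag + "="
        else:
            prefix, value = "", token
        decoded = try_decode(value)
        out.append(token if decoded is None else prefix + decoded)
    return " ".join(out)


print(decode_base64_args("helper --payload=ZWNobyBoZWxsbyB3b3JsZA=="))
# prints: helper --payload=echo hello world
```

The token regex is deliberately crude: long plain words or some paths can also match and decode into noise, so a real implementation would probably sanity-check the decoded text before substituting it.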

Expected Behaviour

Google Chrome Helper (GPU) is classified as label 0

Actual Behaviour

Google Chrome Helper (GPU) is classified as label 1

Reproduce Scenario (including but not limited to)

Steps to Reproduce

Find the Google Chrome Helper (GPU) process and its arguments with: ps -e | grep Chrome | grep GPU
Add the process string to the project's example code.
Run the demo script; it outputs 1.

If I remove the base64 string from the submitted command, the model returns 0 as expected.

tiberiu44 commented 2 years ago

Hi @pooki3bear,

BASE64 is still a type of obfuscation, so the result should be 1; this is why we cannot return 0 for BASE64 strings. I'm guessing that Google Chrome is an exception in your environment; if you don't want it highlighted, you should ignore it by whitelisting. I'm sure that if a PowerShell script runs a BASE64-encoded script, you will want to see it flagged.

Hope this helps
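A minimal sketch of the whitelisting approach, in Python. The allowlist contents and function names are hypothetical; the macOS Chrome install path is an assumed example, and anchoring on the full path is suggested because matching only the process name would let any binary named "Google Chrome Helper (GPU)" slip through. The label is the model's 0/1 output described in this issue.

```python
import re

# Hypothetical allowlist of known-benign command lines for your own environment.
# The macOS install path below is an assumed example, not a vetted rule.
ALLOWLIST = [
    re.compile(r"^/Applications/Google Chrome\.app/.*Google Chrome Helper \(GPU\)"),
]


def is_allowlisted(command: str) -> bool:
    """True if any allowlist pattern matches the command line."""
    return any(pattern.search(command) for pattern in ALLOWLIST)


def should_alert(command: str, label: int) -> bool:
    """Alert only when the model outputs label 1 and the command is not allowlisted."""
    return label == 1 and not is_allowlisted(command)
```

Applying the allowlist after classification keeps the model's output unchanged and makes the exception an explicit, reviewable policy decision.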

pooki3bear commented 2 years ago

Hi @tiberiu44,

In my limited experience, there are legitimate reasons for Windows admins to submit base64-encoded PowerShell scripts (for example, with WinRM-based tooling).
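For the PowerShell case, the base64 payload passed to powershell.exe -EncodedCommand is the base64 encoding of the script's UTF-16LE bytes, so a legitimate admin payload can be decoded and reviewed instead of being judged by its encoded form. A minimal round-trip sketch in Python (the function names are hypothetical):

```python
import base64


def encode_powershell(script: str) -> str:
    """Encode a script the way -EncodedCommand expects: UTF-16LE bytes, then base64."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


def decode_powershell(encoded: str) -> str:
    """Reverse the encoding to see what an -EncodedCommand argument actually runs."""
    return base64.b64decode(encoded).decode("utf-16-le")


print(encode_powershell("dir"))       # prints: ZABpAHIA
print(decode_powershell("ZABpAHIA"))  # prints: dir
```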

In the case of Invoke-Obfuscation, the CaP1TaLiZatIoN and character-frequency artifacts might be more valuable for detecting something that was intended to be hidden.
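A rough sketch of how such artifacts could be turned into features, in Python. These three features are illustrative assumptions, not what the project computes: a case-flip ratio for random capitalization, Shannon entropy as one possible summary of character frequency, and the share of PowerShell escape backticks (as in I`nv`oke).

```python
import math
from collections import Counter


def case_flip_ratio(text: str) -> float:
    """Fraction of adjacent letter pairs that switch case; 'iNvOkE' scores 1.0."""
    letters = [c for c in text if c.isalpha()]
    if len(letters) < 2:
        return 0.0
    flips = sum(a.isupper() != b.isupper() for a, b in zip(letters, letters[1:]))
    return flips / (len(letters) - 1)


def char_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum(k / n * math.log2(k / n) for k in Counter(text).values())


def backtick_ratio(text: str) -> float:
    """Share of characters that are PowerShell escape backticks."""
    return text.count("`") / len(text) if text else 0.0
```

A base64 blob tends to raise the entropy feature but not the case-flip or backtick features in a structured way, which is roughly the distinction suggested above.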

tiberiu44 commented 2 years ago

I agree with what you said. However, BASE64 also has illegitimate uses. In fact, obfuscation mechanisms generally serve one of two purposes: hiding malicious intent or protecting intellectual property/sensitive data. This tool detects obfuscation, not obfuscation with malicious intent. If you are looking for malicious activity, you should focus on other indicators:

public/private IPs
FQDNs from sketchy providers or dynamic DNS
you could decode BASE64 and look inside for clues
check the behaviour of the script in a sandbox
look for bash/sh/powershell pipes and redirects
check for the occurrence rate of that command on the system
check if this is a newly observed command
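Some of the indicators above are cheap to compute directly from the command line. A minimal Python sketch, assuming plain-text command lines and an in-memory history (all names are hypothetical); FQDN reputation, BASE64 inspection, and sandboxing need more context and are not covered:

```python
import ipaddress
import re
from collections import Counter

# Dotted-quad candidates; ipaddress validates them afterwards.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Any pipe or redirect character (also matches inside ||, >>, 2>&1, etc.).
PIPE_OR_REDIRECT = re.compile(r"[|<>]")


def ip_indicators(command: str):
    """Split IPv4 literals in a command into (public, private) lists."""
    public, private = [], []
    for candidate in IPV4.findall(command):
        try:
            ip = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # e.g. "999.1.1.1" is not a valid address
        (private if ip.is_private else public).append(str(ip))
    return public, private


def has_pipe_or_redirect(command: str) -> bool:
    return PIPE_OR_REDIRECT.search(command) is not None


class CommandHistory:
    """Hypothetical in-memory record of how often each command line was seen."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, command: str):
        """Return (is_newly_observed, prior_occurrence_rate), then record the command."""
        is_new = self.counts[command] == 0
        rate = self.counts[command] / self.total if self.total else 0.0
        self.counts[command] += 1
        self.total += 1
        return is_new, rate
```

In practice the history would be persisted per host, and the dotted-quad regex will also pick up version strings such as 1.2.3.4, so these are signals to combine rather than verdicts.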

For these types of operations, we provide other tools:

https://github.com/adobe/OSAS - for building system behaviour models (this will only work for large datasets)
https://github.com/adobe/stringlifier - for detecting portions of text that contain random strings, JWT tokens, GUIDs and numbers; this is useful for cleanup purposes. It will probably mark the base64 string as a "RANDOM STRING".
https://github.com/adobe/libLOL - for detecting Living off the Land attacks (it's probably what you are looking for). It works fairly well on Linux boxes; PowerShell/Windows support is still a work in progress, and contributions are welcome.

adobe / obfuscation-detection