CybercentreCanada / assemblyline

AssemblyLine 4: File triage and malware analysis
https://cybercentrecanada.github.io/assemblyline4_docs/
MIT License

Identity: Python obfuscated code identified as text/plain #222

Closed: kam193 closed this issue 6 months ago

kam193 commented 6 months ago

Describe the bug

I have already come across a few similar files that AssemblyLine is unable to identify as Python code. What they have in common is that most of the file is a base64-encoded variable, with only a few function calls.

I assume these cases are difficult to identify properly, but in case you have an idea, here are two example files (password: zippy). Be careful: all of them try to do something more or less malicious, so please don't run them.

main.py.zip (Type: text/plain, Mimetype: text/plain, Magic: ASCII text, with very long lines (65515), with CRLF line terminators)

__decompiled_source.py.zip (Type: text/plain, Mimetype: text/plain, Magic: ASCII text, with very long lines (65515))

To Reproduce

Steps to reproduce the behavior:

  1. Submit one of the example files to AL.
  2. Observe the file type set by AL.

Expected behavior

Files should be identified as code/python.

gdesmar commented 6 months ago

I added the new executor to our current list. It is obviously a very flimsy approach, as a single change to the exec line would stop our identification. If we start amassing enough executors, we'd want to generalize them with a better regex.
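To illustrate why a fixed list of executor strings is fragile, here is a minimal sketch. It is not AssemblyLine's actual identification code: the marker list and the `looks_like_obfuscated_python` helper are hypothetical names made up for this example, and the sample is a harmless stand-in for the real files.

```python
# Hypothetical executor markers; NOT AssemblyLine's real list.
EXECUTOR_MARKERS = (
    "exec(base64.b64decode(",
    "eval(base64.b64decode(",
)


def looks_like_obfuscated_python(text: str) -> bool:
    """Return True if any known executor marker appears verbatim in the text."""
    return any(marker in text for marker in EXECUTOR_MARKERS)


# Harmless sample: a short base64 variable followed by a single executor call.
sample = 'import base64\npayload = "aGVsbG8="\nexec(base64.b64decode(payload))\n'

# The same sample with one extra space inside the exec call.
variant = sample.replace("exec(base64", "exec( base64")

print(looks_like_obfuscated_python(sample))
print(looks_like_obfuscated_python(variant))
```

The first call finds the marker, while the one-character change in `variant` is enough to defeat the verbatim match.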

kam193 commented 6 months ago

There is a nice collection of executors from Datadog: https://github.com/DataDog/guarddog/blob/main/guarddog/analyzer/sourcecode/exec-base64.yml But I don't have any real examples for them, so I can't say how well the type recognition handles them.

However, I'd suggest adding another one to the list:

pickle.loads(zlib.decompress(

An example file: text.zip (password: zippy; as always, be careful, since it comes from a real case, although I don't think it works).
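As a rough sketch of how several executors, including the pickle one, could be folded into a single pattern that tolerates whitespace changes: the regex and the `has_executor` helper below are assumptions for illustration only, and are neither the Datadog rule nor the pattern AssemblyLine ships.

```python
import re

# Hypothetical generalized pattern: either branch may match.
EXECUTOR_RE = re.compile(
    r"""
    \b(?:exec|eval)\s*\(\s*                 # exec( or eval(, optional spaces
    (?:base64\s*\.\s*)?b64decode\s*\(       # base64.b64decode( or bare b64decode(
    |
    \bpickle\s*\.\s*loads\s*\(\s*           # pickle.loads(
    zlib\s*\.\s*decompress\s*\(             # zlib.decompress(
    """,
    re.VERBOSE,
)


def has_executor(text: str) -> bool:
    """Return True if the text contains one of the executor call shapes."""
    return EXECUTOR_RE.search(text) is not None


print(has_executor("exec( base64.b64decode(payload))"))
print(has_executor("pickle.loads(zlib.decompress(blob))"))
print(has_executor("print('hello')"))
```

A pattern like this survives spacing changes, but renaming imports (for example `from base64 import b64decode as d`) would still evade it, so it remains a heuristic.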

gdesmar commented 6 months ago

The PR was merged. The updated Identify code should be part of the next release! Just make sure to back up your local change before reverting to pick up the latest version at that point. 🙂 Thank you for the help! EDIT: I also added a lot of the items from the Datadog link, plus the pickle one, so those should be handled as well!

kam193 commented 6 months ago

Thank you!