binref / refinery

High Octane Triage Analysis
Other
635 stars 63 forks source link

New Unit that Wraps XLMMacroDeobfuscator #12

Closed baderj closed 2 years ago

baderj commented 2 years ago

For some Excel files, the unit xtvba (based on olevba) fails to extract Excel 4.0 macros, while the XLMMacroDeobfuscator works (example) . This module also potentially provides better deobfuscation results due to using an emulator.

The new unit just wraps the process_file function of XLMMacroDeobfuscator. Almost all commandline arguments to XLMMacroDeobfuscator are supported with the same defaults. Exceptions are:

Unfortunately, XLMMacroDeobfuscator has the following limitations that maybe need to be addressed:

  1. There is a print statement warning about pywin32 which can't be muted.
  2. XLMMacroDeobfuscator only works on files, and I could not get it to work with the virtual file system of binref, so the unit uses tempfile to write the file to disk.

Maybe it would also make sense to add the XLMMacroDeobfuscator functionality to xtvba rather than an new unit, but I opted for a new unit because:

  1. XLMMacroDeobfuscator works only for XLM, not Macros in general.
  2. XLMMacroDeobfuscator has many settings that would clutter xtvba
huettenhain commented 2 years ago

I do have some reservations against tempfile: I would like to give a certain guarantee to users that refinery units have no disk activity "side-effects". I would want to generally avoid ever writing anything to disk unless it is explicit, especially since it's for malware analysis.

The good news is, I think I fixed the issues with the virtual file system in 8dcd9c17eb01159ef7f85067487922d856ba2bc7, and it should work now. If you rebase your branch against that commit, I am quite sure it would work. I also uploaded your test sample to the test data repository. I would be very grateful if you could:

The following is a suggested patch to make these things happen:

diff --git a/refinery/units/formats/office/xlmdeobf.py b/refinery/units/formats/office/xlmdeobf.py
index 12319af..12f40e8 100644
--- a/refinery/units/formats/office/xlmdeobf.py
+++ b/refinery/units/formats/office/xlmdeobf.py
@@ -2,9 +2,8 @@
 # -*- coding: utf-8 -*-
 from __future__ import annotations

-import tempfile
-
 from refinery.units.formats import Unit
+from refinery.lib.vfs import VirtualFileSystem, VirtualFile

 class XLMMacroDeobfuscator(Unit):
@@ -78,7 +77,10 @@ class XLMMacroDeobfuscator(Unit):
             type=int,
             action="store",
             default=0,
-            help="Set the level of details to be shown (0:all commands, 1: commands no jump 2:important commands 3:strings in important commands).",
+            help=(
+                "Set the level of details to be shown (0:all commands, 1: commands no jump 2:important "
+                "commands 3:strings in important commands)."
+            ),
         ) = 0,
         timeout: Unit.Arg(
             "--timeout",
@@ -98,10 +100,9 @@ class XLMMacroDeobfuscator(Unit):
         return process_file

     def process(self, data: bytearray):
-        with tempfile.NamedTemporaryFile() as nf:
-            nf.write(data)
+        with VirtualFileSystem() as vfs:
             result = self._process_file(
-                file=nf.name,
+                file=VirtualFile(vfs, data),
                 noninteractive=True,
                 return_deobfuscated=True,
                 silent=True,
@@ -118,4 +119,4 @@ class XLMMacroDeobfuscator(Unit):
                 output_level=self.args.output_level,
                 timeout=self.args.timeout,
             )
-            return "\n".join(result).encode("utf-8")
+        return "\n".join(result).encode(self.codec)

With these changes and the test sample uploaded to the test data repository, the CI pipeline should pass without error and we can merge this.

baderj commented 2 years ago

Thanks for updating VFS. I agree that it is much nicer than having temp file written to disk (especially considering these files are often malware).

I rebased my branch, squashed the commits into one and added you patch. I messed up rebasing once, so that's why there is this reference going nowhere d6c2004.

codecov-commenter commented 2 years ago

Codecov Report

Merging #12 (930e829) into master (8dcd9c1) will increase coverage by 0.00%. The diff coverage is 92.30%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #12   +/-   ##
=======================================
  Coverage   84.70%   84.71%           
=======================================
  Files         260      261    +1     
  Lines       16363    16376   +13     
=======================================
+ Hits        13861    13873   +12     
- Misses       2502     2503    +1     
Impacted Files Coverage Δ
refinery/units/formats/office/xlmdeobf.py 92.30% <92.30%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 8dcd9c1...930e829. Read the comment docs.