Closed baderj closed 2 years ago
I do have some reservations against tempfile
: I would like to give a certain guarantee to users that refinery units have no disk activity "side-effects". I would want to generally avoid ever writing anything to disk unless it is explicit, especially since it's for malware analysis.
The good news is, I think I fixed the issues with the virtual file system in 8dcd9c17eb01159ef7f85067487922d856ba2bc7, and it should work now. If you rebase your branch against that commit, I am quite sure it would work. I also uploaded your test sample to the test data repository. I would be very grateful if you could:
vfs
andThe following is a suggested patch to make these things happen:
diff --git a/refinery/units/formats/office/xlmdeobf.py b/refinery/units/formats/office/xlmdeobf.py
index 12319af..12f40e8 100644
--- a/refinery/units/formats/office/xlmdeobf.py
+++ b/refinery/units/formats/office/xlmdeobf.py
@@ -2,9 +2,8 @@
# -*- coding: utf-8 -*-
from __future__ import annotations
-import tempfile
-
from refinery.units.formats import Unit
+from refinery.lib.vfs import VirtualFileSystem, VirtualFile
class XLMMacroDeobfuscator(Unit):
@@ -78,7 +77,10 @@ class XLMMacroDeobfuscator(Unit):
type=int,
action="store",
default=0,
- help="Set the level of details to be shown (0:all commands, 1: commands no jump 2:important commands 3:strings in important commands).",
+ help=(
+ "Set the level of details to be shown (0:all commands, 1: commands no jump 2:important "
+ "commands 3:strings in important commands)."
+ ),
) = 0,
timeout: Unit.Arg(
"--timeout",
@@ -98,10 +100,9 @@ class XLMMacroDeobfuscator(Unit):
return process_file
def process(self, data: bytearray):
- with tempfile.NamedTemporaryFile() as nf:
- nf.write(data)
+ with VirtualFileSystem() as vfs:
result = self._process_file(
- file=nf.name,
+ file=VirtualFile(vfs, data),
noninteractive=True,
return_deobfuscated=True,
silent=True,
@@ -118,4 +119,4 @@ class XLMMacroDeobfuscator(Unit):
output_level=self.args.output_level,
timeout=self.args.timeout,
)
- return "\n".join(result).encode("utf-8")
+ return "\n".join(result).encode(self.codec)
With these changes and the test sample uploaded to the test data repository, the CI pipeline should pass without error and we can merge this.
Thanks for updating VFS. I agree that it is much nicer than having temp file written to disk (especially considering these files are often malware).
I rebased my branch, squashed the commits into one and added you patch. I messed up rebasing once, so that's why there is this reference going nowhere d6c2004.
Merging #12 (930e829) into master (8dcd9c1) will increase coverage by
0.00%
. The diff coverage is92.30%
.
@@ Coverage Diff @@
## master #12 +/- ##
=======================================
Coverage 84.70% 84.71%
=======================================
Files 260 261 +1
Lines 16363 16376 +13
=======================================
+ Hits 13861 13873 +12
- Misses 2502 2503 +1
Impacted Files | Coverage Δ | |
---|---|---|
refinery/units/formats/office/xlmdeobf.py | 92.30% <92.30%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 8dcd9c1...930e829. Read the comment docs.
For some Excel files, the unit
xtvba
(based on olevba) fails to extract Excel 4.0 macros, while the XLMMacroDeobfuscator works (example) . This module also potentially provides better deobfuscation results due to using an emulator.The new unit just wraps the
process_file
function ofXLMMacroDeobfuscator
. Almost all commandline arguments to XLMMacroDeobfuscator are supported with the same defaults. Exceptions are:silent
: This is always set totrue
, so the unit does not print the result to stdout.return_deobfuscated
: To get the results returned.noninteractive
: This is set totrue
for obvious reasons.Unfortunately, XLMMacroDeobfuscator has the following limitations that maybe need to be addressed:
print
statement warning aboutpywin32
which can't be muted.tempfile
to write the file to disk.Maybe it would also make sense to add the XLMMacroDeobfuscator functionality to
xtvba
rather than an new unit, but I opted for a new unit because:xtvba