fkie-cad / fact_extractor

Standalone Utility for FACT-like extraction
GNU General Public License v3.0

Extracting zip with generic carver produces weird results #123

Open maringuu opened 9 months ago

maringuu commented 9 months ago

Running fact_extractor with 0.zip gives me weird results. Here is the output of tree in the respective extraction directory:

.
├── files
│   └── 0.zip
├── input
│   └── 0.zip
└── reports
    └── meta.json

The report tells us that one file was extracted, and that file is the input file itself. The two even have the same hashes.

What happened here?
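
For anyone who wants to reproduce the comparison, here is a minimal sketch that lists every extracted file whose content is identical to the input. The helper names `sha256_of` and `self_extracted` are made up for illustration and are not part of fact_extractor; the `input` and `files` directory names are taken from the tree output above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def self_extracted(input_file: Path, files_dir: Path) -> list[Path]:
    """List extracted files whose content is identical to the input file."""
    input_hash = sha256_of(input_file)
    return [
        candidate
        for candidate in sorted(files_dir.rglob("*"))
        if candidate.is_file() and sha256_of(candidate) == input_hash
    ]
```

Called as `self_extracted(Path("input/0.zip"), Path("files"))` from the extraction directory, a non-empty result means the unpacker handed back the input unchanged.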

maringuu commented 9 months ago

Looking at the code I just noticed that this is a binwalk issue.

maringuu commented 9 months ago

Actually I think this is our issue. Adding "--rm" to the binwalk invocation might be the solution (and works for my limited test cases).
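
To make the proposal concrete, here is a rough sketch of what adding the flag could look like. It assumes binwalk 2.x and its `--extract`, `--rm` and `--directory` options; the function name `build_binwalk_command` is hypothetical, and the real invocation in the generic carver may pass other flags as well.

```python
import subprocess
from pathlib import Path


def build_binwalk_command(file_path: Path, out_dir: Path, remove_carved: bool = True) -> list[str]:
    """Build a binwalk command line (hypothetical helper, not fact_extractor code)."""
    command = ["binwalk", "--extract"]
    if remove_carved:
        # --rm asks binwalk to delete carved files after extraction
        command.append("--rm")
    command += ["--directory", str(out_dir), str(file_path)]
    return command


def run_binwalk(file_path: Path, out_dir: Path) -> str:
    """Run binwalk and return its standard output (requires binwalk to be installed)."""
    result = subprocess.run(
        build_binwalk_command(file_path, out_dir),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Keeping the flag behind a parameter makes it easy to compare the extraction results with and without `--rm` on the same sample.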

jstucke commented 8 months ago

It seems to me that this 0.zip ending up in the generic_carver at all (that is, not being detected as MIME type ZIP) is a bug in itself. As far as I can tell, the header starts with the usual magic string PK\x03\x04, but for whatever reason file detects it as application/octet-stream.
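
The header check mentioned above can be reproduced with a few lines. This is only a sketch of the manual check, not how file or fact_extractor detect types; the function name `starts_with_zip_magic` is made up for illustration.

```python
from pathlib import Path

# Signature of a ZIP local file header
ZIP_LOCAL_FILE_HEADER = b"PK\x03\x04"


def starts_with_zip_magic(path: Path) -> bool:
    """Return True if the file begins with the ZIP local file header signature."""
    with path.open("rb") as handle:
        return handle.read(len(ZIP_LOCAL_FILE_HEADER)) == ZIP_LOCAL_FILE_HEADER
```

Note that a matching signature at offset 0 only shows that the file starts like a ZIP archive. It says nothing about whether the rest of the archive is well formed.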

jstucke commented 8 months ago

> Actually I think this is our issue. Adding "--rm" to the binwalk invocation might be the solution (and works for my limited test cases).

I don't think this is (entirely) our fault. Unpacking a file from itself is the fault of binwalk, IMHO. Adding --rm works for this file, but I tried it with a different file (which was previously unpacked successfully with binwalk), and with --rm that file is not unpacked at all. The problem is probably that binwalk does not recognize the file as zip either, so it simply tries to carve files from it and finds a zip file at offset 0 (the file itself).

We could also try to handle this specific case in "fact_helper_file" and force the file to be detected as application/zip (the default application/zip unpacker has no problem unpacking the file). The file actually seems to be an OOXML file, but that type does not come with a MIME definition in the standard file magic.
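
A rough sketch of such an override, using only the Python standard library: `refine_mime` and `looks_like_ooxml` are hypothetical names and do not reflect the actual fact_helper_file API. The OOXML check relies on the fact that OOXML packages contain a `[Content_Types].xml` entry.

```python
import zipfile
from pathlib import Path

# Entry that is present in OOXML (Open Packaging Conventions) packages
OOXML_MARKER = "[Content_Types].xml"


def refine_mime(path: Path, detected_mime: str) -> str:
    """Upgrade a generic octet-stream result to application/zip for readable archives."""
    if detected_mime != "application/octet-stream":
        return detected_mime
    if zipfile.is_zipfile(path):
        return "application/zip"
    return detected_mime


def looks_like_ooxml(path: Path) -> bool:
    """Return True if the file is a ZIP archive that contains the OOXML content types entry."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        return OOXML_MARKER in archive.namelist()
```

`zipfile.is_zipfile` looks for the archive's end-of-central-directory record rather than the leading magic bytes, so it may still reject files that are truncated or otherwise damaged.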

But is this a general problem with binwalk or is this a special case? Does this only affect zip files that are not detected as zip or also other files?