EricFaehrmann closed 9 months ago
If I understand correctly, you are synthesizing an artificial EOCD (end of central directory) record to make it appear as though the file is an archive containing the files from both ZIP files. This is quite clever, but it is also very opaque: when I run xtzip against an input, I have no idea whether I am seeing synthesized output or the actual contents of a regular ZIP archive. I feel bad saying no to a contribution, especially a clever one, but I think I would prefer something along the following lines:
- The xtzip unit checks against ambiguous input (i.e. multiple EOCDs) and aborts with an error when this happens.
- The user can carve the ZIP files using carve-zip and deal with them individually; I added support for this in https://github.com/binref/refinery/commit/a8dd77367511f6cb6114aaf78151c26342b33cd7.
- For convenience, running xtzip with --lenient could then synthesize multiple results from carve-zip; that seems explicit enough.
But we would not have to add a synthesized EOCD: it would be much easier and require less code to simply run carve-zip and then operate on each of its outputs individually. This could even be implemented in the archive unit interface to allow the same behaviour for other extractors that have corresponding carvers. I'll play around with this idea and update this PR if it leads somewhere.
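The strict behaviour proposed above can be sketched in a few lines of Python with the standard zipfile module. This is not refinery's code: the function name list_strict is hypothetical, the exception only borrows the MultipleArchives name from the error message later in this thread, and the ambiguity check naively counts EOCD signatures, so it can over-count if that byte sequence happens to occur inside member data.

```python
import io
import zipfile

# Signature of the end of central directory (EOCD) record.
EOCD_SIG = b"PK\x05\x06"


class MultipleArchives(Exception):
    """Raised when the input appears to contain more than one ZIP archive."""


def list_strict(data: bytes) -> list[str]:
    """List member names, refusing input with more than one EOCD record."""
    # Naive check (assumption): every occurrence of the signature is treated
    # as an EOCD record, even one that happens to sit inside member data.
    count = data.count(EOCD_SIG)
    if count > 1:
        raise MultipleArchives(f"The input contains {count} archives.")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()
```

Failing loudly keeps the output honest: the user either gets the contents of exactly one archive or an explicit error, never a silently chosen half.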
"The xtzip unit checks against ambiguous input (i.e. multiple EOCDs) and aborts with an error when this happens."
-> That's right, the user should notice when the ZIP contains multiple EOCDs.
"The user can carve the ZIP files using carve-zip and deal with them individually;"
-> I think this depends on the design goals for refinery. It is possible to use other units to get the same result, but this is more difficult for new users. Having a switch within the xtzip unit to handle these kinds of ZIP files would be more convenient.
I understand. I have made a few changes and this is how it would work right now:
$ emit e90b970c5e5ddf821d6f9f4d7d710d6dc01d59b517e8fb39da726803dc52b5ad | xtzip -l
(09:11:33) failure in xtzip: exception of type MultipleArchives; The input contains 2 archives. Use
the carve-zip unit to extract them individually or set the --lenient/-L option to fuse the archives.
$ emit e90b970c5e5ddf821d6f9f4d7d710d6dc01d59b517e8fb39da726803dc52b5ad | xtzip -lL
archive1/SHIPPING_MX00034900_PL_INV_pdf.exe
archive2/order.jpg
Does that work for you?
After a small correction, it's now even slightly more intuitive:
$ emit e90b970c5e5ddf821d6f9f4d7d710d6dc01d59b517e8fb39da726803dc52b5ad | xtzip -lL
archive1/order.jpg
archive2/SHIPPING_MX00034900_PL_INV_pdf.exe
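One way to get a fused listing like the one above without synthesizing any ZIP structures is to carve the input into separate archives and prefix each member with the index of the archive it came from. The sketch below is a simplified stand-in, not how carve-zip or xtzip actually work: it assumes the archives are concatenated back to back, splits right after each EOCD record (a fixed 22-byte part followed by a comment whose length is stored in the last two bytes of that fixed part), and the names carve_zips and fused_listing are hypothetical.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end of central directory signature
EOCD_FIXED_SIZE = 22      # EOCD record length without the trailing comment


def carve_zips(data: bytes) -> list[bytes]:
    """Split data after each EOCD record, assuming back-to-back archives."""
    chunks = []
    start = 0
    pos = data.find(EOCD_SIG)
    while pos != -1 and pos + EOCD_FIXED_SIZE <= len(data):
        # The archive comment length is the last field of the fixed EOCD part.
        (comment_len,) = struct.unpack_from("<H", data, pos + 20)
        end = pos + EOCD_FIXED_SIZE + comment_len
        chunks.append(data[start:end])
        start = end
        pos = data.find(EOCD_SIG, end)
    return chunks


def fused_listing(data: bytes) -> list[str]:
    """List members of every carved archive under an archiveN/ prefix."""
    names = []
    for index, chunk in enumerate(carve_zips(data), start=1):
        # Each carved chunk is a standalone archive, so its central directory
        # offsets are relative to the start of the chunk.
        with zipfile.ZipFile(io.BytesIO(chunk)) as archive:
            names.extend(f"archive{index}/{name}" for name in archive.namelist())
    return names
```

Because every result keeps its archive prefix, it stays obvious which file came from which archive, which addresses the opacity concern raised at the start of the thread.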
Yeah, I think this is a nice solution.
Nice. I will close out this PR and release a new version soon so you can update to get support for this.
This is now available in binary-refinery 0.6.26.
A double-loaded ZIP is a technique to hide malware from gateway scanners. The idea is to create two ZIP files, the first containing a lure document and the second containing the malware, and then append the byte stream of the second ZIP to the first one.
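A harmless sample with this layout is easy to build in memory with Python's zipfile module, which is handy for testing how a parser deals with it. The member names mirror the listing in this thread; the helper make_zip and the payloads are made up for illustration.

```python
import io
import zipfile


def make_zip(name: str, payload: bytes) -> bytes:
    """Build a single-member, uncompressed ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr(name, payload)
    return buffer.getvalue()


lure = make_zip("order.jpg", b"harmless lure")
malware = make_zip("SHIPPING_MX00034900_PL_INV_pdf.exe", b"not really malware")

# The "double loading" is nothing more than concatenation.
sample = lure + malware

# Each half keeps its own end of central directory (EOCD) record.
print(sample.count(b"PK\x05\x06"))  # 2
```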
What a gateway scanner finds then depends on how it parses the file: if it starts at the beginning of the file, it only finds the lure document; if it starts at the end, it finds the malware.
The xtzip unit uses the Python library zipfile, which starts parsing at the end of the file. This is why refinery extracts only the second archive and not both.
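This behaviour can be checked with the standard library alone. As far as I understand zipfile, it locates the EOCD record at the end of the data and treats everything before the second archive as prepended bytes, much like the stub of a self-extracting archive, so only the second archive's members are listed. The helper make_zip and the payloads below are illustrative.

```python
import io
import zipfile


def make_zip(name: str, payload: bytes) -> bytes:
    """Build a single-member ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(name, payload)
    return buffer.getvalue()


# Lure archive first, malware archive appended after it.
sample = make_zip("order.jpg", b"lure") + make_zip("payload.exe", b"MZ")

with zipfile.ZipFile(io.BytesIO(sample)) as archive:
    names = archive.namelist()
    data = archive.read("payload.exe")

print(names)  # ['payload.exe']
```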
My fix creates a new central directory and appends it to the end of the byte stream, so that zipfile can parse a double-loaded ZIP like a normal ZIP file.