HaxeFoundation / haxe

Haxe - The Cross-Platform Toolkit
https://haxe.org

[zip] Bugfix on zipfiles not having data descriptor after filedata. #11686

Open sebbernery opened 3 weeks ago

sebbernery commented 3 weeks ago

Hello! I had an issue with zip files generated with 7-Zip on Windows: some files came out corrupted when extracted with haxe.zip.Reader.

In the zip file format, some metadata (CRC32 and file size) can be stored either in the header of the file (the local file header) or in its footer (the data descriptor that follows the compressed data). If bit 3 (mask 0x08) of the local file header flags is 1, there is a data descriptor; if it is 0, there is none. haxe.zip.Reader doesn't decompress files when that bit is 0, so the resulting data (supposed to be decompressed) is still the compressed data. This patch fixes this behaviour.
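To make the flag check concrete, here is a minimal sketch of how the data-descriptor bit could be read from a local file header. It is not the code of the patch; `FlagCheck`, `hasDataDescriptor` and `sampleHeader` are hypothetical names used only for illustration, and the header is truncated to the three fields the check needs.

```haxe
import haxe.io.Bytes;
import haxe.io.BytesInput;
import haxe.io.BytesOutput;

class FlagCheck {
	// Bit 3 of the general-purpose flags: "a data descriptor follows the data".
	static inline var DATA_DESCRIPTOR_FLAG = 0x0008;

	// Hypothetical helper: true if the local file header at the start of
	// `bytes` announces a data descriptor.
	public static function hasDataDescriptor(bytes:Bytes):Bool {
		var input = new BytesInput(bytes);
		input.bigEndian = false; // zip fields are little-endian
		if (input.readInt32() != 0x04034B50)
			throw "Not a local file header";
		input.readUInt16(); // skip "version needed to extract"
		var flags = input.readUInt16(); // general-purpose bit flags
		return (flags & DATA_DESCRIPTOR_FLAG) != 0;
	}

	// Hypothetical helper: builds only the first 8 bytes of a local file
	// header (signature, version, flags), enough for the check above.
	public static function sampleHeader(flags:Int):Bytes {
		var out = new BytesOutput();
		out.bigEndian = false;
		out.writeInt32(0x04034B50); // local file header signature
		out.writeUInt16(20); // version needed to extract
		out.writeUInt16(flags);
		return out.getBytes();
	}

	static function main() {
		trace(hasDataDescriptor(sampleHeader(0x0008))); // bit 3 set
		trace(hasDataDescriptor(sampleHeader(0x0000))); // bit 3 clear
	}
}
```

The first trace reports `true` and the second `false`; the second case is the one the 7-Zip archive runs into.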

I've attached two files in case you want to test this issue: test_zip_roller.zip works, while test_zip_7zip.zip is broken with the current version of Haxe.

Here is a snippet to test the issue:

class Main {
    static public function main() {
        trace("Working archive");
        var zipread = sys.io.File.read("./test_zip_roller.zip", true);
        var zipfile_entries = haxe.zip.Reader.readZip(zipread);
        for (entry in zipfile_entries) {
            trace(entry.fileName, entry.fileSize, entry.crc32);
            trace(entry.data.toString());
        }
        trace("Broken archive");
        var zipread = sys.io.File.read("./test_zip_7zip.zip", true);
        var zipfile_entries = haxe.zip.Reader.readZip(zipread);
        for (entry in zipfile_entries) {
            trace(entry.fileName, entry.fileSize, entry.crc32);
            trace(entry.data.toString());
        }
    }
}

Output without patch:

$ haxe -main Main.hx -python export/out.py && python3 export/out.py
Main.hx:7: Working archive
Main.hx:11: test/, 0
Main.hx:12:
Main.hx:11: test/moretext2.txt, 68
Main.hx:12: MORETEXTMORETEXTMORETEXTMORETEXT
MORETEXTMORETEXTMORETEXTMORETEXT

Main.hx:11: test/salut.txt, 4
Main.hx:12: test
Main.hx:11: moretext.txt, 30
Main.hx:12: SECONDTEXTSECONDTEXTSECONDTEXT
Main.hx:11: moretext2.txt, 68
Main.hx:12: MORETEXTMORETEXTMORETEXTMORETEXT
MORETEXTMORETEXTMORETEXTMORETEXT

Main.hx:11: salut.txt, 4
Main.hx:12: test
Main.hx:14: Broken archive
Main.hx:18: moretext.txt, 30
Main.hx:19: m�!
               �p?� 蟄L�����
Main.hx:18: moretext2.txt, 68
Main.hx:19: �ʱ �]�"H����NY2{ �ή<
Main.hx:18: salut.txt, 4
Main.hx:19: test
Main.hx:18: test/, 0
Main.hx:19: 
Main.hx:18: test/moretext2.txt, 68
Main.hx:19: �ʱ �]�"H����NY2{ �ή<
Main.hx:18: test/salut.txt, 4
Main.hx:19: test

Output with the patch:

Main.hx:7: Working archive
Main.hx:11: test/, 0, 0
Main.hx:12: 
Main.hx:11: test/moretext2.txt, 68, -2888415
Main.hx:12: MORETEXTMORETEXTMORETEXTMORETEXT
MORETEXTMORETEXTMORETEXTMORETEXT

Main.hx:11: test/salut.txt, 4, -662733300
Main.hx:12: test
Main.hx:11: moretext.txt, 30, 55748892
Main.hx:12: SECONDTEXTSECONDTEXTSECONDTEXT
Main.hx:11: moretext2.txt, 68, -2888415
Main.hx:12: MORETEXTMORETEXTMORETEXTMORETEXT
MORETEXTMORETEXTMORETEXTMORETEXT

Main.hx:11: salut.txt, 4, -662733300
Main.hx:12: test
Main.hx:14: Broken archive
Main.hx:18: moretext.txt, 30, 55748892
Main.hx:19: SECONDTEXTSECONDTEXTSECONDTEXT
Main.hx:18: moretext2.txt, 68, -2888415
Main.hx:19: MORETEXTMORETEXTMORETEXTMORETEXT
MORETEXTMORETEXTMORETEXTMORETEXT

Main.hx:18: salut.txt, 4, -662733300
Main.hx:19: test
Main.hx:18: test/, 0, 0
Main.hx:19: 
Main.hx:18: test/moretext2.txt, 68, -2888415
Main.hx:19: MORETEXTMORETEXTMORETEXTMORETEXT
MORETEXTMORETEXTMORETEXTMORETEXT

Main.hx:18: test/salut.txt, 4, -662733300
Main.hx:19: test

The archive also contains one file that is stored uncompressed (test/salut.txt).

Have a nice day.

sebbernery commented 3 weeks ago

Sorry, I'm not sure why the CI fails. While trying to solve the issue, I think I misunderstood the API of the zip Reader class: I guess I have to call Reader.unzip() when a file entry is compressed. However, decompression is transparent when there is a data descriptor and requires a call to unzip() when there is none. I'll look into fixing the CI failure (I guess I should set compressed to false so that haxelib doesn't try to unzip already-decompressed data), but is this a behavior you want to keep? I may be missing some context.
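For reference, the calling pattern described above might look like the following sketch. It builds a small archive in memory instead of using the attached files, and it assumes that haxe.zip.Writer emits entries without a data descriptor (my reading of the standard library, not something confirmed in this thread) and that the target provides haxe.zip.Compress and haxe.zip.Uncompress. `UnzipSketch` and `roundTrip` are hypothetical names.

```haxe
import haxe.crypto.Crc32;
import haxe.io.Bytes;
import haxe.io.BytesInput;
import haxe.io.BytesOutput;
import haxe.zip.Entry;
import haxe.zip.Reader;
import haxe.zip.Tools;
import haxe.zip.Writer;

class UnzipSketch {
	// Hypothetical helper: zips `text` in memory, reads the archive back
	// and returns the decompressed contents of its single entry.
	public static function roundTrip(text:String):String {
		var raw = Bytes.ofString(text);
		var entry:Entry = {
			fileName: "moretext.txt",
			fileSize: raw.length,
			fileTime: Date.now(),
			compressed: false,
			dataSize: raw.length,
			data: raw,
			crc32: Crc32.make(raw) // must be computed before compression
		};
		Tools.compress(entry, 9); // deflate the data, sets entry.compressed

		var out = new BytesOutput();
		var files = new List<Entry>();
		files.add(entry);
		new Writer(out).write(files);

		var entries = Reader.readZip(new BytesInput(out.getBytes()));
		var read = entries.first();
		// unzip() returns the data untouched when `compressed` is false,
		// so calling it unconditionally covers both cases.
		return Reader.unzip(read).toString();
	}

	static function main() {
		trace(roundTrip("SECONDTEXTSECONDTEXTSECONDTEXT"));
	}
}
```

Calling Reader.unzip() on every entry, rather than only on those known to be compressed, keeps the caller independent of whether the reader already inflated the data.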