elfmz / far2l

Linux port of FAR v2
GNU General Public License v2.0

unpacking some zip archives fails #121

Closed unxed closed 7 years ago

unxed commented 7 years ago

Attachments: list_ok_unpack_fails.zip, test.zip. Found the source of the problem, see:

/home/unxed$ unzip -o  /home/unxed/Downloads/list_ok_unpack_fails.zip "проверка/*.*" -d . 
TIP: If you feel stuck - use Ctrl+Alt+C to terminate everything in this shell.            
Archive:  /home/unxed/Downloads/list_ok_unpack_fails.zip                                  
caution: filename not matched:  проверка/*.*                                              
/home/unxed$ unzip -o  /home/unxed/Downloads/list_ok_unpack_fails.zip "проверка/*" -d .   
Archive:  /home/unxed/Downloads/list_ok_unpack_fails.zip                                  

*.* does not mean "any file" on Linux: it matches only names that contain a dot. So unzip cannot find anything matching *.* in an empty folder and skips extracting it.

*.* should be replaced by * on Linux, I guess.
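The difference is easy to reproduce with Python's fnmatch module, which uses the same shell-style rule that a dot in the pattern must match a literal dot in the name. This is only an illustration: to_unix_mask is a hypothetical helper, not the actual change made in zip.cpp, and unzip's own matcher may differ in details.

```python
from fnmatch import fnmatchcase


def to_unix_mask(mask: str) -> str:
    """Map the DOS-style 'all files' mask to its Unix equivalent.

    Hypothetical helper for illustration; not taken from far2l.
    """
    if mask == "*.*":
        return "*"
    if mask.endswith("/*.*"):
        # Drop the trailing "*.*" and keep the directory prefix.
        return mask[:-3] + "*"
    return mask


# A directory entry such as "проверка/" contains no dot,
# so the DOS-style mask does not match it.
dos_mask = "проверка/*.*"
unix_mask = to_unix_mask(dos_mask)
print(fnmatchcase("проверка/", dos_mask))
print(fnmatchcase("проверка/", unix_mask))
```

Masks that name a real extension, such as *.txt, pass through unchanged.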

unxed commented 7 years ago

Changing the "all files" mask in zip.cpp to "*" fixes this for test.zip, but the problem still persists with list_ok_unpack_fails.zip.

Looks like a bug in unzip.

unxed commented 7 years ago

btw, I'm using unzip from ppa:frol/zip-i18n. It works well with CP866 zip files, but shows this bug with UTF-8 ones.

Contacted the author and asked to fix this issue. Proposed a platform-detection solution like the one I did for multiarc.

The unzip bundled with Ubuntu/Mint does not show this error, but the bundled 7z behaves the opposite way, breaking GUI tools that use it on OEM-encoded zip archives.

btw, far2l becomes the most intelligent zip-handling file manager for Linux)

upd: the method of encoding detection I used fails on some UTF-8 archives created on Windows. Example: 23-10-2012-b-fasi-eaep.zip
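For context, one common heuristic for guessing the file name encoding can be sketched as follows. This is not the detection method used in multiarc (the thread does not describe it); decode_zip_name and the cp866 default are assumptions made for illustration. The only part taken from the ZIP specification is general purpose bit 11 (0x800), which marks names stored as UTF-8.

```python
def decode_zip_name(raw: bytes, flag_bits: int, oem_codepage: str = "cp866") -> str:
    """Guess the encoding of a raw zip entry name (heuristic sketch).

    raw          -- file name bytes exactly as stored in the header
    flag_bits    -- general purpose bit flag of the entry
    oem_codepage -- assumed fallback code page (hypothetical default)
    """
    UTF8_FLAG = 0x800  # general purpose bit 11: name and comment are UTF-8
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8", errors="replace")
    try:
        # Some tools write UTF-8 names without setting bit 11.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the OEM code page.
        return raw.decode(oem_codepage, errors="replace")


name = "проверка"
print(decode_zip_name(name.encode("utf-8"), 0))
print(decode_zip_name(name.encode("cp866"), 0))
```

Any heuristic of this kind can misfire, because a name in an OEM code page can occasionally happen to be valid UTF-8 as well.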

Maybe the UTF-8 encoded file name extra field from the file header can be used to detect such cases, but multiarc does not currently support it. See https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT, 4.6.9 - Info-ZIP Unicode Path Extra Field (0x7075); the neighboring 4.6.8 covers the Info-ZIP Unicode Comment Extra Field (0x6375).
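A minimal sketch of reading that field, assuming the layout given in APPNOTE.TXT for the Info-ZIP Unicode Path Extra Field (0x7075): a one-byte version, the CRC-32 of the original header file name, then the UTF-8 name. The function name unicode_path_from_extra is hypothetical; this is not multiarc code.

```python
import struct
import zlib
from typing import Optional

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path Extra Field


def unicode_path_from_extra(extra: bytes, raw_name: bytes) -> Optional[str]:
    """Return the UTF-8 name from the extra field, or None if absent or stale.

    extra    -- raw extra field bytes of one zip entry
    raw_name -- file name bytes exactly as stored in the header
    """
    pos = 0
    while pos + 4 <= len(extra):
        # Each extra block starts with a 2-byte ID and a 2-byte data size.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + size]
        pos += 4 + size
        if header_id != UNICODE_PATH_ID or len(data) < 5:
            continue
        version, name_crc = struct.unpack_from("<BI", data, 0)
        # The CRC ties the UTF-8 name to the header name; a mismatch
        # means the header name was changed and the field is stale.
        if version != 1 or name_crc != zlib.crc32(raw_name) & 0xFFFFFFFF:
            continue
        try:
            return data[5:].decode("utf-8")
        except UnicodeDecodeError:
            return None
    return None
```

When the function returns None, the caller still has to fall back to guessing the code page of the header name.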

unxed commented 7 years ago

related info https://github.com/mate-desktop/engrampa/issues/5 http://linuxmd.net/forum/komandnaya-stroka-terminal/473-unzip-6-10c

elfmz commented 7 years ago

list_ok_unpack_fails.zip works fine (after the *.* -> * change) with the following unzip:

user@ubuntu:~$ unzip
UnZip 6.00 of 20 April 2009, by Debian. Original by Info-ZIP.

Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
  Default action is to extract files in list, except those in xlist, to exdir;
  file[.zip] may be a wildcard.  -Z => ZipInfo mode ("unzip -Z" for usage).

I remember that I had problems with one unzip (some files were not extracted, just silently skipped from the list, even ones with English letters). Then I installed another zip package (that also has an unzip utility :) ) and it worked much better.

unxed commented 7 years ago

As I discovered while investigating, zip has the worst i18n support any popular archive format has ever had. It is a mash of incompatible implementations, most of them assuming that there can be only one locale, only one code page, only one encoding of file names and only one field for storing file names inside an archive. Most implementations totally ignore each other's presence. And if you want even more fun, there are incompatibilities between versions too.

giving it up for now, feeling a little bit tired from all that stuff)

Fortunately, far can now successfully manage the zips that I personally have, using the tools that I personally have)

PS: had a look inside p7zip's code. Guess what I saw there? They just IGNORE the code page parameter in MultiByteToUnicodeString calls. Hard to guess how all this manages to work in Windows builds (upd: seems that was [partially?] fixed in 16.03, but it has no Linux port to test for now).

PPS: got the same result with the same unzip version. Closing this, as the remaining part is not a far2l problem.