elfmz / far2l

Linux port of FAR v2
GNU General Public License v2.0

Archive: Non-latin filenames not supported #1199

Open izzylaif opened 2 years ago

izzylaif commented 2 years ago

Steps to reproduce:

1) In Windows, use WinRAR to create a .zip archive on an NTFS disk containing at least one file with Cyrillic characters in its name, e.g. тестовыйfile.pdf.
2) Open the archive in far2l by "entering" it. Open a folder on the other panel.
3) Select the Cyrillic-named file and try extracting it.
4) far2l flashes without extracting anything.

Archives created in Linux work OK, regardless of contents. This is extremely dangerous with archives that contain files with both Latin and non-Latin names, as you lose files. And such archives primarily come from Windows machines.

I attach a sample archive for you to play around with: тест.zip

unxed commented 2 years ago

You should have the locale set to ru_RU for this to work correctly, as far2l does not know which ANSI code page is used on the system where the archive was created and defaults to the one appropriate for the current locale.

(But this archive also has UTF-8 versions of the file names. Maybe it's a bug that they aren't used instead of the ANSI ones.)

elfmz commented 2 years ago

Looks like a libarchive bug. In Ubuntu 18.04 with libarchive 3.2.2 it fails to extract; in Ubuntu 20 with libarchive 3.4.0 everything is fine. @izzylaif, please check your version with
grep '#define.*ARCHIVE_VERSION_NUMBER' /usr/include/archive.h
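
For reading the result: libarchive packs its version into that single integer as major * 1000000 + minor * 1000 + revision, so a value of 3004000 means 3.4.0. A minimal Python sketch of the decoding (the function name is made up for illustration):

```python
def decode_archive_version(number: int) -> str:
    """Split libarchive's ARCHIVE_VERSION_NUMBER into major.minor.revision."""
    # The integer is major * 1_000_000 + minor * 1_000 + revision.
    major, rest = divmod(number, 1_000_000)
    minor, revision = divmod(rest, 1_000)
    return f"{major}.{minor}.{revision}"


print(decode_archive_version(3004000))  # 3.4.0
```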

unxed commented 2 years ago

mine (locale ru_RU, extracts ok)

$ grep '#define.*ARCHIVE_VERSION_NUMBER' /usr/include/archive.h
#define ARCHIVE_VERSION_NUMBER 3004000

izzylaif commented 2 years ago

Ubuntu 20 with libarchive 3.4.0

$ lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS

> Looks like a libarchive bug. In Ubuntu 18.04 with libarchive 3.2.2 it fails to extract; in Ubuntu 20 with libarchive 3.4.0 everything is fine. @izzylaif, please check your version with grep '#define.*ARCHIVE_VERSION_NUMBER' /usr/include/archive.h

grep: /usr/include/archive.h: No such file or directory

:)

> mine (locale ru_RU, extracts ok)

This really, really should work independently of the locale, for a myriad of reasons.

izzylaif commented 2 years ago

The output when F5ing a Russian-named file from that archive:

"/usr/lib/far2l/far2l" --libexec "/usr/lib/far2l/Plugins/multiarc/plug/multiarc.far-plug-mb" BuiltinMain libarch x /home/user/Downloads/тест.zip -cs=CP437 -- "файл1.txt"

Nothing unpacks, though.

unxed commented 2 years ago

What if you run sudo apt install libarchive-dev, rebuild far2l (if you build it from source), and then retry the extraction?

unxed commented 2 years ago

Another thing to try is to create a ~/.config/far2l/cp file with two lines:

866
1251

and restart far2l
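
If you would rather script it, here is a minimal Python sketch that writes such a file. The helper name and its parameters are invented for illustration, and the line order (first OEM, then ANSI) is an assumption based on the two values above; far2l itself only needs the file to exist with those two lines.

```python
from pathlib import Path


def write_far2l_cp(config_dir: Path, oem: int = 866, ansi: int = 1251) -> Path:
    """Write a two-line 'cp' file into config_dir and return its path.

    Assumption: the first line is the OEM code page, the second the ANSI one.
    """
    config_dir.mkdir(parents=True, exist_ok=True)
    cp_file = config_dir / "cp"
    cp_file.write_text(f"{oem}\n{ansi}\n", encoding="ascii")
    return cp_file


# Typical call for the path mentioned above (left commented out on purpose):
# write_far2l_cp(Path.home() / ".config" / "far2l")
```

Restart far2l afterwards so the file is picked up.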

izzylaif commented 2 years ago

> Another thing to try is to create a ~/.config/far2l/cp file with two lines:
>
> 866
> 1251
>
> and restart far2l

This alone worked. The archive is unpacking now. What have I done exactly? Will it cause problems with UTF8 not being used somewhere?

unxed commented 2 years ago

You specified two code pages, OEM and ANSI, used by far2l by default. For example, the OEM code page is used for filenames in zip archives created on Windows if there is no UTF-8 header field in such archives.

Your sample archive, btw, does have UTF-8 header fields, and I do not know why the OEM header fields are used instead.

This setting should not cause any problems. far2l creates archives with UTF-8 headers anyway. And it tries to use UTF-8 headers when unpacking; this setting is used only if that fails.
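
To illustrate why the code page matters, here is a small Python sketch of the fallback idea. It is not far2l's or libarchive's actual code: decode_zip_name and its utf8_flag parameter are hypothetical, standing in for whatever UTF-8 marker the archive carries. It shows the same stored bytes decoded with CP437 (the charset passed in the command line earlier in this thread) and with CP866.

```python
def decode_zip_name(raw: bytes, utf8_flag: bool, oem_codepage: str = "cp866") -> str:
    """Decode a raw ZIP entry name (hypothetical helper, for illustration only).

    Prefer UTF-8 when the entry is marked as UTF-8; otherwise fall back to
    the configured OEM code page.
    """
    if utf8_flag:
        return raw.decode("utf-8")
    return raw.decode(oem_codepage)


# Bytes that an archiver using the Russian OEM code page would store.
raw = "файл1.txt".encode("cp866")

# Decoding with the wrong OEM code page yields a different, garbled name.
garbled = raw.decode("cp437")

# Decoding with the matching OEM code page restores the original name.
restored = decode_zip_name(raw, utf8_flag=False, oem_codepage="cp866")

assert restored == "файл1.txt"
assert garbled != restored
```

With the wrong code page the extractor ends up asking for a name that does not match what is stored in the archive, which would be consistent with nothing being unpacked.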

neopaf commented 2 years ago

Some Cyrillic letters work OK now.

FAR2L, version 2.3.210921-716b3290-alpha (build 21/09/21-716b3290-alpha) Darwin x86_64

Yet the letter "yo" (ё) fails to get processed properly:

Takeout/Диск$ "/Applications/far2l.app/Contents/MacOS/far2l" --libexec "/Applications/far2l.app/Contents/MacOS/Plugins/multiarc/plug/multiarc.far-plug-mb" BuiltinMain libarch x /Users/paf/2backup/takeout-20220324T075508Z-001.zip -- "Takeout/Диск/3-НДФЛ расчёты.xlsx"

The file does not get unpacked. far2l silently does nothing.