dtrx-py / dtrx

Do The Right Extraction
GNU General Public License v3.0

Non utf-8 characters in filename (e.g. à é è) are lost in extraction #18

Closed vchalmel closed 1 year ago

ChrisJefferson commented 2 years ago

Could you give an example file?

vchalmel commented 2 years ago

Yes. How would you like me to do so? If you simply want to create an archive with the same pathnames as mine (with the extensions replaced by txt), here is the log from nautilus / gnome-autoar "autoextract" using libarchive:

    (nautilus:16872): DEBUG: 11:43:13.198: autoar_extractor_step_scan_toplevel: 1: pathname = DCE_2020_203_AA/Attestation de paiement \x85 180 jours.pdf utf8 pathname = DCE_2020_203_AA/Attestation de paiement à 180 jours.pdf
    ** (nautilus:16872): DEBUG: 11:43:13.198: libarchive_read_skip_cb: called
    ** (nautilus:16872): DEBUG: 11:43:13.198: libarchive_read_seek_cb: called
    ** (nautilus:16872): DEBUG: 11:43:13.198: libarchive_read_seek_cb: 448965
    ** (nautilus:16872): DEBUG: 11:43:13.198: libarchive_read_read_cb: called
    ** (nautilus:16872): DEBUG: 11:43:13.199: libarchive_read_read_cb: 65536
    ** (nautilus:16872): DEBUG: 11:43:13.199: autoar_extractor_step_scan_toplevel: 2: pathname = DCE_2020_203_AA/BPU - Annexe financi\x8are.xlsx utf8 pathname = DCE_2020_203_AA/BPU - Annexe financière.xlsx
    ** (nautilus:16872): DEBUG: 11:43:13.199: autoar_extractor_step_scan_toplevel: 3: pathname = DCE_2020_203_AA/CCAP GMK HLA Neuromyo Viro.pdf
    ** (nautilus:16872): DEBUG: 11:43:13.199: libarchive_read_skip_cb: called
    ** (nautilus:16872): DEBUG: 11:43:13.199: libarchive_read_seek_cb: called
    ** (nautilus:16872): DEBUG: 11:43:13.199: libarchive_read_seek_cb: 6308787
    ** (nautilus:16872): DEBUG: 11:43:13.199: libarchive_read_read_cb: called
    ** (nautilus:16872): DEBUG: 11:43:13.199: libarchive_read_read_cb: 65536
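
For readers following the log: the raw pathnames contain the single bytes `\x85` and `\x8a` where the UTF-8 pathnames show `à` and `è`. That pairing matches the IBM PC code page CP437 (CP850 maps these two bytes the same way), which suggests, though the log alone does not prove it, that the archive stores its names in a legacy DOS encoding. The Python sketch below only illustrates that mapping; `decode_legacy_name` and `is_valid_utf8` are hypothetical helper names, not part of dtrx or libarchive.

```python
# Raw pathname bytes exactly as escaped in the log above.
RAW_NAMES = [
    b"DCE_2020_203_AA/Attestation de paiement \x85 180 jours.pdf",
    b"DCE_2020_203_AA/BPU - Annexe financi\x8are.xlsx",
]


def decode_legacy_name(raw: bytes, encoding: str = "cp437") -> str:
    """Decode a stored pathname, assuming (not knowing) a DOS code page."""
    return raw.decode(encoding)


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether the bytes can be decoded as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for raw in RAW_NAMES:
        # Show the decoded name and whether the raw bytes were valid UTF-8.
        print(decode_legacy_name(raw), is_valid_utf8(raw))
```

Decoded this way, both names come out identical to the "utf8 pathname" values in the log, while strict UTF-8 decoding of the raw bytes fails.
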
noahp commented 2 years ago

If you could upload an example archive, that would be most helpful! I tested with a simple ZIP archive, and was unable to reproduce the problem:

    ❯ touch financière.xlsx

    ❯ zip financière.xlsx.zip financière.xlsx
      adding: financière.xlsx (stored 0%)

    ❯ dtrx financière.xlsx.zip
    dtrx: WARNING: extracting /tmp/yolo/financière.xlsx.zip to financière.xlsx.1

    ❯ ls -l
    total 4
    -rw-rw-r-- 1 noah noah   0 Nov 24 15:39 financière.xlsx
    -rw-rw-r-- 1 noah noah   0 Nov 24 15:39 financière.xlsx.1
    -rw-rw-r-- 1 noah noah 182 Nov 24 15:39 financière.xlsx.zip
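
An archive created with `zip` on a UTF-8 system probably stores the filename as UTF-8 bytes, so it may not exercise the same code path as the reporter's archive, whose log shows single-byte names such as `financi\x8are.xlsx`. One possible way to build a closer test case, sketched here in Python under that assumption, is to write an entry with an ASCII placeholder name and then patch in the raw bytes. This is a workaround because Python's `zipfile` marks non-ASCII names as UTF-8 when writing. `make_legacy_name_zip` and `repro.zip` are hypothetical names, and this is not confirmed to reproduce the reported bug.

```python
import io
import zipfile


def make_legacy_name_zip(placeholder: str, raw_name: bytes, data: bytes = b"") -> bytes:
    """Return ZIP bytes whose single entry name is raw_name, with no UTF-8 flag.

    placeholder must be ASCII, have the same byte length as raw_name, and
    must not occur inside data, since the patch is a plain byte replacement.
    """
    ascii_name = placeholder.encode("ascii")
    if len(ascii_name) != len(raw_name):
        raise ValueError("placeholder and raw_name must have the same length")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(placeholder, data)
    # The name is stored twice (local header and central directory). A
    # same-length replacement leaves every recorded offset and size valid.
    return buf.getvalue().replace(ascii_name, raw_name)


if __name__ == "__main__":
    blob = make_legacy_name_zip("financi_re.xlsx", b"financi\x8are.xlsx")
    with open("repro.zip", "wb") as fh:
        fh.write(blob)
    # zipfile decodes unflagged names as CP437 when reading.
    print(zipfile.ZipFile(io.BytesIO(blob)).namelist())
```

Running an extractor on the resulting `repro.zip` would then show how it handles a name that is not valid UTF-8.
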
noahp commented 2 years ago

Let me know if you have an example archive that reproduces this problem!

noahp commented 1 year ago

Closing for now, since I'm unable to reproduce the problem :( Feel free to reopen if you have more information!