cgsecurity / testdisk

TestDisk & PhotoRec
https://www.cgsecurity.org/
GNU General Public License v2.0

TestDisk: problem recovering unicode filenames? #130

Closed: dlbeswick closed this issue 1 year ago

dlbeswick commented 1 year ago

Hello,

I had an exFAT SD card that failed. I was still able to recover most of the files off it with TestDisk, thanks!

However, some of my files had Unicode filenames, and those filenames were not recovered correctly. As far as I can tell, the filenames themselves weren't corrupted; the Unicode was just not decoded correctly.

For example, I had a folder that TestDisk recovered like this:

^B4!

This is the byte sequence "0x02 0x34 0x21".

The correct text is as follows (Thai script):

ขิม

Its UTF-16 (big-endian) byte sequence is "0x0e 0x02 0x0e 0x34 0x0e 0x21".

So in this case the correct text can be recovered by inserting a "0x0e" byte before each recovered byte. It looks as if each UTF-16 code unit has been truncated to its low byte.
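A minimal Python sketch can reproduce this symptom, assuming (as the observation above suggests) that each UTF-16 code unit was reduced to its low byte. Both function names are hypothetical and are not part of TestDisk. The restore step only works when every character lies in the same 256-code-point block, such as the Thai block U+0E00 to U+0E7F, which has high byte 0x0e.

```python
def truncate_to_low_bytes(name: str) -> bytes:
    # Hypothetical model of the bug: keep only the low 8 bits of each UTF-16 code unit.
    units = name.encode("utf-16-le")
    return units[0::2]  # in little-endian order the low byte comes first

def restore_with_high_byte(garbled: bytes, high: int = 0x0E) -> str:
    # Re-insert one fixed high byte after every low byte (little-endian pairs).
    # Only valid if all characters share that high byte (0x0E = Thai block).
    units = bytes(b for low in garbled for b in (low, high))
    return units.decode("utf-16-le")

original = "\u0e02\u0e34\u0e21"  # ขิม
garbled = truncate_to_low_bytes(original)
print(garbled)                          # b'\x024!'
print(restore_with_high_byte(garbled))  # ขิม
```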

This was TestDisk 7.2 running on Linux.

cgsecurity commented 1 year ago

The ready-to-use Linux x86_64 binaries provided on cgsecurity.org are static binaries. Unfortunately, the GNU C Library’s iconv implementation uses shared loadable modules to implement the conversions. So iconv support has to be disabled; otherwise the binaries will crash if the local glibc version doesn't match the glibc version used at compile time. To get full Unicode support, you need to use the testdisk package provided by your distribution (probably testdisk 7.1) or compile testdisk 7.2-WIP yourself. In the latter case, https://www.cgsecurity.org/testdisk.pdf provides some guidelines.

I have added this information to https://www.cgsecurity.org/testdisk_doc/installation.html#official-binaries

dlbeswick commented 1 year ago

I see, no problem. Thanks for the info, Christophe, much appreciated.

(Sent by email in reply to Christophe GRENIER's comment of Fri, 17 Feb 2023, 01:01.)


cgsecurity commented 1 year ago

I have uploaded a new Linux x86_64 7.2-WIP build that converts exFAT filenames from UTF-16LE to UTF-8 without using iconv. The filenames are now correct in testdisk.log, but they may still be displayed incorrectly on screen (the ncursesw library probably needs to be updated).
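A UTF-16LE to UTF-8 conversion without iconv comes down to two steps: combining surrogate pairs into code points, then emitting 1 to 4 UTF-8 bytes per code point. Below is a minimal Python sketch of that technique. It is not TestDisk's actual C code; the name `utf16le_to_utf8` is hypothetical, and the policies (replace unpaired surrogates with U+FFFD, ignore a trailing odd byte) are assumptions.

```python
def utf16le_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: UTF-16LE bytes -> UTF-8 bytes, no codec library."""
    # Assemble little-endian 16-bit code units; a trailing odd byte is ignored.
    units = [raw[i] | (raw[i + 1] << 8) for i in range(0, len(raw) - 1, 2)]
    out = bytearray()
    i = 0
    while i < len(units):
        cp = units[i]
        i += 1
        if 0xD800 <= cp <= 0xDBFF and i < len(units) and 0xDC00 <= units[i] <= 0xDFFF:
            # High surrogate followed by low surrogate: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        elif 0xD800 <= cp <= 0xDFFF:
            cp = 0xFFFD  # unpaired surrogate: substitute the replacement character
        # Encode the code point as 1 to 4 UTF-8 bytes.
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes(out)

name = "\u0e02\u0e34\u0e21".encode("utf-16-le")  # ขิม as stored on exFAT
print(utf16le_to_utf8(name).decode("utf-8"))      # ขิม
```

Because the conversion is plain arithmetic, it needs no glibc loadable modules, which is why this kind of approach can work inside a statically linked binary.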