cgsecurity / testdisk

TestDisk & PhotoRec
https://www.cgsecurity.org/
GNU General Public License v2.0

.bz2/.xz file/s cannot be found at all. #76

Closed tansy closed 4 years ago

tansy commented 4 years ago

I just tried to carve .bz2 files from a raw source and it doesn't work. After a few frustrating tries I checked the source file_bz2.c. Although ... char bz2_header[3]= {'B','Z','h'}; is correct, the header_check_bz2(...) function looks wrong: it checks for "BZh['0'-'\xff']1AY&SY" (in conceptual regex form) where it should be "BZh['1'-'9']1AY&SY". The value of buffer[3] can only be between '1' and '9', and the vast majority of files have '9', as it stands for the block size (see man bzip2).

Although the existing check should theoretically still work, it doesn't: PhotoRec cannot find any .bz2 file whatsoever. It still didn't work even when I turned bz2 off in [File Opt] and prepared a .photorec.sig file with my own bzip2 entry using the static signature "BZh9". I also tried to patch the binary (photorec_static), since I don't have all the libraries required for compilation, but that didn't work either: buffer[3] is not compared against a single static value, so changing the binary this way was just a gamble.

-- So formally, in function header_check_bz2(...), instead of buffer[3]>='0' it should be (buffer[3]>='1' && buffer[3]<='9'). I'm not sure this will solve the problem because, as I said, the current check should theoretically work but in practice it doesn't.
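To make the proposed check concrete, here is a minimal standalone sketch in C. It is not the code from file_bz2.c; the function name looks_like_bz2 and its signature are hypothetical, and it only covers the "BZh['1'-'9']1AY&SY" pattern described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not PhotoRec's header_check_bz2().
 * Returns 1 if buf starts with "BZh", a block-size digit '1'..'9',
 * and the block magic "1AY&SY" (0x31 0x41 0x59 0x26 0x53 0x59); else 0. */
int looks_like_bz2(const unsigned char *buf, size_t len)
{
    static const unsigned char block_magic[6] =
        { 0x31, 0x41, 0x59, 0x26, 0x53, 0x59 };

    if (len < 10)                         /* 3 + 1 + 6 bytes are needed */
        return 0;
    if (memcmp(buf, "BZh", 3) != 0)       /* fixed part of the signature */
        return 0;
    if (buf[3] < '1' || buf[3] > '9')     /* block size digit, see man bzip2 */
        return 0;
    return memcmp(buf + 4, block_magic, sizeof(block_magic)) == 0;
}
```

With this range check a buffer starting with "BZh01AY&SY" is rejected, while the looser buffer[3]>='0' test would accept it.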

PS. I also noticed that in file_xz.c, xz_header[7] = { 0xfd, '7' , 'z' , 'X' , 'Z' , 0x00, 0x00 }; is 7 bytes long when it should be 6. The 7th byte can vary, as it is no longer part of the signature. According to the format document xz-file-format.txt:

const uint8_t HEADER_MAGIC[6]
                    = { 0xFD, '7', 'z', 'X', 'Z', 0x00 };

And again, even if an .xz file happens to start with the same 7 bytes as xz_header[7], PhotoRec still doesn't find it. Pretty much the same issue.

cgsecurity commented 4 years ago

The bzip2 header must be block aligned for PhotoRec to detect it.

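Concerning the xz header, "2.1.1.2. Stream Flags" says "The first byte of Stream Flags is always a null byte." The Stream Flags directly follow the 6-byte magic, so the 7th byte of a valid .xz file is always 0x00 and the 7-byte signature is correct.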
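As an illustration of how the 6-byte magic and the Stream Flags rule combine, here is a small sketch in C. The function name looks_like_xz is hypothetical and this is not the code from file_xz.c; it only checks the two facts quoted in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not PhotoRec's xz check.
 * Returns 1 if buf starts with the 6-byte HEADER_MAGIC from
 * xz-file-format.txt followed by a null first Stream Flags byte. */
int looks_like_xz(const unsigned char *buf, size_t len)
{
    static const unsigned char header_magic[6] =
        { 0xFD, '7', 'z', 'X', 'Z', 0x00 };

    if (len < 7)                          /* magic + first flags byte */
        return 0;
    if (memcmp(buf, header_magic, sizeof(header_magic)) != 0)
        return 0;
    return buf[6] == 0x00;                /* "always a null byte" */
}
```

Checking the magic and then buf[6] separately accepts exactly the same inputs as comparing against the 7-byte xz_header[7] array.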

tansy commented 4 years ago

I've had a lot on my mind recently, but can you explain what you mean by "block aligned"? Can it be done somehow?

cgsecurity commented 4 years ago

"Most file systems are based on a block device, which is a level of abstraction for the hardware responsible for storing and retrieving specified blocks of data, though the block size in file systems may be a multiple of the physical block size." The block size is often 512 or 4096 bytes, so a file always begins on a block boundary.

If you don't want to recover a file but detect files inside files (non-aligned data), you can try to force the blocksize to a single byte:

    photorec /log /d recup_dir /cmd raw_data blocksize,1,search
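To show why alignment matters, here is a toy sketch in C of a signature search that only looks at offsets that are multiples of a block size. It is not PhotoRec's implementation; find_aligned and its parameters are hypothetical names used for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration, not PhotoRec code.
 * Tests for sig only at offsets 0, blocksize, 2*blocksize, ...
 * Returns the offset of the first match, or -1 if there is none. */
long find_aligned(const unsigned char *data, size_t len,
                  const unsigned char *sig, size_t sig_len,
                  size_t blocksize)
{
    size_t off;

    if (blocksize == 0 || sig_len == 0 || sig_len > len)
        return -1;
    for (off = 0; off + sig_len <= len; off += blocksize) {
        if (memcmp(data + off, sig, sig_len) == 0)
            return (long)off;
    }
    return -1;
}
```

With a block size of 512, a header stored at offset 100 is never examined, while one at offset 1024 is found. With a block size of 1 every byte offset is tested, which is the effect the blocksize,1 setting aims for, at the cost of a much slower scan.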