cgsecurity / testdisk

TestDisk & PhotoRec
https://www.cgsecurity.org/
GNU General Public License v2.0

PhotoRec refuses to recover edited DSC_NNNN~2.JPG #83

Closed HinTak closed 4 years ago

HinTak commented 4 years ago

Hello, thanks for an amazing piece of software, first.

I think I may have found an interesting bug. I have a new disk formatted as ext4, onto which I moved a directory of photos. And I stupidly did an rm -rf ... afterwards.

Anyway, since I am sure the part of the disk that stored the photos is not fragmented, PhotoRec did a rather good job. I also have an inode/file listing generated with sleuthkit. Since JPEG files carry EXIF info with time stamps, I was able to match up most of the original names too.
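For illustration, the matching step could look roughly like the sketch below. It is not the procedure actually used here: it assumes that a lookup table from time stamp to original name (`listing`, a hypothetical dict) has already been built from the file listing, and it finds the time stamp with a simple pattern search rather than a real EXIF parser. The function names are made up for this example.

```python
import re

# Exif stores date/time values as ASCII text "YYYY:MM:DD HH:MM:SS".
EXIF_TIME = re.compile(rb"\d{4}:\d{2}:\d{2} \d{2}:\d{2}:\d{2}")


def first_exif_timestamp(data: bytes, limit: int = 65536):
    """Return the first Exif-style time stamp in the first `limit` bytes, or None."""
    match = EXIF_TIME.search(data[:limit])
    return match.group().decode("ascii") if match else None


def guess_original_names(carved, listing):
    """Map carved file names to original names through their time stamps.

    carved:  dict of carved file name -> file contents (bytes)
    listing: dict of time stamp string -> original name (hypothetical,
             prepared beforehand from the file listing)
    """
    result = {}
    for name, data in carved.items():
        stamp = first_exif_timestamp(data)
        if stamp in listing:  # files without a usable time stamp are skipped
            result[name] = listing[stamp]
    return result
```

One caveat with this kind of matching: an edited copy may keep the capture time of the photo it was derived from, so an original and its derivative can end up with the same key.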

Here is the interesting part: almost all the files that PhotoRec did not recover, with one or two exceptions, had original names of the form DSC_NNNN~2.jpg, i.e. a derived version which I made on the android phone by cropping, color-adjusting, rotating, or some other operation with the phone's photo editing software.

My first thought was that PhotoRec probably uses the original name somehow and decides to skip files that look like backups/temps. But then I realized this is obviously wrong for a few reasons: PhotoRec clearly does not know the original names, or it would have named the recovered files appropriately! There is also no reason for file carving to work that out, and I see no code that looks for "~" either.

So I suspect that the editing somehow makes their internal structure slightly strange and causes them to be skipped.
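One way to check this suspicion is to list the header segments of an original and of its edited derivative and compare them side by side. The sketch below is independent of PhotoRec's own code in src/file_jpg.c and does not reproduce its checks. It only walks the standard JPEG marker segments up to the start of the scan data, and `list_jpeg_segments` is a name made up for this example.

```python
import struct

# Standard JPEG marker codes; APPn markers are handled separately below.
MARKER_NAMES = {
    0xD8: "SOI", 0xD9: "EOI", 0xDA: "SOS", 0xDB: "DQT", 0xC4: "DHT",
    0xC0: "SOF0", 0xC2: "SOF2", 0xDD: "DRI", 0xFE: "COM",
}


def list_jpeg_segments(data: bytes):
    """Return (name, offset, length) for each header segment up to SOS.

    `length` is the value of the segment's 2-byte length field, which
    counts the length bytes themselves; it is 0 for SOI.
    Assumes only length-prefixed segments appear before SOS.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    segments = [("SOI", 0, 0)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        if code == 0xFF:  # fill byte in front of a marker; skip it
            pos += 1
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if 0xE0 <= code <= 0xEF:
            name = f"APP{code - 0xE0}"
        else:
            name = MARKER_NAMES.get(code, f"0x{code:02X}")
        segments.append((name, pos, length))
        if code == 0xDA:  # entropy-coded image data follows; stop here
            break
        pos += 2 + length
    return segments
```

Running this on both files of a pair and comparing the two lists would show whether the editing software adds, drops, or reorders segments, or writes unusually large APPn blocks.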

I have other similar pairs (a photo as taken and its edited derivative), and I can modify src/file_jpg.c to do some targeted carving too, so I'll probably just go ahead and do that.

At this point it is almost a fun exercise, as I know the failed files are derivatives of some sort and the originals are recovered. But I would still like to fix this, as it eliminates areas where I would otherwise need to look for parts of fragmented files.

I'll likely do a pull if I work this out.

HinTak commented 4 years ago

I have thought a bit about this. The easiest way to approach it is probably to make a small test image containing only those edited pairs and try carving from it.

cgsecurity commented 4 years ago

Can you share some file samples? Try with txt and tx? disabled. Do you have the same problem?

HinTak commented 4 years ago

@cgsecurity thanks for the response. I have actually gone ahead and collected about 20 of those into a directory, ran mkisofs on that directory to generate an ISO (only because I am somewhat familiar with mkisofs), then tried carving them out; unfortunately this turned out to be a complete success. So it looks like I must work directly with the original ext4 image. Also, I was carving from unallocated space in ext4. I wonder how file system info is used?

HinTak commented 4 years ago

@cgsecurity here is one such pair I just made (one is cropped a bit from the other), in case it is useful: DSC_6168 DSC_6168~2

HinTak commented 4 years ago

@cgsecurity I checked my config: I did not have txt/tx? enabled. I was hoping for a faster run as a first taste, so I turned most types off. I only had about a dozen enabled: doc, jpg, mkv, mobi, mov, mp3, mpg, pdf, png, ttf, xar, xz, zip (from grep enable photorec.cfg)

cgsecurity commented 4 years ago

As the issue hasn't been reproduced, I am closing it.

HinTak commented 4 years ago

I have only just started looking at the re-run with the latest code. I see a few "unknown" markers in the log, but the goal is still to find one of the 10 missing JPEG files on the disk and see why PhotoRec did not carve them...
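A rough way to locate candidate start offsets for the missing files is to scan the raw image for the JPEG signature at block boundaries. The sketch below assumes a 4096-byte block size and assumes that file data begins at a block boundary; both should be checked against the actual file system. It is a standalone illustration, not PhotoRec's search logic, and `find_jpeg_headers` is a name made up for this example.

```python
import io

# A JPEG file begins with the SOI marker (FF D8) followed by another marker (FF ..).
JPEG_SIGNATURE = b"\xff\xd8\xff"


def find_jpeg_headers(stream, block_size: int = 4096):
    """Return the block-aligned offsets at which a JPEG signature starts.

    stream:     binary file-like object positioned at the start of the image
    block_size: assumed file system block size (4096 is an assumption)
    """
    offsets = []
    offset = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        if block.startswith(JPEG_SIGNATURE):
            offsets.append(offset)
        offset += len(block)
    return offsets
```

Offsets that appear in this list but not among the start positions of the files PhotoRec recovered would be the first places to inspect by hand.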