anthwlock / untrunc

Restore a truncated mp4/mov. Improved version of ponchio/untrunc
GNU General Public License v2.0
1.87k stars 182 forks

Unable to find correct codec for GoPro footage #108

Open gavia opened 2 years ago

gavia commented 2 years ago

Hi, I'm trying to recover about 20 videos from my GoPro Hero 9, all of which unfortunately became corrupted when the GoPro suddenly powered off during recording. I have been able to restore most of each video by using -s; however, there are numerous points inside each video where the correct codec could not be found. Below is the output when using ffmpeg 3.3.9. When using -s, there are many instances of Warning: Codec::was_bad_ = 1

I initially tried this on the latest ffmpeg, where there were far fewer instances of the above warning, but they were being skipped.

Are you able to help at all? I'll send you the videos. I have chosen a "small" corrupted video (about 400mb), but unfortunately the smallest good video that I have is about 2.8gb, which I will include.

Output:

  Metadata:
    major_brand     : mp41
    minor_version   : 538120216
    compatible_brands: mp41
    creation_time   : 2021-04-17T15:04:14.000000Z
    firmware        : HD9.01.01.50.00
  Duration: 00:08:13.84, start: 0.000000, bitrate: 45296 kb/s
    Stream #0:0(eng): Video: hevc (Main) (hvc1 / 0x31637668), yuvj420p(pc, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 45036 kb/s, 25 fps, 25 tbr, 90k tbn, 25 tbc (default)
    Metadata:
      creation_time   : 2021-04-17T15:04:14.000000Z
      handler_name    : GoPro H.265
      encoder         : GoPro H.265 encoder
      timecode        : 16:03:13:00
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 189 kb/s (default)
    Metadata:
      creation_time   : 2021-04-17T15:04:14.000000Z
      handler_name    : GoPro AAC
      timecode        : 16:03:13:00
    Stream #0:2(eng): Data: none (tmcd / 0x64636D74) (default)
    Metadata:
      creation_time   : 2021-04-17T15:04:14.000000Z
      handler_name    : GoPro TCD
      timecode        : 16:03:13:00
    Stream #0:3(eng): Data: none (gpmd / 0x646D7067), 51 kb/s (default)
    Metadata:
      creation_time   : 2021-04-17T15:04:14.000000Z
      handler_name    : GoPro MET
    Stream #0:4(eng): Data: none (fdsc / 0x63736466), 9 kb/s (default)
    Metadata:
      creation_time   : 2021-04-17T15:04:14.000000Z
      handler_name    : GoPro SOS
[aac @ 0xaaaaba7acd40] Multiple frames in a packet.
[aac @ 0xaaaaba7acd40] Sample rate index in program config element does not match the sample rate index configured by the container.
[aac @ 0xaaaaba7acd40] Inconsistent channel configuration.
[aac @ 0xaaaaba7acd40] get_buffer() failed
Info: version 'v306-3d11955-dirty' using ffmpeg '3d11955'
Info: reading /files/GX060117.MP4
Info: parsing healthy moov atom ...
ftyp_ = 'mp41'
assuming constant duration of 3600 for 'hvc1' (x12346)
ss: hvc1 is_stable: 0
ss: using f=7 span
assuming constant duration of 1024 for 'mp4a' (x23147)
ss: mp4a is_stable: 0
ss: using f=7 span
ss: tmcd is_stable: 0
ss: using f=7 span
Info: special track found (tmcd, 'GoPro TCD')
ss: gpmd is_stable: 0
ss: using f=7 span
Info: special track found (meta, 'GoPro MET')
assuming constant duration of 0 for 'fdsc' (x35970)
ss: fdsc is_stable: 0
ss: using f=7 span
Info: special track found (meta, 'GoPro SOS')

ss: reset to default (from 3442991 ~= 0.41*default)
ss: max_part_size_: 8388608
fallback: -1
calling findMdat on truncated file..
Info: reading mdat from truncated file ...

(reading element from mdat)
Offset: 0 / 28 : 4750524f 84020000
Track codec: gpmd
Track codec: fdsc
part-length: 644
1th sample in 1th fdsc-chunk

(reading element from mdat)
Offset: 644 / 672 : 47500f01 0006131c
Track codec: gpmd
Track codec: fdsc
part-length: 220
2th sample in 1th fdsc-chunk

(reading element from mdat)
Offset: 864 / 892 : 00061318 2601ade0
Track codec: gpmd
Track codec: fdsc
Track codec: mp4a
Failure because of NULL header
Track codec: hvc1
---
pos: 864 / 892
Length: 398104+4
Nal type: 19
nuh_layer_id: 0
nuh_temporal_id_plus1: 1
first_slice_segment_in_pic_flag = 1
Partial hvc1-length: 398108
---
pos: 398972 / 399000
First byte expected 0
failed parsing h256 nal-header
part-length: 398108
1th sample in 1th hvc1-chunk

(reading element from mdat)
Offset: 398972 / 399000 : 47500500 00000004
Track codec: gpmd
Track codec: fdsc
part-length: 16
1th sample in 2th fdsc-chunk

(reading element from mdat)
Offset: 398988 / 399016 : 000ce73b 47500400
wouldMatch(398988, "", 1) -> no
using hardcoded 'tmcd' packet (len=4)
1th sample in 1th tmcd-chunk

(reading element from mdat)
Offset: 398992 / 399020 : 47500400 000001f8
Track codec: gpmd
Track codec: fdsc
part-length: 16
1th sample in 3th fdsc-chunk

(reading element from mdat)
Offset: 399008 / 399036 : 20986a96 b21a6804
Track codec: gpmd
Track codec: fdsc
Track codec: mp4a
mp4a: Success because of large s value
nb_samples: 1024
part-length: 504
1th sample in 1th mp4a-chunk

(reading element from mdat)
Offset: 399512 / 399540 : 47500400 000001f8
Track codec: gpmd
Track codec: fdsc
part-length: 16
1th sample in 4th fdsc-chunk

(reading element from mdat)
Offset: 399528 / 399556 : 20970a96 d16a3804
Track codec: gpmd
Track codec: fdsc
Track codec: mp4a
mp4a: Success because of large s value
nb_samples: 1024
part-length: 504
1th sample in 2th mp4a-chunk

(reading element from mdat)
Offset: 400032 / 400060 : 47500003 0002caf9
Track codec: gpmd
Track codec: fdsc
part-length: 16
1th sample in 5th fdsc-chunk

(reading element from mdat)
Offset: 400048 / 400076 : 0002caf5 0201d00c
Track codec: gpmd
Track codec: fdsc
Track codec: mp4a
Failure because of NULL header
Track codec: hvc1
---
pos: 400048 / 400076
Length: 183029+4
Nal type: 1
nuh_layer_id: 0
nuh_temporal_id_plus1: 1
first_slice_segment_in_pic_flag = 1
Partial hvc1-length: 183033
---
pos: 583081 / 583109
First byte expected 0
failed parsing h256 nal-header
part-length: 183033
1th sample in 2th hvc1-chunk

(reading element from mdat)
Offset: 583081 / 583109 : 99d6b5c6 e3aecb75
Track codec: gpmd
Track codec: fdsc
Track codec: mp4a
mp4a: Success because of large s value
nb_samples: 0
got_frame: 0
channels: 2, 0
part-length: -22
Invalid length: part-length is -22
Track codec: hvc1
Track codec: tmcd
Error: unable to find correct codec -> premature end (~0.1396%)
       try '-s' to skip unknown sequences

mdat->file_end: 583109
Info: Found 11 packets ( gpmd: 0 fdsc: 6 mp4a: 2 hvc1: 2 hvc1-keyframes: 1 tmcd: 1 )
Tip: Audio and video seem to have different durations (0.525).
     If audio and video are not in sync, give `-sv` a try. See `--help`
Info: Duration of gpmd:  (0 ms)
Info: Duration of mp4a: 42ms  (42 ms)
Info: Duration of hvc1: 80ms  (80 ms)
Info: pruned empty 'gpmd' track
Info: saving /files/f128250112.mp4_fixed.MP4

anthwlock commented 2 years ago

Interestingly, the corrupted file itself contains a valid moov atom. But it looks like some junk(?) data got injected in certain places, shifting the following frames by an offset that the original moov does not account for.

It is unclear to me whether those junk-sequences always start at the end of another healthy frame, or if they span across frames. However, assuming the second hvc1 frame is contained fully, the first unknown sequence is exactly 131072 (= 2**17) bytes long. This makes me think that these junk-sequences might have been inserted in some "controlled" way; however, the lengths of the following junk-sequences are not that regular (maybe because they span multiple frames).
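To make the "controlled" observation easy to repeat on other gaps, here is a minimal sketch that classifies a gap length as a power of two or as a multiple of a block size. The function name `describe_gap` and the candidate block sizes (4096 and 512 bytes) are assumptions for illustration; nothing in this thread confirms the actual block size of the card.

```python
def describe_gap(length: int) -> str:
    """Describe whether a gap length is a power of two or block-aligned."""
    # A positive integer is a power of two iff it has exactly one bit set.
    if length > 0 and length & (length - 1) == 0:
        return f"{length} = 2**{length.bit_length() - 1}"
    # Assumed candidate block sizes; adjust to the filesystem in question.
    for block in (4096, 512):
        if length > 0 and length % block == 0:
            return f"{length} = {length // block} * {block}"
    return f"{length} (no obvious alignment)"


print(describe_gap(131072))
```

If most gaps turn out to be multiples of one block size, that would be consistent with data from other files being interleaved at block granularity, though it would not prove it.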

What I also found interesting is that the mp4a frames often contain sequences of low-entropy data like

"2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D 2D"  // "------------"
"D2 D2 D2 D2 D2 D2 D2 D2 D2 D2 D2 D2"
"5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A 5A"  // "ZZZZZZZZZZZZ"
"A5 A5 A5 A5 A5 A5 A5 A5 A5 A5 A5 A5"
"69 69 69 69 69 69 69 69 69 69 69 69"  // "iiiiiiiiiiii"
"96 96 96 96 96 96 96 96 96 96 96 96"

I am unsure about the meaning of those, but (1) the healthy file does not contain them, and (2) maybe they (somehow) indicate that an audio frame is a duplicate, since the recovered file often seems to have audio sequences (~1-2s) repeated.
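One thing that can be checked directly: the six byte values come in bitwise-complement pairs (0x2D/0xD2, 0x5A/0xA5, 0x69/0x96). Below is a rough sketch for locating such runs in a buffer, so they can be compared against frame boundaries. The function name `find_runs` and the minimum run length of 12 are assumptions, the latter taken only from the length of the examples above.

```python
import re

# Byte values quoted above; each pair is a bitwise complement
# (0x2D ^ 0xFF == 0xD2, 0x5A ^ 0xFF == 0xA5, 0x69 ^ 0xFF == 0x96).
MARKER_CLASS = rb"[\x2d\xd2\x5a\xa5\x69\x96]"


def find_runs(data: bytes, min_len: int = 12):
    """Return (offset, byte_value, run_length) for each run of one marker byte."""
    # Group 1 captures one marker byte; \1 requires that same byte to repeat.
    pattern = re.compile(rb"(" + MARKER_CLASS + rb")\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1)[0], len(m.group(0))) for m in pattern.finditer(data)]


sample = b"\x00" * 5 + b"\x2d" * 12 + b"\x01" + b"\xa5" * 15 + b"\x69" * 3
for offset, value, length in find_runs(sample):
    print(f"offset {offset}: 0x{value:02X} x {length}")
```

Running this over the mp4a samples of both the healthy and the corrupted file would confirm or refute point (1) above.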

Since I am not sure of the meaning of the junk-sequences, I can't say whether perfect recovery is (theoretically) possible or not. If you find other tools which are able to produce better results, please let me know. One could for example try the demo of GPR.

In caf8cbb0ac160cb320d2a3884ef6bc81becd7d2a I restricted the patterns for fdsc and hvc1 tracks after unknown sequences. It should improve the recovery of your files a bit, but repeated audio sequences and some video glitches remain.

gavia commented 2 years ago

Thank you for investigating. I had come across GPR before, but because I don't have a Windows computer, I haven't tried it. I think it's time for me to set up a VM and give it a go.

Some further information/thoughts: the corrupted video file(s) were all recovered using PhotoRec. What initially happened was that the GoPro suddenly started beeping during recording and then turned off. When I got home and looked at the memory card, it was as if it had just been freshly formatted, however strangely enough the files that had been trashed but not actually deleted still remained in the trash folder. This led me to believe that an error during recording had essentially wiped the partition information and/or scrambled the data, but had not completed a "fresh" format.

I created an image of the SD card (128gb) using TestDisk and started trying to recover the files using a variety of software. PhotoRec was the one that gave me the best results. The low resolution JPG files were recovered perfectly fine, and all of the high and low resolution videos were recovered but none of them would play except for the one that I sent you which included the moov atom.

On the image itself I did some deeper analysis in a hex editor and was able to find all (I think) of the moov atoms intact, but none of them appeared directly after the mdat atoms. Further, there was a LOT of garbage or other data in between the end of the mdat atom and beginning of moov atoms, and I wasn't sure which mdat atom matched which moov atom. There was also a lot of structured data directly before every single moov atom so I wasn't sure if I could simply take the beginning of a moov atom and piece it to the end of an mdat atom.

This is how I came to untrunc, as I figured it might be easier to recreate the moov atoms for the entire mdat atoms rather than match them up as described above.

Next steps I'll try GPR, and perhaps come back to matching the moov atoms with the mdat atoms manually, but as a last resort as that will require much more work, unless you know of software that can try to do that automatically from an img file?

Let me know if you'd like to see any of the other corrupted videos, either low quality (which are closer to 250mb) or high quality (most of which are 4gb). There may be more hints in those as to what happened, I guess?

gavia commented 2 years ago

By the way, do you know of any software that can look at a moov atom and tell me the length/size/other information that it holds (does this question even make sense?) It might help me in piecing it back together later, rather than complete trial and error.

anthwlock commented 2 years ago

I guess the filesystem of the SD card (ntfs?) did not store your files (completely) sequentially. So those sequences of junk-data could be data from other files. In that case it could be worth a try to recover the filesystem itself, since it would know exactly which (logical) file spans which physical blocks. Maybe TestDisk can already do this; otherwise there might be specialized software for the specific filesystem.

> were recovered but none of them would play except for the one that I sent you which included the moov atom

You mean the broken 400mb one, right? Do I understand it correctly that PhotoRec put the moov atom at a sensible place, or at least edited the mdat's length? Because I would not expect the moov to align with the mdat by "accident", since the original mdat length was referencing offsets in the logical file, not on the physical disk.

> There was also a lot of structured data directly before every single moov atom

Perhaps this is from the "gpmd" track, which (I believe) stores data like GPS or acceleration. In that case it could be a normal part of the mdat.

> so I wasn't sure if I could simply take the beginning of a moov atom and piece it to the end of an mdat atom.

In principle, yes, but you might have to adjust the length of the atoms a bit, since the original length assumes a sequential file.

> Next steps I'll try GPR

Let me know about the results!

> unless you know of software that can try to do that automatically from an img file?

I don't know of any. I would probably write a Python script. But I wouldn't be too confident that the result would be any better than the corrupted 400mb file, even if the correct mdat and moov are matched up.
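As a starting point for such a script, here is a minimal sketch that scans a buffer for plausible mdat and moov headers. It relies only on the standard MP4 atom layout (a 4-byte big-endian size followed by a 4-byte type, with size 1 meaning that a 64-bit size follows the type). The function name `find_atoms` is made up for illustration. The declared sizes refer to the original logical file, so on a fragmented image they do not necessarily say where the atom's data really ends.

```python
import struct


def find_atoms(buf, kinds=(b"mdat", b"moov")):
    """Yield (kind, header_offset, declared_size) for plausible atom headers."""
    for kind in kinds:
        pos = buf.find(kind, 4)  # start at 4 so the size field fits before the type
        while pos != -1:
            (size,) = struct.unpack(">I", buf[pos - 4:pos])
            if size == 1 and pos + 12 <= len(buf):
                # size == 1: a 64-bit size is stored right after the type
                (size,) = struct.unpack(">Q", buf[pos + 4:pos + 12])
            if size >= 8:  # skips "to end of file" (0) and impossible sizes
                yield kind.decode(), pos - 4, size
            pos = buf.find(kind, pos + 1)


# Tiny synthetic buffer: 4 junk bytes, a 16-byte mdat, then an empty moov.
demo = b"junk" + struct.pack(">I", 16) + b"mdat" + b"\0" * 8 + struct.pack(">I", 8) + b"moov"
for kind, offset, size in find_atoms(demo):
    print(kind, offset, size)
```

For a 128gb image, `buf` could be an `mmap.mmap` of the file opened read-only instead of a `bytes` object, since `find`, slicing and `len` behave the same way there. Random data will produce false positives, so each hit still needs a sanity check.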

> Let me know if you'd like to see any of the other corrupted videos, either low quality (which are closer to 250mb) or high quality (most of which are 4gb)

Sure, why not! For example I wonder if the first unknown sequence with length 2**17 also occurs in other files. Maybe there even is a regular pattern in which those junk-sequences get placed into the file. However I wouldn't have too high hopes about that. Feel free to send me both quality levels. Maybe also one file two times, once from PhotoRec and once manually extracted from the disk image.

> do you know of any software that can look at a moov atom and tell me the length/size/other information that it holds

I assume you mean the length of the individual frames. In that case, untrunc -d can do this! Also, I think untrunc -ia dumps the first n entries of each relevant atom (stsz, stco, ..). I once wrote a simple atom parser in Python; maybe it is useful to you. It is currently hard-coded to dump the ctts table in the second track.
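For readers without that parser at hand, the following is an independent minimal sketch (not the parser mentioned above) of how the atom tree can be walked and how the per-sample sizes can be read from an stsz atom. The names `walk`, `stsz_sizes` and `CONTAINERS` are made up for illustration, and `stsz_sizes` assumes the atom uses a plain 8-byte header.

```python
import struct

# Container atoms on the path from moov down to the sample tables.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}


def walk(buf, start=0, end=None, depth=0):
    """Yield (depth, type, offset, size) for each atom in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, kind = struct.unpack(">I4s", buf[pos:pos + 8])
        header = 8
        if size == 1:  # 64-bit size follows the type
            if pos + 16 > end:
                break
            (size,) = struct.unpack(">Q", buf[pos + 8:pos + 16])
            header = 16
        elif size == 0:  # atom extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break  # implausible size: stop instead of guessing
        yield depth, kind, pos, size
        if kind in CONTAINERS:
            yield from walk(buf, pos + header, pos + size, depth + 1)
        pos += size


def stsz_sizes(buf, offset, limit=10):
    """Return up to `limit` sample sizes from the stsz atom starting at `offset`."""
    # Layout: size(4) type(4) version+flags(4) sample_size(4) sample_count(4) entries
    sample_size, count = struct.unpack(">II", buf[offset + 12:offset + 20])
    n = min(count, limit)
    if sample_size:  # non-zero: every sample has this same size
        return [sample_size] * n
    return list(struct.unpack(">%dI" % n, buf[offset + 20:offset + 20 + 4 * n]))
```

Calling `walk` on the bytes of an extracted moov atom prints its structure, and passing the offset of each `stsz` atom to `stsz_sizes` gives the frame lengths of that track.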

anthwlock commented 2 years ago

@gavia Any news? Have you tried GPR?

gavia commented 2 years ago

@anthwlock So sorry for the late reply, I've been traveling recently. I tried GPR and it was mostly successful. Although some of the videos were still a bit choppy, most of them came out perfectly. It must be that the GoPro saves the files in a way that is non-standard but GPR can somehow re-create that. I need to do a proper review of all of the footage (there was a lot) when I return from overseas to see the full extent of the recovery/damage, but the few that I checked were pretty good.

I can still send you some of these files (including recovered ones) if you want to look in more detail but it will need to be in late January when I am back and have access to my footage.