Open sr1990 opened 4 months ago
@sr1990, just checking: you are showing your offsets as 4-byte integers. Is that what you're using? The offsets should be 8 bytes:
offset is defined as the absolute file offset of the box as an 8-byte integer in big-endian format
Also, the free box (and offset) IS included in the hash. Otherwise, your offsets look correct to me. Let me know if this isn't enough information and I'll take a deeper dive.
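For illustration, the 8-byte big-endian encoding described above can be sketched in Python. The helper name `encode_offset` is made up for this example and is not part of any C2PA tooling.

```python
import struct


def encode_offset(offset: int) -> bytes:
    """Return an absolute file offset as an 8-byte big-endian integer."""
    # ">Q" means big-endian, unsigned 64-bit.
    return struct.pack(">Q", offset)


# The 'free' box offset from this thread, widened from 4 to 8 bytes.
print(encode_offset(0x7608).hex())  # 0000000000007608
```

The same call with 0x8280 and 0xEBC54A gives the prefixes for the mdat and moov boxes.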
Hi @dhentschel-truepic, thanks for checking.
"00 00 00 00 00 00 76 08 || free box || 00 00 00 00 00 00 82 80 || mdat box || 00 00 00 00 00 EB C5 4A || moov" gives me (shasum -a 256 free_mdat_offset_moov.dat | cut -f1 -d\ | xxd -r -p | base64) Yw+t3BaDGhphAE/8Uqwb0XN3D3QoZrkdKxzzwaD7mR4= which is still not the same as hash mentioned in the assertion. Am i missing anything above?
Also, do you have an example fMP4 file that follows https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html#_auxiliary_c2pa_boxes_for_large_and_fragmented_files, which states:
For fMP4 assets which are stored as a single flat MP4 file with a single 'moov' for all tracks and then one 'moof'/'mdat' pair for each fragment:
One auxiliary 'uuid' C2PA box with box_purpose set to 'merkle' as described below shall be included immediately preceding each 'moof' box.
Okay... I think I can explain this, though I don't completely know why. The moov box, when extracted via mp4extract, doesn't match the content of the actual file. Here's the difference that I'm seeing:
$ diff <(od -x hash_15.bin) <(od -x ~/moov.bin)
35,37c35,37
< 0001040 0000 4800 0000 0000 0000 0100 2000 2020
< 0001060 2020 2020 2020 2020 2020 2020 2020 2020
< 0001100 2020 2020 2020 2020 2020 2020 1800 ffff
---
> 0001040 0000 4800 0000 0000 0000 0100 0000 0000
> 0001060 0000 0000 0000 0000 0000 0000 0000 0000
> 0001100 0000 0000 0000 0000 0000 0000 1800 ffff
A "better" way to extract the moov box is:
$ dd if=truepic-20230212-zoetrope.mp4 of=moov.bin bs=1 skip=$((0xEBC54A)) count=5357
If I use this command, then I get the correct hash:
$ cat <(echo 0000000000007608 | xxd -r -p) <(dd if=truepic-20230212-zoetrope.mp4 of=/dev/stdout bs=1 skip=$((0x7608)) count=3192 2>/dev/null) <(echo 0000000000008280 | xxd -r -p) <(dd if=truepic-20230212-zoetrope.mp4 of=/dev/stdout bs=1 skip=$((0x8280)) count=15418058 2>/dev/null) <(echo 0000000000EBC54A | xxd -r -p) <(dd if=truepic-20230212-zoetrope.mp4 of=/dev/stdout bs=1 skip=$((0xEBC54A)) count=5357 2>/dev/null) | shasum -a 256 | cut -d' ' -f1 | xxd -r -p | base64
nEzS9vlbVhdhYr8FO8gtNdLvKPaPz0iAaDj4y6Q5pV0=
It takes a little bit of time to run, but it gives the proper output.
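A rough Python equivalent of the pipeline above, offered as a sketch rather than a reference implementation. It reads each range in large chunks, so it avoids the byte-at-a-time cost of `dd bs=1`. The function name `hash_ranges` and the constant `ZOETROPE_RANGES` are invented for this example; the offsets and lengths are copied from the dd commands above.

```python
import base64
import hashlib
import struct


def hash_ranges(path, ranges, chunk_size=1 << 20):
    """SHA-256 over (8-byte big-endian offset || bytes of the range) for each
    (offset, length) pair, returned as a base64 string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for offset, length in ranges:
            # Prefix each range with its absolute file offset.
            digest.update(struct.pack(">Q", offset))
            f.seek(offset)
            remaining = length
            while remaining > 0:
                chunk = f.read(min(chunk_size, remaining))
                if not chunk:
                    raise EOFError(f"file ended inside range at {offset:#x}")
                digest.update(chunk)
                remaining -= len(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


# (offset, length) pairs taken from the dd commands in this thread.
ZOETROPE_RANGES = [
    (0x7608, 3192),       # free
    (0x8280, 15418058),   # mdat
    (0xEBC54A, 5357),     # moov
]

# Usage, assuming the test file is in the current directory:
# print(hash_ranges("truepic-20230212-zoetrope.mp4", ZOETROPE_RANGES))
```

If the file matches the one discussed here, the result should equal the base64 value printed by the shell pipeline.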
Oh, and to answer your second question, I don't have any example files with merkle boxes. Sorry.
That's interesting. It looks like one of your extraction tools is turning null values (x00) into spaces (x20).
@dhentschel-truepic, thanks for the explanation. Looks like the mp4 file contains the following avc1 sample entry box:
[stsd] size=12+152
entry_count = 1
[avc1] size=8+140
data_reference_index = 1
width = 1920
height = 1080
compressor =
[avcC] size=8+35
Configuration Version = 1
Profile = High
Profile Compatibility = 0
Level = 40
NALU Length Size = 4
Sequence Parameter = [67 64 00 28 ac b4 03 c0 11 3f 2c ac 14 18 14 1b 42 84 d4]
Picture Parameter = [68 ee 06 f2 c0]
where compressor name string above is at
offset: EB C7 74
value: 00 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
The first byte (00) specifies the length of the compressor name string; the remaining 31 bytes hold the value of the string.
When mp4extract processes this box:
a. While parsing the box in AP4_VisualSampleEntry::ReadFields(AP4_ByteStream& stream), it reads the compressor name length (the first byte, which is 0 in this case), places a null (0x00) at the first index to mark the end of the string, and stores the result in the data member m_CompressorName.
b. While writing the box in AP4_VisualSampleEntry::WriteFields(AP4_ByteStream& stream), it writes m_CompressorName followed by null bytes until 32 bytes are reached. That is why the 0x20 bytes are replaced by 0x00 bytes in the output.
Is it correct for the compressor name field in the sample entry box to contain a length of 0 followed by 31 space bytes (0x20)?
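To make the effect concrete, here is a small Python sketch of the read-then-write behaviour described above. It is not Bento4's code; the function names `read_compressor_name` and `write_compressor_name` are made up, and the logic only mirrors the description in this comment.

```python
def read_compressor_name(field: bytes) -> bytes:
    """Parse the 32-byte field: the first byte is the length, the rest is the name."""
    if len(field) != 32:
        raise ValueError("compressor name field must be 32 bytes")
    length = min(field[0], 31)
    # Only `length` bytes are kept; anything after them is discarded.
    return field[1:1 + length]


def write_compressor_name(name: bytes) -> bytes:
    """Serialize as length byte + name + zero padding, 32 bytes in total."""
    name = name[:31]
    return bytes([len(name)]) + name + b"\x00" * (31 - len(name))


# The field as stored in the file: length 0 followed by 31 spaces.
original = b"\x00" + b"\x20" * 31
rewritten = write_compressor_name(read_compressor_name(original))

print(rewritten == original)  # False
print(rewritten.hex())        # 32 zero bytes
```

Because the length byte is 0, the 31 space bytes are never read as part of the name, so a parse-and-reserialize tool cannot reproduce them. Any hash computed over the re-serialized moov box will therefore differ from a hash over the original bytes.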
Hey @bertramlyons, it's the other way around: mp4extract is changing 0x20 to 0x00.
@sr1990, I can't explain that to you. The video was recorded on a Google Pixel 5, and our software just signed the output of the camera subsystem. We didn't do any processing of the file other than to add the C2PA manifest store.
Can someone explain how the hash assertion for the video file "video/mp4/truepic-20230212-zoetrope.mp4" is generated?
truepic-20230212-zoetrope.mp4 has the following boxes:
ftyp
uuid
free: offset 00 00 76 08
mdat: offset 00 00 82 80
moov: offset 00 EB C5 4A
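For reference, top-level box offsets like these can be found by walking the box headers directly. This is a minimal Python sketch, assuming a well-formed ISO BMFF file; the function name `list_top_level_boxes` is made up for this example.

```python
import struct


def list_top_level_boxes(path):
    """Return (type, absolute offset, size) for each top-level box in the file."""
    boxes = []
    with open(path, "rb") as f:
        f.seek(0, 2)
        file_size = f.tell()
        offset = 0
        while offset + 8 <= file_size:
            f.seek(offset)
            size, box_type = struct.unpack(">I4s", f.read(8))
            if size == 1:
                # A 64-bit "largesize" follows the type field.
                large = f.read(8)
                if len(large) < 8:
                    break
                size = struct.unpack(">Q", large)[0]
            elif size == 0:
                # The box extends to the end of the file.
                size = file_size - offset
            if size < 8:
                break  # malformed header; stop rather than loop forever
            boxes.append((box_type.decode("latin-1"), offset, size))
            offset += size
    return boxes


# Usage, assuming the test file is in the current directory:
# for box_type, offset, size in list_top_level_boxes("truepic-20230212-zoetrope.mp4"):
#     print(f"{box_type}: offset {offset:#x}, size {size}")
```

Reading the bytes straight from the file at these offsets avoids any re-serialization by an extraction tool.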
Hash assertion contains hash "hash": "nEzS9vlbVhdhYr8FO8gtNdLvKPaPz0iAaDj4y6Q5pV0="
If I extract the boxes from the mp4:
$ mp4extract moov truepic-20230212-zoetrope.mp4 moov.dat
$ mp4extract mdat truepic-20230212-zoetrope.mp4 mdat.dat
and prepend the offsets to mdat and moov (00 00 82 80 || mdat.dat || 00 EB C5 4A || moov.dat), the generated hash (shasum -a 256 mdat_moov.dat) is bfddcea827141be1de70330fc642bc88d37d8f8d30d73986c1c2e8c0eccc657a = b64 "v93OqCcUG+HecDMPxkK8iNN9j40w1zmGwcLowOzMZXo=", which is not the same as the hash in the hash assertion.
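The hex-to-base64 step used for this comparison can also be done in Python. A minimal sketch, with a made-up helper name `hex_digest_to_base64`:

```python
import base64


def hex_digest_to_base64(hex_digest: str) -> str:
    """Convert a hex digest (as printed by shasum) to the base64 form used in the manifest."""
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")


print(hex_digest_to_base64(
    "bfddcea827141be1de70330fc642bc88d37d8f8d30d73986c1c2e8c0eccc657a"
))
# v93OqCcUG+HecDMPxkK8iNN9j40w1zmGwcLowOzMZXo=
```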
Also, the note at https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html#_bmff_based_hash mentions:
Based on the above note, should "free" be added to the exclusion list? I do not see it in the hash assertion exclusion list at https://github.com/c2pa-org/public-testfiles/blob/aa84e25a756e0f9b90682b19e3b519519209dd85/video/mp4/manifests/truepic-20230212-zoetrope/manifest_store.json#L136
If free is taken into consideration, "00 00 76 08 || free || 00 00 82 80 || mdat || 00 EB C5 4A || moov" gives the hash (shasum -a 256 free_mdat_moov.dat) 0a81bb6473f7d4069e511661bfe38ca4021b780c0fe6d3c7b2366a06070ffefa = b64 "CoG7ZHP31AaeURZhv+OMpAIbeAwP5tPHsjZqBgcP/vo=", which is also not the same as the hash in the hash assertion.
What am I missing here?