MediaArea / RAWcooked

Encodes RAW audio-visual data into the Matroska container (MKV), using the video codec FFV1 for the image and audio codec FLAC for the sound.
https://mediaarea.net/RAWcooked
BSD 2-Clause "Simplified" License

Some DPX Sequences Result In MKV Files that are Not Valid #431

Open Lawrence58 opened 7 months ago

Lawrence58 commented 7 months ago

When I run RAWcooked on DPX sequences longer than approximately 15 minutes, the resulting MKV files are "Not Valid" according to the MediaConch implementation checker. Other sequences encode fine. The MediaConch report lists two failed tests. The full report is attached.

MKV-ELEMENT-VALID-PARENT | Tests run: 142 | Results: [X] Fail count: 1 fail -- 2DCB: [5 bytes] Offset: 20437311 Context: /Segment[1]/Attachments[1]/AttachedFile[1]/FileData[1]/2DCB[1] Format ID: 0x6DCB Reason: FileData is not a valid Parent Element of 2DCB.

TRUNCATED-ELEMENT | Tests run: 1 | Results: [X] Fail count: 1 fail -- Size: 1867661 Offset: 20437313 Context: /Segment[1]/Attachments[1]/AttachedFile[1]/FileData[1]/2DCB[1]/Header[1]

Short sequences encode fine. Encoding the longer sequences takes a very long time (24 hours for a 30 minute sequence), and the speed stays at 0.01x realtime for hours. Before launching RAWcooked I shut down all programs and browsers except Finder (one tab), Terminal, and sometimes Activity Monitor. Memory and CPU look fine. Nearly all of the 64 GB of RAM appears to be allocated to RAWcooked. I have tried numerous DPX sequences and configurations, including these, all with the same result.

Any suggestions?

iMac Pro (2017), macOS Mojave 10.14.6, 3 GHz Intel Xeon W, 64 GB DDR4, RAWcooked 23.09

27295_Graded.mkv_ImplementationReport.txt

JeromeMartinez commented 7 months ago

These are 2 separate issues, but let's try to manage both here.

JeromeMartinez commented 7 months ago

About speed: is the FFmpeg compression also slow? FFmpeg and RAWcooked use different file access APIs, so I am curious to know whether the slowdown is only in the RAWcooked part or global.

Lawrence58 commented 6 months ago

I owe you an apology. I was running an old version of MediaConch. The file is valid with the updated version. Also, I reversed the RAWcooked MKV back to DPX and compared the original and restored sequences, and they match.

I have not noticed a speed issue with FFmpeg. This screenshot is typical. This was on an iMac Pro with 1 TB internal storage and 64 GB RAM after closing all applications, emptying the trash, etc., to free up as much storage and memory as possible. My sequences are all 2K DPX. I use --check --check-padding --hash --no-accept-gaps. Can you tell me the difference between the default reversibility check and --hash?

[Screenshot: RawcookedSpeed]

JeromeMartinez commented 6 months ago

> The file is valid with the updated version.

Great!

> I have not noticed a speed issue with FFMPEG.

Thanks for the info. So the cause is, I guess, the file API we use, which is not well suited to this use case, and we need to use another one (more RAM pressure in theory, but the "mmap" API is worse in practice).
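
To make the two file access styles concrete, here is a minimal Python sketch that hashes the same file once through a memory map and once through explicit chunked reads. It is only an illustration: it is not RAWcooked's code (RAWcooked is not written in Python), and the function names and the 1 MiB chunk size are assumptions chosen for the example.

```python
import hashlib
import mmap


def md5_via_mmap(path):
    """Hash a file through a memory map; the OS decides when pages are loaded."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file (this raises ValueError on an empty file).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return hashlib.md5(mapped).hexdigest()


def md5_via_read(path, chunk_size=1 << 20):
    """Hash a file through explicit reads into a buffer of at most chunk_size bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()
```

Both functions return the same digest for the same file; what differs is who controls the I/O. With explicit reads the program chooses the buffer size, while with a memory map the paging behaviour is left to the operating system.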

> Can you tell me the difference between the default reversibility check and --hash?

--hash reads the DPX files in full during the first pass, computes their MD5 hashes, and stores the hashes in the RAWcooked reversibility attachment. The reversibility check is then done by comparing the hash of the decoded frame with the stored hash, so it can be done in the future (without the source files). --check without --hash compares the decoded frame with the source frame, so it can be done only right after encoding, not in the future. If you use --check in the future (without the source files), RAWcooked tests that the decoding is coherent but does not compare against any hash (unless it detects a hash file you created yourself in the directory, in which case it compares against that file).

So if you already have a hash file in the directory, --hash is redundant; otherwise it improves the accuracy of future health checks.
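
The difference between the two kinds of check can be sketched in a few lines of Python. This is a conceptual illustration only, not how RAWcooked is implemented: the function names, the directory layout, and the in-memory dictionary standing in for the reversibility attachment are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """MD5 of a whole file, header and payload included."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def build_hashes(source_dir):
    """First pass: hash every source file, keyed by file name."""
    return {
        p.name: file_md5(p)
        for p in sorted(Path(source_dir).iterdir())
        if p.is_file()
    }


def check_against_source(decoded_dir, source_dir):
    """Check done right after encoding: the source files must still exist."""
    sources = [p for p in sorted(Path(source_dir).iterdir()) if p.is_file()]
    return all(
        (Path(decoded_dir) / p.name).is_file()
        and (Path(decoded_dir) / p.name).read_bytes() == p.read_bytes()
        for p in sources
    )


def check_against_hashes(decoded_dir, stored_hashes):
    """Check that can be done later: only the stored hashes are needed."""
    return all(
        (Path(decoded_dir) / name).is_file()
        and file_md5(Path(decoded_dir) / name) == digest
        for name, digest in stored_hashes.items()
    )
```

Once the source directory has been deleted, only `check_against_hashes` can still confirm that the restored files are identical to the originals, which is the situation the stored hashes are meant for.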

Lawrence58 commented 6 months ago

Does that mean, if one does not use --hash during encoding and in the future reverts to the DPX sequence from the MKV, there is no way to be sure the new sequence will be bit for bit identical to the original sequence?

JeromeMartinez commented 6 months ago

> Does that mean, if one does not use --hash during encoding and in the future reverts to the DPX sequence from the MKV, there is no way to be sure the new sequence will be bit for bit identical to the original sequence?

Right (with the exception of a hash file available in the directory; it is common to find hash files there, depending on your policy about it).

That does not mean that reversibility is risky: we check that the reversibility file is coherent. But if you want to be 100% sure about the reversibility, one of the only methods is to have a hash file, either created by you with e.g. md5sum or created by RAWcooked with --hash.

Lawrence58 commented 6 months ago

I created the side-car framemd5 file with --framemd5 but how do I invoke or specify that it be used when reverse rawcooking the MKV file?

Lawrence58 commented 6 months ago

Never mind. I understand I just create a framemd5 for the new sequence and then do a diff of the two. Doh!
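
A plain diff works, but the header lines of the two sidecar files may differ (for example if they were written by different FFmpeg versions). A small Python sketch of a comparison that only looks at the data lines, assuming that header lines start with `#` as in FFmpeg's framemd5 output; the function names are made up for the example:

```python
from itertools import zip_longest


def data_lines(text):
    """Keep only the per-frame lines, dropping blank lines and '#' header lines."""
    stripped = (line.strip() for line in text.splitlines())
    return [line for line in stripped if line and not line.startswith("#")]


def first_mismatch(text_a, text_b):
    """Return the index of the first differing data line, or None if all match."""
    pairs = zip_longest(data_lines(text_a), data_lines(text_b))
    for index, (line_a, line_b) in enumerate(pairs):
        # A missing line on one side (None) also counts as a mismatch.
        if line_a != line_b:
            return index
    return None
```

The two arguments would be the contents of the original and the new framemd5 files, read for instance with `Path(...).read_text()`. A result of `None` means every per-frame line matched.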

JeromeMartinez commented 6 months ago

> with --framemd5

Note that I am not talking about framemd5, which is not a hash of each file but an FFmpeg-specific hash of the decoded content, based on FFmpeg's internal mapping of video data. I mean md5, which is a hash of each file.

This is the difference between FFmpeg alone and RAWcooked: FFmpeg alone focuses on lossless compression of the content without caring about the format of the files, whereas RAWcooked focuses on lossless compression of the files, i.e. the content, the format of the files, and all metadata.
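
A toy Python example of why a hash of the content and a hash of the file are not interchangeable. The "format" here is invented for the illustration (a fixed 16-byte header followed by pixel bytes); it is neither DPX nor the way FFmpeg computes framemd5.

```python
import hashlib

HEADER_SIZE = 16  # hypothetical fixed header size of the toy format


def make_toy_file(header_text, pixels):
    """Build a toy file: a padded ASCII header followed by raw pixel bytes."""
    header = header_text.encode("ascii")[:HEADER_SIZE].ljust(HEADER_SIZE, b"\0")
    return header + pixels


def file_md5(data):
    """Hash of the whole file: header and pixels."""
    return hashlib.md5(data).hexdigest()


def content_md5(data):
    """Hash of the pixels only, ignoring whatever the header says."""
    return hashlib.md5(data[HEADER_SIZE:]).hexdigest()


pixels = bytes(range(64))
file_a = make_toy_file("scanner A", pixels)
file_b = make_toy_file("scanner B", pixels)

print(content_md5(file_a) == content_md5(file_b))  # same picture
print(file_md5(file_a) == file_md5(file_b))        # different files
```

The two toy files carry the same picture, so a content-level hash cannot tell them apart, while a file-level hash can. A content-level hash therefore confirms the image data but says nothing about headers or other metadata in the file.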