Open Tomsen1410 opened 1 month ago
Hi @Tomsen1410,
Thank you for reporting this. Can you tell whether the videos are indeed corrupted, for example by opening them in FFmpeg, or whether this is just DALI behavior? DALI operators work in push mode, processing the whole batch at a time. So when DALI fails to process a given sample in the batch, it cannot ask for another sample to replace the faulty one, and it throws an error. The only solution that comes to my mind is to provide an empty sample or a zeroed one (as some operators may not handle empty tensors gracefully).
Could you provide the ffmpeg command I should test on the video?
You can check this thread and see if FFmpeg can decode and save frames to a file.
Ok, I have run ffmpeg on the corrupted file and it throws the same error:
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x559fa2226100] moov atom not found
[in#0 @ 0x559fa2225fc0] Error opening input: Invalid data found when processing input
Error opening input file /path/to/file.mp4.
Error opening input files: Invalid data found when processing input
I am using ffmpeg 6.1.1 installed from the conda-forge channel.
You can find the corrupted file attached.
https://github.com/NVIDIA/DALI/assets/15103267/f4d3216d-e825-49dc-975f-472d44dff41b
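For context, the "moov atom not found" message in the log above means that the MP4 container has no top-level `moov` box, which holds the metadata needed to locate the frames. A minimal sketch of a pre-check on raw sample bytes, using only the Python standard library, might look like the following. The function names `top_level_boxes` and `has_moov_atom` are hypothetical and not part of DALI or FFmpeg, and the check only tests for the presence of the box, not whether the video is otherwise decodable.

```python
import struct


def top_level_boxes(data: bytes):
    """Yield (box_type, box_size) for each top-level ISO BMFF (MP4) box in `data`."""
    pos = 0
    while pos + 8 <= len(data):
        # Each box starts with a 32-bit big-endian size and a 4-byte type.
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if pos + 16 > len(data):
                return
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data.
            size = len(data) - pos
        if size < header:
            return  # malformed header; stop instead of looping forever
        yield box_type.decode("latin-1"), size
        pos += size


def has_moov_atom(data: bytes) -> bool:
    """Return True if a top-level 'moov' box header is present (presence only)."""
    return any(box_type == "moov" for box_type, _ in top_level_boxes(data))
```

Such a check could be run over the dataset ahead of training to list files that would trigger this particular FFmpeg error; other kinds of corruption would still need a real decode attempt.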
If FFmpeg cannot handle the video correctly, I don't think we can do more than that. As you are using webdataset, you can manually edit the index file generated by the wds2idx.py script to skip the mentioned sample. I also noticed that DALI doesn't provide a meaningful error message (and crashes instead of raising an exception) when it encounters a faulty file. Can you recheck with the DALI nightly build once https://github.com/NVIDIA/DALI/pull/5491 is merged, check the offset of the faulty sample in the webdataset, and adjust the index file?
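To find the offset of a faulty sample inside a webdataset shard, one option is to walk the tar archive and record where each suspicious member lives. The sketch below uses only Python's standard `tarfile` module; `find_faulty_members` and the `is_faulty` callback are hypothetical names, and how the reported offsets map onto lines of the index file depends on the format written by wds2idx.py, which is not assumed here.

```python
import tarfile


def find_faulty_members(tar_path, is_faulty):
    """Return (name, header_offset, data_offset, size) for members flagged by `is_faulty`.

    `is_faulty` is a caller-supplied function taking (member_name, member_bytes)
    and returning True for samples that should be skipped.
    """
    faulty = []
    # "r:" opens an uncompressed tar archive for reading.
    with tarfile.open(tar_path, "r:") as tar:
        for member in tar:
            if not member.isfile():
                continue
            data = tar.extractfile(member).read()
            if is_faulty(member.name, data):
                # TarInfo.offset is where the member's header starts,
                # TarInfo.offset_data is where its payload starts.
                faulty.append(
                    (member.name, member.offset, member.offset_data, member.size)
                )
    return faulty
```

For example, passing a callback that flags empty or undecodable `.mp4` members yields the byte offsets to compare against the entries in the index file.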
Yes, that is exactly the problem. I have no way of catching the error, and the entire training process stops.
I will check once it is merged. What exactly do you mean by adjusting the index file? Once the decoder throws proper errors, there is no need to alter the index file anymore, is there?
@Tomsen1410 - https://github.com/NVIDIA/DALI/pull/5491 has been merged. Please check the next nightly build to see if that helps.
Version
1.35
Describe the bug.
I am using fn.experimental.decoders.video to decode videos stored in a web dataset. However, some files in my dataset are corrupt and/or can't be opened by DALI. Instead of throwing an error, the entire process halts with a segmentation fault when the decoder sees a corrupt video.
Essentially, this issue is similar to #5155, but for the experimental decoder instead of the video reader.
Minimum reproducible example
No response
Relevant log output
No response
Other/Misc.
No response