Burve opened this issue 8 months ago (Open)
You can actually use --syncTiming, and for most videos it will sync the audio timing to the video (assuming I remembered to implement it for MPDs). I would like to hear how it was working/the general ideas in your python script though, as it may help to improve the syncTiming flag.
Alternatively, if you want to keep all the videos, that's also a flag (--keepAllVideos). You can also choose to just not download the extra videos with the --dlVideoOnce flag. I definitely recommend checking over the available flags in the documentation: https://github.com/anidl/multi-downloader-nx/blob/master/docs/DOCUMENTATION.md
It is good to know about these flags. I started with the GUI, but I need to check out the CLI as well.
Short version of the audio sync:
Because the audio comes in different languages, syncing by audio is hard, so I do everything by video. My assumption is that as long as I can match video frames, the audio will line up as well.
With that in mind, I use ffmpeg to process the beginning and the end of each video (usually 30, 60 or 120 seconds; a longer window is a bit more precise but also more work). For that window I extract the keyframes together with their frame numbers, from which the timestamp of each frame can be calculated. Then I look for a frameset (2 or 3 frames) whose content matches between the two files. For example, if the Japanese video starts straight with the opening song while the English video starts with a company logo followed by the opening song, you will eventually find a pair of frames from the opening song that look the same in both. For the comparison I use an image hash. Once the matching pair is found in each video, I calculate the time difference between them (usually the length of the extra logo, etc.). Then I repeat the same procedure at the end of the video.
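To make the idea concrete, here is a minimal sketch of that keyframe-matching step (not the actual script): it assumes ffmpeg/ffprobe are on PATH and that the Pillow and imagehash packages are installed, and the 60-second window, the frameset size of 2 and the hash-distance threshold of 6 are illustrative values.

```python
import subprocess
import tempfile
from pathlib import Path

from PIL import Image
import imagehash


def keyframe_times(video: str, window: float) -> list[float]:
    """Return the timestamps of the keyframes in the first `window` seconds.

    (For the end-of-video pass you would use a different -read_intervals spec.)
    """
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-read_intervals", f"%+{window}",
         "-show_entries", "packet=pts_time,flags", "-of", "csv=p=0", video],
        capture_output=True, text=True, check=True,
    ).stdout
    times = []
    for line in out.splitlines():
        pts_time, _, flags = line.partition(",")
        if "K" in flags and pts_time and pts_time != "N/A":
            times.append(float(pts_time))
    return times


def hash_frame(video: str, t: float) -> imagehash.ImageHash:
    """Grab the frame at time t and return its perceptual hash."""
    with tempfile.TemporaryDirectory() as tmp:
        jpg = Path(tmp) / "frame.jpg"
        subprocess.run(
            ["ffmpeg", "-v", "error", "-ss", f"{t:.3f}", "-i", video,
             "-frames:v", "1", "-q:v", "2", "-y", str(jpg)],
            check=True,
        )
        return imagehash.phash(Image.open(jpg))


def find_offset(video_a: str, video_b: str, window: float = 60.0,
                set_size: int = 2, max_distance: int = 6) -> float | None:
    """Return the time offset of video_b relative to video_a, or None if no match."""
    a = [(t, hash_frame(video_a, t)) for t in keyframe_times(video_a, window)]
    b = [(t, hash_frame(video_b, t)) for t in keyframe_times(video_b, window)]
    for i in range(len(a) - set_size + 1):
        for j in range(len(b) - set_size + 1):
            # A match requires every frame of the small frameset to look the same.
            if all((a[i + k][1] - b[j + k][1]) <= max_distance
                   for k in range(set_size)):
                return b[j][0] - a[i][0]
    return None
```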
The next step is to compare the two offsets (the differences measured from the frame pairs at the beginning and at the end). In the ideal case they disagree by less than 0.05 s, which is perfect. Sometimes it is around 0.8 s, which is still good. In rare cases it is more than 1 s, which is not good and calls for a manual check; and sometimes something goes wrong and no match is found at all, or the difference is something like 20 s, in which case nothing will work. For these bad cases I had an option in the code to render one giant JPG of keyframes with their timestamps, check it manually, and override the offset.
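As a rough illustration of that sanity check: the 0.05 s and 1 s thresholds come from the description above; the function name and labels are just mine.

```python
def classify_drift(start_offset: float, end_offset: float) -> str:
    """Compare the offsets measured at the start and at the end of the episode."""
    drift = abs(start_offset - end_offset)
    if drift <= 0.05:
        return "perfect"       # effectively frame-accurate
    if drift <= 1.0:
        return "good enough"   # a single offset still works fine
    return "manual check"      # dump keyframes to a contact sheet and eyeball it
```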
After all this, the calculated offset is simply applied to each extra "audio", meaning both the audio track and its subtitles. (There is a separate subtitle issue I have not mentioned: e.g. a Japanese video that comes with English subs, and an English audio version that also comes with English subs, where those subs only cover signs and other on-screen text, not dialogue.)
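Applying the result could look roughly like this. This is only a sketch of the mux step with placeholder file names; it assumes a positive offset means the secondary audio/subtitle tracks start later than the reference video, and that the output container is MKV.

```python
import subprocess


def mux_with_offset(video: str, audio: str, subs: str, offset: float, out_mkv: str) -> None:
    """Shift the secondary audio and subtitle tracks by `offset` seconds and mux."""
    subprocess.run(
        ["ffmpeg", "-v", "error",
         "-i", video,                                  # reference video (keeps its own tracks)
         "-itsoffset", f"{offset:.3f}", "-i", audio,   # delayed dub audio
         "-itsoffset", f"{offset:.3f}", "-i", subs,    # delayed signs/songs subtitles
         "-map", "0", "-map", "1:a", "-map", "2:s",
         "-c", "copy", out_mkv],
        check=True,
    )
```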
Technical version: I used Claude 3 Opus to convert the Python code to TypeScript. The original Python code is also included.
The offset calculation in the provided code compares the current video against a provided reference video, so if the reference is the same as the current one, no offset is calculated.
I am sure I forgot some details about the code and will be happy to answer anything.
Type: Both / Suggestion
I found this project recently because of the DRM drama. Since it works while yt-dlp does not, I started using it and noticed a few issues that I had already fixed in my own custom app. One that I noticed right away is proper multi-audio support.
I have only tried a few shows so far, and I noticed that sometimes both languages (English and Japanese, in my case) are downloaded, but only the first language ends up in the final file because the two videos are not the same length. My assumption (without checking the code) is that if both videos are the same length they get merged, but if the lengths differ the merge silently fails.
In reality, CR videos for different languages often have different lengths (except for some of the latest releases); the reason is that each language version has different logos at the start of the video, while the rest matches. Sometimes the same thing happens at the end, but mostly it is at the beginning.
I have some code (in Python, not TypeScript) that can combine around 95% of CR videos automagically, and 99% with a little manual tweaking (there are always a few with extra clips in the middle of the video, which I never figured out).
If any of this is of interest (and I would like to help improve this app to the level my personal yt-dlp-based setup had before DRM), I am willing to share the general logic or even some Python code.
Please let me know if that might be interesting.
P.S. Another big issue I have noticed is duplicate subtitles in the same language, but I will create a separate ticket about that.