Closed Zinfidel closed 3 years ago
Aside from general testing, there is a line in the original global.bat that I'm not sure about:
".\programs\avs2pipemod" -wav encode.avs | ".\programs\sox" -t wav - -t wav - trim 0.0065 | ".\programs\opusenc" --bitrate 64 --padding 0 - ".\temp\audio.opus"
I can't find any information about this, but apparently when using opusenc, 0.0065 seconds needed to be trimmed off the front of the track? The script now uses the opus encoder embedded in ffmpeg, so I'm not sure this is still necessary, and I have not replicated that behavior.
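For reference, that 0.0065 s sox trim only amounts to a few hundred samples. A quick sketch of the arithmetic (the helper name is mine, not from the script):

```python
# Hypothetical helper: convert the 0.0065 s sox trim into whole samples
# at a given sampling rate, to see what it actually removes.

def trim_samples(seconds: float, rate: int) -> int:
    """Number of whole samples covered by a trim of `seconds` at `rate` Hz."""
    return round(seconds * rate)

print(trim_samples(0.0065, 44100))  # 287 samples at 44.1 kHz
print(trim_samples(0.0065, 48000))  # 312 samples at 48 kHz
```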
programs/ffmpeg-config.7z
Let's have those unzipped, in a folder.
When I tried building x264 on my own I also noticed that it was like 20 times smaller, so might make sense to use a lighter build here too.
Some BG info for qaac: http://tasvideos.org/forum/viewtopic.php?p=454950#454950
Thread about currently used delays: http://tasvideos.org/forum/viewtopic.php?t=19507
On building x264 - building ffmpeg here makes it much smaller because I'm stripping it down to almost nothing. If you can strip stuff out of x264 as well it might be worth it indeed. I don't know that I want to invest the energy into figuring out how to build it right now though.
On the audio delays: Interestingly, after running the audio through the encoders inside of ffmpeg, the opus and ogg output line up exactly, but the aac-he audio is now delayed. For the test data I'm using, it's a delay of 0.161 seconds, or 7048 samples. Working on this.
Another issue: It appears that the opus format does not support any sampling rates besides 48,000hz and is in fact heavily optimized for this. When opusenc.exe encodes a file, it will insert the original sampling rate into the header of the opus/ogg container, which then can be read by some programs (like mkvmerge in our case) to report the original sampling rate. However, opus always encodes and decodes audio at 48khz regardless of what the original frequency is listed as in the header.
ffmpeg does not set this header data though, and I can't find a way to force it to. So when we encode using libopus inside of ffmpeg, the audio that comes out is shown as 48khz, regardless of the sampling rate of the original audio. As far as I can tell it's been doing this for all of our encodes up until now; it has just been "lying" to us about how it gets decoded. Read more here: https://trac.ffmpeg.org/ticket/5240 . So how big a problem do you think this is? Our modern encodes would now say 48khz sampling rate, unless we use some external tool to change the tags or something.
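The field mkvmerge reads here is the "input sample rate" in the OpusHead packet (RFC 7845), and it's purely informational metadata. A sketch of reading it, using a synthetic header rather than one of our files:

```python
import struct

def opushead_input_rate(header: bytes) -> int:
    """Read the 'input sample rate' field from an OpusHead packet (RFC 7845).
    This is pure metadata: the coded audio itself is always 48 kHz."""
    assert header[:8] == b"OpusHead"
    # Bytes 12-15: original input sample rate, little-endian uint32.
    return struct.unpack_from("<I", header, 12)[0]

# Minimal synthetic OpusHead: version 1, stereo, pre-skip 312,
# 44.1 kHz input rate, zero output gain, mapping family 0.
head = b"OpusHead" + struct.pack("<BBHIhB", 1, 2, 312, 44100, 0, 0)
print(opushead_input_rate(head))  # 44100
```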
I've just spent the night learning a lot more about aac than I really would have preferred. The issue with decoder priming (the set amount of samples that show up at the start of an aac stream) gets more difficult to deal with when using ffmpeg's libfdk_aac. Whereas tools like qaac and various others that implement libfdk have options for dealing with the priming samples for a/v sync (as the people in that tasvideos thread made use of), ffmpeg does not. What makes it worse is that the amount of priming data changes depending on the sample frequency, so there isn't a one-size-fits-all trim value we can apply. For 44.1khz data, roughly 7000 samples need to be trimmed, but at 48khz, there are about 200 more. The time difference is tiny, about 1 frame, but it's still not exact, which is frustrating because qaac can do it and ffmpeg seemingly can't. I've tested different sampling frequencies and a few other parameters and the numbers change each time, which isn't good.
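To make the "no one-size-fits-all trim" point concrete: the same number of priming samples is a different delay in seconds at different rates. The sample counts below are the approximate figures from my tests, not spec-guaranteed values:

```python
# A fixed sample count means a rate-dependent delay, so one trim value
# cannot serve both 44.1 kHz and 48 kHz sources. Counts are approximate.

def priming_delay(samples: int, rate: int) -> float:
    """Delay in seconds contributed by `samples` priming samples at `rate` Hz."""
    return samples / rate

print(f"{priming_delay(7000, 44100):.4f} s at 44.1 kHz")  # 0.1587 s
print(f"{priming_delay(7200, 48000):.4f} s at 48 kHz")    # 0.1500 s
```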
So the question now is - if we really need to use something other than ffmpeg for doing aac encoding, then is it worth trying to use ffmpeg for any audio encoding? I feel it's kind of all or nothing here, and the amount of effort it feels like it's going to take to get automatically correctly-aligned aac audio out of ffmpeg is just too high considering we have something that works already. I've actually got the custom ffmpeg built with pipe support anyway and I'm pretty sure it will happily pipe to whatever audio programs we have, so dropping avs2pipemod will still work. It's a shame because I wanted to consolidate stuff but I don't want to spend a week on this either.
I did think of one solution to the problem, but it's overly complicated and would be slow: encode the audio to aac-he once, then use ffprobe to get the duration. That duration could be compared to the original pcm stream's, and the difference would be the required offset. The stream would then have to be re-encoded with a trim. A re-encode could probably be avoided by just muxing into mp4 with a negative offset, but I'm not sure that muxing with negative offsets is something we want to do with a compatibility download in the first place.
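A rough sketch of that probe-then-trim idea (filenames and the 10-second example figures are made up; it assumes ffprobe is on PATH):

```python
import subprocess

def stream_duration(path: str) -> float:
    """Duration in seconds via ffprobe (assumes ffprobe is on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-show_entries", "format=duration",
         "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

def required_trim(encoded_seconds: float, pcm_samples: int, rate: int) -> int:
    """Offset in samples: how much longer the encoded stream is than the PCM."""
    return round((encoded_seconds - pcm_samples / rate) * rate)

# e.g. a 10 s 44.1 kHz PCM source whose aac-he encode probes 0.1587 s longer:
print(required_trim(10.1587, 441000, 44100))  # 6999 samples
```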
Links I found useful: https://superuser.com/questions/1552916/audio-is-not-in-sync-after-re-encoding-with-ffmpeg-of-video-and-changing-audio-f https://github.com/mstorsjo/fdk-aac/issues/24 https://stackoverflow.com/questions/42410479/ffmpeg-wrong-audio-file-after-conversion-in-aac https://hydrogenaud.io/index.php/topic,85135.msg921707.html#msg921707
Open questions I have:
If you can strip stuff out of x264 as well it might be worth it indeed. I don't know that I want to invest the energy into figuring out how to build it right now though.
Yeah, maybe someday. It was rather easy to build with mingw.
However, opus always encodes and decodes audio at 48khz regardless of what the original frequency is listed as in the header.
Does it actually resample it though? All bizhawk output is 44100Hz for example.
I did think of one solution to the problem, but it's overly complicated and would be slow. Encoding the audio to aac-he once, then using ffprobe to get the duration. Duration could be compared to original pcm stream, and the difference would be the required offset.
Is it to resolve the variable delay length based on sample rate? If so, we could just force some known sample rate and fix the delay value.
The stream would have to be re-encoded with a trim
Trim using which tool?
A re-encode could probably be avoided by just muxing into mp4 with a negative offset, but I'm not sure that muxing with negative offsets is something we want to do with a compatibility download in the first place.
The mp4 encode is almost useless anyway, so I'm not sure we're really losing anything. I think even if we use this offset feature and just wait for user complaints, there will not be any, ever. 10 years ago we were still considering video players that people have at home, but these days everyone just watches youtube. Which is an entirely different beast btw: it's probably the only video hosting other than archive that lets you upload an unlimited amount of video at almost unlimited resolution, and hosts it indefinitely. But if the channel gets banned, there's no way to restore all those uploads. And no way to have a backup, because they're huge.
Encoding audio and video at the same time in ffmpeg might fix this problem, as I've heard on the internet. I haven't tested this since my custom ffmpeg build on a VM doesn't have video support and it would take a long time to compile that in. Might be worth testing, but that would also lead to the question of whether we're good with combined encoding/muxing.
For test purposes, we could use the official build just to know if it helps first.
Does it actually resample it though? All bizhawk output is 44100Hz for example.
All of my info about this is coming from https://trac.ffmpeg.org/ticket/5240 . According to the conversation in that link, input is resampled to 48khz and encoded, with the original sampling rate then added as metadata to the file. Upon decoding, the application doing the decoding can decide to act on that data and resample the 48khz opus stream to the original value.
In my tests, `MediaInfo` will report the sampling frequency for opus files according to the metadata tag, but `ffprobe`/`ffmpeg` will report the actual frequency. So for my test file (original PCM audio was 44.1khz), `MediaInfo` reports the `opusenc.exe`-encoded file as 44.1khz, but `ffprobe` reports it as 48khz. The same audio encoded using `ffmpeg` gets reported as 48khz by both applications. I've put together a program that can change the original sampling rate bytes in an opus file, so if having the original sampling rate is really important to us, we can use that as a post step. I haven't found a way to do this inside of `ffmpeg` yet.
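I haven't seen that patch program, but the core of the idea is rewriting a few bytes in the OpusHead packet. A sketch on a synthetic header (a real tool also has to recompute the CRC of the Ogg page containing the packet, which this deliberately skips):

```python
import struct

def patch_opushead_rate(header: bytes, new_rate: int) -> bytes:
    """Return an OpusHead packet with its 'input sample rate' field rewritten.
    Illustration only: in a real .opus file the packet lives inside an Ogg
    page, so the page's CRC32 must also be recomputed after patching."""
    assert header[:8] == b"OpusHead"
    patched = bytearray(header)
    struct.pack_into("<I", patched, 12, new_rate)  # bytes 12-15, LE uint32
    return bytes(patched)

# Synthetic header claiming 48 kHz input; rewrite it to say 44.1 kHz.
head = b"OpusHead" + struct.pack("<BBHIhB", 1, 2, 312, 48000, 0, 0)
fixed = patch_opushead_rate(head, 44100)
print(struct.unpack_from("<I", fixed, 12)[0])  # 44100
```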
Is it to resolve the variable delay length based on sample rate? If so, we could just force some known sample rate and fix the delay value.
Moot point now, I found out my testing methodology was flawed. The version of `ffmpeg` that Audacity uses is very old and didn't handle parsing the priming samples correctly. I also forgot to account for the fact that a fixed number of samples translates to differing amounts of delay at different sampling frequencies... oops! Anyway, if we just trim 7106 samples (6 frames @ 1024 each, + 962) off the front of our PCM audio, `libfdk_aac`-encoded tracks all line up correctly.
Trim using which tool?
`libavfilter`, via the `atrim` filter in ffmpeg. The command is `-af atrim=start_sample=7106`.
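Put together, the whole step could look something like this (filenames are placeholders, and it assumes an ffmpeg build configured with `--enable-libfdk-aac`):

```shell
# Sketch: decode the PCM, drop the first 7106 samples to cancel the
# libfdk_aac priming delay, and encode AAC-HE in one pass.
ffmpeg -i input.wav \
       -af atrim=start_sample=7106 \
       -c:a libfdk_aac -profile:a aac_he -b:a 64k \
       output.m4a
```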
We could just force-resample all our audio, or only what goes to opus. Or resample everything to 44100 and what goes to opus to 48000. Feels like the most straightforward approach would be to force 48000 on everything. I don't think there will be any artifacts, but I don't know much about audio.
The AAC lag is fixed using a fixed sample trim of 7106 samples. Opus and Vorbis line up perfectly without modification so that is good. I have a program that can write the Opus metadata if we want - would just be another line in the Modern encodes sections to run it with the original audio sampling rate. It's not included in this PR as it stands though.
I'm marking this as ready since it feels like it's there or at least pretty close. Only other thing is looking into that custom x264 you built if you want to include it.
Why not!
You should create a release after this PR as well, since the repo now weighs in at 153MB due to the binary file replacements over time.
Where does that number come from?
Ah, I didn't realize that downloading a zip from github removed all of the git stuff. I checked the size of the repo on my machine that has all of the git history and my recent branch, which is where the 153MB came from. No release necessary since the zip already removes the git folders!