ennuicastr / libavjs-webcodecs-polyfill

A polyfill for the WebCodecs API. No, really.

audio encoder not giving the same timestamp back in the encoded chunk. #28

Closed RavikumarTulugu closed 3 months ago

RavikumarTulugu commented 4 months ago

The expectation is when we give a raw audio data chunk to the encoder with a timestamp, we should see the same timestamp in the encoded audio chunk.

Yahweasel commented 4 months ago

That is, at best, an overly simplified assumption, since the encoder can break down chunks differently than the user. However, it should in the general case be true, and I'm indeed not confident that that's happening. Do you have an actual example/test for this?

RavikumarTulugu commented 4 months ago

We were debugging our application, which muxes separate video and audio streams into an MP4 stream. There is no clear information online about whether timestamps need to be preserved or can be modified. The test is simple: I printed the timestamp before and after the encoder, and the timestamps are completely out of sync, by something like a factor of 10. Our application sets custom timestamps on video buffers and audio data for synchronization. It seems the WebCodecs standard does not mandate preserving timestamps. However, one argument is that the timestamp is metadata of the chunk and should not be modified during encoding.

Yahweasel commented 4 months ago

One thing you crucially need to understand is that there is no guarantee that one input chunk becomes one output chunk, so preserving timestamps isn't meaningful. If you give ten seconds of audio data, the first chunk presumably has the same timestamp and the rest can be inferred. But what if you give 5 milliseconds of audio data and the encoder is encoding in 20ms chunks? Preserving timestamps where the timestamps can become discontiguous at any time questions what encoding even means.
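To make the 5 ms versus 20 ms point concrete, here is a minimal sketch of an encoder front end that can only emit fixed-size frames. It is not the polyfill's code: `makeFrameBuffer` is a hypothetical helper, and the 48 kHz sample rate and 960-sample frame size are assumptions chosen for illustration.

```typescript
// Hypothetical model: buffer input samples and report how many full
// encoder frames each input completes. Not part of any real library.
function makeFrameBuffer(frameSize: number): (samples: number) => number {
    let buffered = 0; // samples still waiting for a full frame
    return (samples: number): number => {
        buffered += samples;
        const frames = Math.floor(buffered / frameSize);
        buffered -= frames * frameSize;
        return frames;
    };
}

// Assumed 48 kHz audio: a 20 ms frame is 960 samples, a 5 ms input is 240.
const pushSmall = makeFrameBuffer(960);
const framesPerInput = [240, 240, 240, 240, 240].map((n) => pushSmall(n));
// framesPerInput is [0, 0, 0, 1, 0]: most inputs produce no output at all.

// A single 10-second input (480000 samples) completes many frames at once.
const framesFromBigInput = makeFrameBuffer(960)(480000);
// framesFromBigInput is 500
```

In neither case is there a one-to-one pairing between an input chunk and an output chunk whose timestamp could simply be copied across.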

However, regardless, I suspect that the timestamps are not being preserved even in the most trivial way right now, as my usual way of using this is to tack on my own timestamps afterwards and disregard whatever timestamp is here. So further investigation is still deserved.

RavikumarTulugu commented 4 months ago

We have a workaround for this for now. I mostly agree with your statement: one input chunk need not produce one output chunk every time. One thing I would like to share, though, is that the timestamps in the output chunks differ from the input timestamps by roughly a factor of 10. I tried to look into the code but could not proceed due to lack of time. Next time I will paste the timestamps.

Yahweasel commented 4 months ago

After doing some quick testing with real WebCodecs, I've come to the conclusion that its own timestamp handling is bizarre, but potentially something replicatable.

My behavior with video timestamps may already be the same. My behavior with audio timestamps is... similarish.

Chrome WebCodecs pays attention only to the very first timestamp of audio you send it, and offsets all other timestamps by that amount. Every other timestamp is simply based on the duration of audio. It does not care what your timestamps are, just where they start. My behavior with audio timestamps just always starts at 0, so rather than ignoring all but one of the input timestamps, it ignores all of the input timestamps. At least credit me for consistency ;)
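The two behaviors described above can be sketched as a small model. This is only an illustration of the description in this comment, not code from Chrome or from the polyfill; `AudioInput` and `outputTimestamps` are hypothetical names, and the model assumes one output per input to keep it short.

```typescript
interface AudioInput {
    timestamp: number; // microseconds, as supplied by the caller
    duration: number;  // microseconds of audio in this chunk
}

// useFirstTimestamp = true models the Chrome behavior described above;
// false models the polyfill's behavior at the time (always start at 0).
function outputTimestamps(inputs: AudioInput[], useFirstTimestamp: boolean): number[] {
    const base = useFirstTimestamp ? (inputs[0]?.timestamp ?? 0) : 0;
    let elapsed = 0;
    const out: number[] = [];
    for (const input of inputs) {
        out.push(base + elapsed);
        elapsed += input.duration; // only durations advance the clock
    }
    return out;
}

const inputs: AudioInput[] = [
    { timestamp: 1_000_000, duration: 20_000 },
    { timestamp: 5_000_000, duration: 20_000 }, // this jump is ignored
    { timestamp: 1_234, duration: 20_000 },     // and so is this one
];
const chromeLike = outputTimestamps(inputs, true);  // [1000000, 1020000, 1040000]
const zeroBased = outputTimestamps(inputs, false);  // [0, 20000, 40000]
```

The two results differ only by a constant offset, the first input timestamp.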

The actual spec is fairly quiet on timestamps for all the reasons I addressed above. It's hard to imagine what the "right" timestamp is when the input and output don't correspond in any direct way.

The goal of this project is not to mimic Google Chrome's WebCodecs, but to be a correct implementation using libav.js, so the fact that WebCodecs does it this way on Chrome isn't really a good rationale. If WebCodecs on Chrome was doing something roughly like you suggest (somehow trying to preserve the input timestamps), it would be worth investigating exactly what its behavior is and trying to mimic it, but as its behavior is wrong and my behavior is differenter wrong, I don't find the difference very compelling.

RavikumarTulugu commented 4 months ago

As I already said, the timestamps differ by roughly a factor of 10. The screenshot is a log of timestamps before and after the encoder stage. A timestamp like 38591 becomes 234666. I am running Edge on Linux (Ubuntu). [image]

Yahweasel commented 4 months ago

You do understand that WebCodecs timestamps are always in microseconds? I can't really guess what scale your timestamps are meant to be, but if they were microseconds, and the encoder was trying to honor them, it would have to compress the audio in that output to less than a tenth of a millisecond long...
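For reference, a minimal sketch of deriving microsecond timestamps from a sample position, which is one way to keep application timestamps on the scale WebCodecs expects. The helper name, the 48 kHz sample rate, and the 960-sample frame size are assumptions for illustration and do not come from the reporter's application.

```typescript
const MICROSECONDS_PER_SECOND = 1_000_000;

// Convert a count of audio samples into microseconds at a given sample rate.
function samplesToMicroseconds(sampleCount: number, sampleRate: number): number {
    return Math.round((sampleCount * MICROSECONDS_PER_SECOND) / sampleRate);
}

// Assumed 48 kHz audio with 960-sample (20 ms) frames.
const frameDurationUs = samplesToMicroseconds(960, 48000);      // 20000
const eleventhFrameStartUs = samplesToMicroseconds(9600, 48000); // 200000
```

At this scale consecutive 20 ms frames are 20000 apart, so values that advance by only a few units per chunk are almost certainly not in microseconds.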

RavikumarTulugu commented 4 months ago

You are right in your observation. After I posted this, we corrected a timestamp bug in our code. I need to re-test and see whether the bug is still there and on which side, our application or libav.js. Will update soon.

RavikumarTulugu commented 4 months ago

The issue is still seen with microsecond timestamps as well. My suggestion would be to keep the output timestamps in the same range as the input timestamps rather than wildly outside it. This issue still affects us, as we have a livecast module as well, and I could clearly see that if we use the output timestamps as is, without modifying them, we get an AV sync issue in the resulting video. [image]

Yahweasel commented 3 months ago

What is not possible: Honoring input timestamps continuously. Audio encoders care not for input timestamps. They simply output continuous audio. "Keep the output timestamps in the same range" is not a meaningful goal, since the output timestamps are necessarily continuous. They are restricted by the previous output timestamps.

What is possible: Starting with the timestamp of the first input chunk, then ignoring all other input timestamps. This is what Google Chrome does, and so is a reasonable approach in that sense. It is not mandated by the spec.

Is Google Chrome's approach what you want? I could come 'round to doing that, but honoring input timestamps more broadly is just not meaningful.

RavikumarTulugu commented 3 months ago

Taking the example above, the input timestamp '41546400' produces the output '12074666'. If we were to use this timestamp further down the encoder pipeline, for example in MP4 or MPEG-TS containerization, it clearly disturbs the AV sync. Moreover, in our application we set the timestamps at the start of the pipeline rather than in the browser, so our timestamps are clearly out of order after they pass through the libavjs encoder.

Yahweasel commented 3 months ago

The timestamps in the output do not, in any continuous sense, have anything to do with the input timestamps, and nor can they. The output timestamp of packet n is just the output timestamp of packet n-1 plus the duration of packet n-1. That is the only way that output timestamps can work. Everything you're showing me about packets in the middle of a stream is not actionable. It is neither meaningful nor possible to do what you ask.

Google Chrome's WebCodecs implementation uses the first input timestamp and then discards all future input timestamps. If your input timestamps are contiguous but do not start at 0, then it's presumably doing what you want. It would be possible for the polyfill to do the same. What I'm trying to figure out is if that's what you want or are expecting, or if you're expecting something impossible. I'm not going to spend the time to make the output timestamps match Google Chrome's if what you want is for the output timestamps to continuously adjust to the input timestamps, which is not possible or meaningful.

RavikumarTulugu commented 3 months ago

The Google Chrome approach will work.

Yahweasel commented 3 months ago

I believe that faf5f7a02ecb9ca70339a47f1ffff15118d73d53 should fix this. Does it?

RavikumarTulugu commented 3 months ago

I looked at the code and it seems to mitigate the issue.