Closed matchingCube closed 7 months ago
Could very well be a bug, although I thought I had already fixed this bug in the past. Could I have a bit more context, maybe the code you use to add chunks to the muxer, and/or a demo output file with the desync present?
Thanks for your reply. Can you share your email, WhatsApp, or Telegram info so that I can share my project and show the issue via a video made using my app?
Excuse me, can we continue discussing?
Oh I'm sorry, I forgot to reply.
I'd be willing to help you privately only for a charge, as I'm also busy with other things. I think the much better solution would be for you to post the relevant code (with any sensitive parts removed) and a demo file right here on GitHub, so it's publicly available and is there to help people in the future.
Unfortunately, I can't share my project. If you can fix it, I will pay for your work.
Is it not possible to create a demo video that reveals no big secrets about your project? Or a code snippet with sensitive parts stripped out?
https://drive.google.com/file/d/18VKursQo1f3Mo70NbPP6_khivLA4al3D/view?usp=drive_link Please request access to this file so that you can check the test video with the A/V sync issue.
And I invited you to my git repo of the project so that you can run it yourself and check the code.
I'll happily inspect your issue and help you out in private - I charge 48 USD an hour. You can send me this money over my Ko-fi which you find here: https://ko-fi.com/vanilagy
After you do, I'll look into the problem as soon as possible, so likely during the evenings (CEST).
Hi @Vanilagy , same issue here. Can I use ko-fi?
@UtkuBulkan You can, but I'd urge you to try and send a demo file here that shows this desync - or show me the code that produced the file, with sensitive data stripped out. This way, we can all benefit from the solution :)
I've had this issue (the audio would gradually get out-of-sync because it played slower) and managed to resolve it. For me it came down to making sure that the sampleRate of the audioEncoder is the same as the sampleRate of the file you decode.
let audioContext = new AudioContext();
let audioBuffer = await audioContext.decodeAudioData(await (await fetch('./assets/BigBuckBunny.mp4')).arrayBuffer());

audioEncoder.configure({
    codec: 'mp4a.40.2',
    sampleRate: audioBuffer.sampleRate, // the important bit which was a static value for me before
    numberOfChannels,
    bitrate: 128000,
});

let dataLength = duration * audioBuffer.sampleRate;
let data = new Float32Array(dataLength);
const channel0 = audioBuffer.getChannelData(0).subarray(0, dataLength);
data.set(channel0);

let audioData = new AudioData({
    format: 'f32-planar',
    sampleRate: audioBuffer.sampleRate,
    numberOfFrames: dataLength,
    numberOfChannels: 1,
    timestamp: 0,
    data
});
audioEncoder.encode(audioData);
audioData.close();
The code above may not work as-is; it's just showing the gist of it. I had to cut out a bunch of stuff, like combining channels and adding silence frames.
Your problem might be slightly different, it's hard to say, I hope this helps.
@Mat-thieu That's interesting and somewhat unexpected. I would expect that when the encoder sample rate doesn't match the input buffer's sample rate, it automatically resamples the input before encoding.
Yeah, I expected (and hoped 😄) that as well. But unfortunately, resampling has to be a separate, additional step; from some shallow research, I couldn't find any evidence of AudioEncoder doing any resampling, but maybe there's a way?
Also, I haven't dived into this part of the code, but from what I can see, the Muxer options audio.sampleRate and audio.numberOfChannels don't seem to matter for the output; instead, the output will be the same as the AudioEncoder's settings.
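If resampling does turn out to be necessary, one way to do it in the browser is with an OfflineAudioContext, which renders audio at whatever sample rate it was constructed with. A minimal sketch, assuming a browser environment; `resampleBuffer` and `resampledFrameCount` are made-up helper names, not part of any library:

```javascript
// Hedged sketch: resampling a decoded AudioBuffer to a target sample rate
// with an OfflineAudioContext before handing the samples to an AudioEncoder.
// `resampleBuffer` and `resampledFrameCount` are made-up helper names.

// Pure helper: number of frames the resampled buffer will contain
function resampledFrameCount(frames, fromRate, toRate) {
    return Math.ceil(frames * toRate / fromRate);
}

async function resampleBuffer(buffer, targetRate) {
    // Browser-only: an OfflineAudioContext renders as fast as possible
    const length = resampledFrameCount(buffer.length, buffer.sampleRate, targetRate);
    const ctx = new OfflineAudioContext(buffer.numberOfChannels, length, targetRate);
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    source.start();
    return ctx.startRendering(); // resolves with the resampled AudioBuffer
}
```

The resampled buffer's sampleRate then matches whatever you configured the AudioEncoder with, so the mismatch described above shouldn't occur.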
Keep me updated if you find anything else that looks like a desync. Sadly, the other people in this thread have stopped responding :(
I'll close this issue for now as it's stale. If y'all still need help with something, feel free to reopen it :)
Hi @Vanilagy. I have the same issue when there is high CPU usage: under high CPU load, setInterval's callback fires less often (not exactly 30 times per second). I prepared the following demo to help you reproduce this issue. Recording works fine for the first 10 seconds; then the CPU is slowed down. You can copy+paste this code into the console of the mp4-muxer demo. Also, here is the full video of the issue; and here is the resulting video, where the desync starts after the 10th second.
function drawStopwatch(seconds) {
    const canvas = document.querySelector('canvas');
    const ctx = canvas.getContext('2d');

    ctx.clearRect(0, 0, canvas.width, canvas.height);

    ctx.beginPath();
    ctx.arc(canvas.width / 2, canvas.height / 2, 80, 0, 2 * Math.PI);
    ctx.stroke();

    ctx.font = '20px Arial';
    ctx.textAlign = 'center';
    ctx.fillText('Stopwatch', canvas.width / 2, 30);

    ctx.font = '30px Arial';
    ctx.fillText(formatTime(seconds), canvas.width / 2, canvas.height / 2 + 10);
}

function formatTime(seconds) {
    const hours = Math.floor(seconds / 3600);
    const minutes = Math.floor((seconds % 3600) / 60);
    const remainingSeconds = seconds % 60;

    return pad(hours) + ':' + pad(minutes) + ':' + pad(remainingSeconds);
}

function pad(value) {
    return value < 10 ? '0' + value : value;
}

function cpuIntensiveTask() {
    let sum = 0;
    for (let i = 0; i < 150000000; i++) { // Adjust the loop count
        sum += i;
    }
}

let seconds = -1;
const interv = setInterval(() => {
    drawStopwatch(++seconds);
}, 1000);

// startTime and framesGenerated are globals defined by the mp4-muxer demo page
setInterval(function() {
    let duration = document.timeline.currentTime - startTime;
    console.log('fps', framesGenerated / (duration / 1000));
}, 1000);

// Call the CPU-intensive task repeatedly after 10 seconds
setTimeout(function() {
    window.throttleInterval = setInterval(cpuIntensiveTask, 100);
    setTimeout(function() {
        clearInterval(window.throttleInterval);
    }, 10000); // turn throttling off
}, 10000);
Does anyone have any ideas on how to fix this?
@LiubomyrB Sorry for the late response!
Yes, so: the problem here stems from the fact that a new video frame is generated every time the setInterval fires (which, as you correctly asserted, slows down when the CPU is blocked), but the timestamp of that frame is based on a constant increment/formula instead. This explains the desync that occurs.
const encodeVideoFrame = () => {
    let elapsedTime = document.timeline.currentTime - startTime;
    let frame = new VideoFrame(canvas, {
        timestamp: framesGenerated * 1e6 / 30,
        duration: 1e6 / 30
    });
    framesGenerated++; // <-- this thing here
    ...
Using a fixed increment to determine the timestamp, but using a setInterval to call this function, is actually not the right thing to do. I've done a lot of game dev, which also involves a lot of loops and fixed-timestep logic, and in a way this is a "rookie" mistake; I just kept the demo like this for simplicity. The fix depends on what your requirements are and what you use this library for.
const updateRate = 30;
const updatePeriod = 1000 / updateRate;
let lastTickTime: number | null = null;

function tick() {
    const now = document.timeline.currentTime;

    if (lastTickTime === null) {
        encodeVideoFrame();
        lastTickTime = now;
        return;
    }

    // This loop now encodes as many frames as necessary to "catch up" with now again
    while (now - lastTickTime >= updatePeriod) {
        encodeVideoFrame();
        lastTickTime += updatePeriod;
    }
}
Now, you call tick as often as you want. You should call it at least updateRate times per second, but calling it more often doesn't hurt. You can call it as setInterval(tick, 0), which will invoke it at about 250 Hz. For this to make sense, encodeVideoFrame needs to determine its timestamp mathematically, which is basically what the demo does with framesGenerated++.
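A sketch of what such a mathematically determined timestamp could look like, mirroring the demo snippets above (assumes `canvas` and `videoEncoder` exist in the surrounding scope, as in the demo; `frameTimestamp` is a made-up helper):

```javascript
// Sketch: encodeVideoFrame derives its timestamp purely from the frame
// counter, so it stays correct no matter how irregularly tick() fires.
// Assumes `canvas` and `videoEncoder` exist in the surrounding scope.
const updateRate = 30;
let framesGenerated = 0;

// Pure helper: timestamp (in microseconds) of frame n at a fixed frame rate
function frameTimestamp(n, rate) {
    return Math.round(n * 1e6 / rate);
}

function encodeVideoFrame() {
    // Browser-only part: grab the current canvas contents as a VideoFrame
    const frame = new VideoFrame(canvas, {
        timestamp: frameTimestamp(framesGenerated, updateRate),
        duration: Math.round(1e6 / updateRate)
    });
    framesGenerated++;
    videoEncoder.encode(frame);
    frame.close();
}
```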
@Vanilagy Thank you, it works!
Awesome!
It's better, but I still have some issues. I also tried another way to record video+audio: I have a canvas where I draw frames from different sources (camera, video file), like in OBS. Then I do the following:
Expected result: I thought it would work the same way as when I record video from the canvas in WebM format using new MediaRecorder(stream): audio and video stay synced with each other even under high CPU load (under high CPU load, both audio and video start freezing synchronously). I thought that frames and audio data would have more or less the same timestamps, so there would be no sync issues.
Actual behavior: video is slower than audio when there is high CPU usage, e.g. when a user starts browsing files on their PC (tested on an i5-7200U, 2 cores, 4 threads, 8 GB RAM).
The question is: how are video and audio mixed while recording with mp4-muxer? Do they wait for each other, for example when audio is normal but video frames are created more slowly? Does AudioData wait for a VideoFrame with the same/similar timestamp? Or are they mixed on the fly, immediately when data is passed to the muxer? On the README page you pointed out that the muxer needs to wait for the chunks from both media to finalize any given fragment (when recording fragmented MP4). Does it work the same way when recording a regular MP4?
@LiubomyrB That's an interesting question! Also, excuse the delayed response.
For regular, non-fragmented MP4 files, there is no need to "wait" for the other track's chunks to write into the file. That's because the timestamps will be put into the header of the file later on anyway, and will then play back correctly, assuming video and audio both have the same duration. For fragmented MP4, I actually need to interleave the chunks (the waiting you referred to), because once a segment is finalized, I can't change it anymore, so I need the chunks from both tracks.
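The interleaving idea for the fragmented case can be sketched in isolation like this (a simplified illustration of the concept described above, NOT mp4-muxer's actual implementation; `interleave` is a made-up function):

```javascript
// Simplified sketch of chunk interleaving for fragmented output. Chunks are
// queued per track and written in timestamp order; whatever can't be ordered
// yet (because the other track hasn't caught up) stays pending.
// Each chunk here is just { timestamp: number }, timestamps in seconds.
function interleave(videoChunks, audioChunks) {
    const written = [];
    let v = 0, a = 0;

    // Emit chunks in timestamp order as long as both queues are non-empty
    while (v < videoChunks.length && a < audioChunks.length) {
        if (videoChunks[v].timestamp <= audioChunks[a].timestamp) {
            written.push(videoChunks[v++]);
        } else {
            written.push(audioChunks[a++]);
        }
    }

    // Remaining chunks can't be safely written yet: a later chunk with an
    // earlier timestamp might still arrive on the other track.
    return {
        written,
        pendingVideo: videoChunks.slice(v),
        pendingAudio: audioChunks.slice(a)
    };
}
```

This is why, for fragmented MP4, a stall on one track holds back finalization of the fragment, while a regular MP4 can just write chunks as they come and fix up ordering in the header.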
I'm not exactly sure what you're building, but I can tell you that you do NOT need to use captureStream for a canvas. captureStream is more hacky and makes sense when you need to pull data out of some live medium, like a microphone; for a canvas, you can simply use the VideoFrame constructor, like I do in my demo.
You should log the timestamps of the encoded chunks of both tracks under high CPU load. It could be that one of the tracks is counting time differently than the other, even though this would be strange if both use captureStream, since you'd expect the timestamps to stay roughly in sync.
I am trying to make mp4-muxer work on slower computers without the audio/video synchronization bug. Currently, I record fragmented MP4, as it allows saving each chunk separately to OPFS so it can be recovered if the recording is stopped unexpectedly. The problem is that the first 20 seconds of the recording are fine; then it begins to desynchronize gradually. Audio is faster than video, but it's the video that has the correct length. I built mp4-muxer with some additional logs.
As you can see, the difference between the audio and video timestamps is 1.43 s at the end. It turned out that the AudioEncoder encodes audio more slowly than the VideoEncoder encodes video, as there is no big difference between the audio and video timestamps before audioEncoder.encode(audioData).
The question is: do we need to process audio via AudioEncoder? Is it possible to pass AudioData from MediaStreamTrackProcessor to mp4-muxer directly without AudioEncoder?
I have multiple thoughts on this:
If one encoder is slower than the other, this still shouldn't lead to an incorrect/desynced file. It simply means the file takes as long to create as the slowest encoder takes to finish. As long as your finalization step looks like this:
await Promise.all([audioEncoder.flush(), videoEncoder.flush()]); // Awaiting serially might also be fine
muxer.finalize();
You should be good.
The more important thing is that the timestamps should eventually line up. That is, when both encoders are finished, look at their last chunks: ideally, their timestamp + duration values should be very close. If the last timestamps you get look like 60.71466599999985 and 62.14641400000001, and there are no more chunks coming after that, then something is off with the way you encode media, where one medium is somehow longer than the other. That's why there are 25 video chunks still in the queue: the video is already at 62 seconds, but your audio is still at 60 seconds. Can you look into what's going on there? Can you share how you determine the timestamps for both video and audio? (I assume for audio, it's coming straight out of a MediaStreamTrackProcessor.)
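One possible way to check this is to record the end time (timestamp + duration) of the last chunk each encoder emits and compare them after flushing. A sketch with made-up names (`onVideoChunk`, `onAudioChunk`, `reportDrift`); WebCodecs chunk timestamps and durations are in microseconds:

```javascript
// Sketch: track the end time of the last chunk seen on each track and
// compare them after flushing both encoders. All names here are made up.
let lastVideoEnd = 0;
let lastAudioEnd = 0;

// Pure helper: end time of a chunk in seconds (WebCodecs uses microseconds)
function chunkEndSeconds(chunk) {
    return (chunk.timestamp + (chunk.duration ?? 0)) / 1e6;
}

// Use as the VideoEncoder's output callback:
function onVideoChunk(chunk, meta) {
    lastVideoEnd = chunkEndSeconds(chunk);
    // muxer.addVideoChunk(chunk, meta); // then mux as usual
}

// Use as the AudioEncoder's output callback:
function onAudioChunk(chunk, meta) {
    lastAudioEnd = chunkEndSeconds(chunk);
    // muxer.addAudioChunk(chunk, meta);
}

// After `await Promise.all([videoEncoder.flush(), audioEncoder.flush()])`:
function reportDrift() {
    console.log('video ends at', lastVideoEnd, 's; audio ends at', lastAudioEnd, 's');
    return Math.abs(lastVideoEnd - lastAudioEnd); // should be close to 0
}
```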
> The question is: do we need to process audio via AudioEncoder? Is it possible to pass AudioData from MediaStreamTrackProcessor to mp4-muxer directly without AudioEncoder?
Again, I'm not sure the AudioEncoder is actually the bottleneck here, especially because encoding audio is also way faster than encoding video. But in general, I guess you can go from AudioData directly to an EncodedAudioChunk, but then you must use some raw, uncompressed format, since AudioData is uncompressed. Probably not what you want!
I fixed the synchronization bug (in this case, on a Mac mini 2014, at least) by simply specifying the latencyHint: 0.30 (or at least 'playback') option when creating the AudioContext. If you leave this option unspecified (the default is 'interactive'), clicking sounds appear after the first 20 s of recording, which in turn causes the synchronization bug. Probably, underpowered computers just aren't able to process audio fast enough in 'interactive' mode. So, sorry for bothering you.
Thanks for your great module. Up to 480p video it works well, but when I record 720p video, an audio/video sync problem occurs: the audio is longer than the video. Can you guide me on how to resolve this issue? Or is it a bug in this module?