Open · nonopolarity opened this issue 1 year ago
YouTube separates the audio and video streams for higher-resolution videos.
You will have to use ffmpeg to combine these streams; thankfully this repo has an example: https://github.com/fent/node-ytdl-core/blob/master/example/ffmpeg.js
Does ffmpeg combine the video and audio in just a few seconds? I could also use Final Cut to combine them, but that is basically a re-encode and it takes a long time. VLC can also combine the video and audio, and it takes only a second or two even if the video is an hour long.
I think that will depend on your hardware and use case.
I have found ffmpeg's performance quite impressive when it comes to muxing just the two streams. It also wouldn't technically be a "re-encoding" the way the documentation describes it, because you are just copying the video stream with the `-c:v copy` flag.
> I am more concerned about, doing it this way using ffmpeg,
>
> Which one is it?
You only have to combine the video and audio files if you download video-only and audio-only streams.
If you don't care about downloading the absolute highest quality you can just download the highest quality stream that already contains audio and video with something like this:
```js
const info = await ytdl.getInfo(url);
// pick the best format that already contains both audio and video
const format = ytdl.chooseFormat(info.formats, {
  filter: "audioandvideo",
  quality: "highest",
});
// stream the chosen format; pipe this wherever you need it (e.g. to a file)
ytdl.downloadFromInfo(info, {
  quality: format.itag,
});
```
> You only have to combine the video and audio files if you download video-only and audio-only streams.
>
> If you don't care about downloading the absolute highest quality you can just download the highest quality stream that already contains audio and video with something like this
Right. In the past that often meant 360p, which is vastly different from 720p or 1080p.
I am more concerned about what doing it this way with ffmpeg involves:
- does it involve re-encoding (which usually takes quite long; for a 10-minute video it can take 2 to 5 minutes), or
- does it only involve putting the two files into one file (which usually just copies two data chunks into one file and is super fast; for a 10-minute video it takes about 2 seconds)?

Which one is it?
That completely depends on your use case.
In my use case I just use the second option (copy); I don't re-encode.
> I am more concerned about what doing it this way with ffmpeg involves:
> - does it involve re-encoding (which usually takes quite long; for a 10-minute video it can take 2 to 5 minutes), or
> - does it only involve putting the two files into one file (which usually just copies two data chunks into one file and is super fast; for a 10-minute video it takes about 2 seconds)?
>
> Which one is it?
The question is not about which one to pick; it is about how ffmpeg does it. Naturally, if a job can be done in 2 seconds, I don't want to spend 2 to 5 minutes on it.
Pass the `-c copy` flag to the ffmpeg command and it won't re-encode.
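For illustration, a minimal sketch of that remux step from Node, assuming ffmpeg is on the PATH and that the two streams have already been downloaded to local files (the file names are placeholders):

```js
const { execFile } = require("child_process");

// Remux only: both streams are copied into one container, nothing is
// re-encoded, so this takes seconds even for long videos.
execFile(
  "ffmpeg",
  ["-i", "video.mp4", "-i", "audio.m4a", "-c", "copy", "merged.mp4"],
  (err) => {
    if (err) throw err;
    console.log("merged");
  }
);
```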
`-c:v copy` and `-c:a copy` will only work if you're merging two compatible streams (or if you're merging them into an mkv wrapper that basically supports streams of any type).

If your video is encoded with h264 (`.mp4`), your audio needs to be encoded with aac to copy both streams into a new `.mp4` without re-encoding.

If your video is encoded with vp8 or vp9 (`.webm`), your audio needs to be encoded with either opus or vorbis to copy both streams into a new `.webm` without re-encoding.
The technique the example ffmpeg.js script uses to merge audio and video is to always copy the video codec and always re-encode the audio (it includes `-c:v copy` but doesn't specify the audio encoding, which means ffmpeg will always re-encode the audio to a compatible format).

This isn't a terrible strategy, because otherwise you'd need `ffprobe` to check that the streams are compatible. You could make sure you never re-encode by selecting compatible video and audio streams at download time.
To add more to @christiangenco's answer: in my experience, or at least the way I understand it, YouTube takes your input video (the file you upload) and re-encodes it into exactly those formats (h264/h265 for video, aac for audio), so when using the ffmpeg method you are able to just use copy encoding all the time (at least in my experience).
Yup 👆

The trouble is that YouTube also re-encodes your video into webm and opus, so often when I ask node-ytdl-core for bestaudio and bestvideo it gives me two incompatible formats.
I recommend avoiding opus for audio and using `mp4a.40.2` if you are planning to use the `.mp4` format; mp4 players usually do not support the 48 kHz sample rate that opus uses.
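To illustrate the two suggestions above (selecting compatible streams at download time, and preferring AAC over opus when targeting `.mp4`), a rough sketch along these lines should work, assuming `ytdl` and `url` are in scope and this runs inside an async function:

```js
const info = await ytdl.getInfo(url);

// H.264 (avc1) video-only stream: can be stream-copied into an .mp4
const videoFormat = ytdl.chooseFormat(info.formats, {
  quality: "highestvideo",
  filter: (f) => f.hasVideo && !f.hasAudio && /avc1/.test(f.mimeType || ""),
});

// AAC-LC (mp4a.40.2) audio-only stream: plays in ordinary .mp4 players
const audioFormat = ytdl.chooseFormat(info.formats, {
  quality: "highestaudio",
  filter: (f) => f.hasAudio && !f.hasVideo && (f.mimeType || "").includes("mp4a.40.2"),
});
```

With both formats picked from the same family, the later ffmpeg step can use `-c copy` on both streams and never re-encode.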
How can I merge video and audio to output an mp4?
My code:
```ts
const audioStream = ytdl(URL as string, {
  filter: 'audioonly',
  quality: 'highestaudio',
});
const videoStream = ytdl(URL as string, {
  filter: (format) => format.hasVideo && (format.container === 'mp4' || format.container === 'webm'),
  quality: qualityOption,
});
```
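In case it helps, here is a sketch of the piping approach from the repo's example/ffmpeg.js adapted to those two streams. It assumes the `ffmpeg-static` package (any ffmpeg binary path works) and that the chosen video format is H.264; the audio is re-encoded to AAC so the output is a valid `.mp4` even if the audio stream is opus/webm. If ffmpeg complains about non-seekable input, download the streams to temporary files first:

```js
const cp = require("child_process");
const ffmpegPath = require("ffmpeg-static"); // assumed dependency; a system ffmpeg path also works

// Feed both ytdl streams to ffmpeg over extra stdio pipes and mux them.
const ffmpeg = cp.spawn(
  ffmpegPath,
  [
    "-loglevel", "error",
    "-i", "pipe:3",          // video
    "-i", "pipe:4",          // audio
    "-map", "0:v", "-map", "1:a",
    "-c:v", "copy",          // video is copied, not re-encoded
    "-c:a", "aac",           // audio is re-encoded to AAC for .mp4 compatibility
    "output.mp4",
  ],
  { stdio: ["inherit", "inherit", "inherit", "pipe", "pipe"] }
);

videoStream.pipe(ffmpeg.stdio[3]);
audioStream.pipe(ffmpeg.stdio[4]);
ffmpeg.on("close", () => console.log("done"));
```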
> I am more concerned about what doing it this way with ffmpeg involves:
> - does it involve re-encoding (which usually takes quite long; for a 10-minute video it can take 2 to 5 minutes), or
> - does it only involve putting the two files into one file (which usually just copies two data chunks into one file and is super fast; for a 10-minute video it takes about 2 seconds)?
>
> Which one is it?
>
> The question is not about which one to pick; it is about how ffmpeg does it. Naturally, if a job can be done in 2 seconds, I don't want to spend 2 to 5 minutes on it.
@nonopolarity Did you manage to resolve this? I have the same problem and need the files merged quickly (in less than 5 seconds).
I tried to play the files separately with JavaScript in the browser, but Safari limits the amount of media that can run at once, and that ended up breaking my application.
Often we have to download the 1080p video and the audio separately as two files. Is it true that we just have to use ffmpeg, or a Python or JS port of ffmpeg, to combine the two files into one .mp4 file? ytdl-core probably doesn't have this feature?
(Examples: https://zulko.github.io/moviepy/ https://github.com/ffmpegwasm/ffmpeg.wasm )
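Yes, ytdl-core only downloads the streams; the merge itself needs ffmpeg or one of those ports. A rough sketch with ffmpeg.wasm, assuming its older `createFFmpeg` API and two already-downloaded local files (the file names are placeholders):

```js
const fs = require("fs");
const { createFFmpeg } = require("@ffmpeg/ffmpeg"); // assumed package/API version

async function merge() {
  const ffmpeg = createFFmpeg();
  await ffmpeg.load();
  // copy the downloaded files into ffmpeg.wasm's in-memory filesystem
  ffmpeg.FS("writeFile", "video.mp4", new Uint8Array(fs.readFileSync("video.mp4")));
  ffmpeg.FS("writeFile", "audio.m4a", new Uint8Array(fs.readFileSync("audio.m4a")));
  // stream copy, no re-encoding
  await ffmpeg.run("-i", "video.mp4", "-i", "audio.m4a", "-c", "copy", "out.mp4");
  fs.writeFileSync("out.mp4", Buffer.from(ffmpeg.FS("readFile", "out.mp4")));
}

merge();
```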