CouncilDataProject / cdp-backend

Data storage utilities and processing pipelines used by CDP instances.
https://councildataproject.org/cdp-backend
Mozilla Public License 2.0

MP4 Videos Unnecessarily Encoded During Trim #233

Closed: whargrove closed this issue 1 year ago

whargrove commented 1 year ago

Describe the Bug

In clip_and_reformat_video, if the input video is already an mp4, ffmpeg still re-encodes the output stream to mp4.

Expected Behavior

Use ffmpeg StreamCopy to avoid re-encoding a video that is already mp4.

Reproduction

Proof of concept with ffmpeg-python:

Existing implementation:

import ffmpeg
import os

(
    ffmpeg
    .input('big_buck_bunny_720p_surround.mp4', ss='01:00', to='02:00')
    .output('trimmed.mp4', format='mp4')
    .run()
)

os.remove('trimmed.mp4')

Running time python main.py reports real ~3s. I didn't benchmark this properly, so the number is anecdotal. On my machine the encode speed is ~20x, which is consistent: 60s of video / 20 ~= 3s.

Use StreamCopy:

import ffmpeg
import os

(
    ffmpeg
    .input('big_buck_bunny_720p_surround.mp4', ss='01:00', to='02:00')
    .output('trimmed.mp4', format='mp4', codec='copy')
    .run()
)

os.remove('trimmed.mp4')

(Note the only difference is codec='copy' arg in output.)

When using StreamCopy, the pipeline on my machine takes ~200ms.


evamaxfield commented 1 year ago

I don't know if there is a guarantee that the video is an mp4, so how about a fast check of the format: if it is an mp4, add codec="copy"; otherwise, keep the re-encode to mp4, imo.

Feel free to open a PR!

whargrove commented 1 year ago

@evamaxfield I was going to naively use the file extension, but it's a fair point that the ext might be lies!

Using file ext + ffmpeg.probe should be sufficient here.

I'll submit a PR soon with the change.
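
A rough sketch of what the ext + ffmpeg.probe check could look like (not the actual PR). The helper names is_mp4 and output_kwargs are hypothetical, and the injectable probe argument is only there so the logic can be tried without ffmpeg installed. It assumes ffprobe reports the MOV/MP4 container family as a comma-separated format_name such as "mov,mp4,m4a,3gp,3g2,mj2", which is also why the extension is checked: a .mov file probes the same way.

```python
from typing import Any, Callable, Dict, Optional

ProbeFn = Callable[[str], Dict[str, Any]]


def is_mp4(path: str, probe: Optional[ProbeFn] = None) -> bool:
    """Return True only when both the extension and the probed container say mp4."""
    if not path.lower().endswith(".mp4"):
        return False
    if probe is None:
        # Imported lazily so the decision logic can be tested without ffmpeg-python.
        import ffmpeg

        probe = ffmpeg.probe
    try:
        info = probe(path)
    except Exception:
        # Probe failed (missing file, ffprobe error): be safe and re-encode.
        return False
    # Assumption: format_name is a comma-separated list of container names.
    names = info.get("format", {}).get("format_name", "")
    return "mp4" in names.split(",")


def output_kwargs(path: str, probe: Optional[ProbeFn] = None) -> Dict[str, str]:
    """Build kwargs for .output(): stream copy for mp4 input, re-encode otherwise."""
    kwargs = {"format": "mp4"}
    if is_mp4(path, probe):
        kwargs["codec"] = "copy"
    return kwargs
```

Usage would then be something like ffmpeg.input(src, ss='01:00', to='02:00').output(dst, **output_kwargs(src)).run(), so non-mp4 inputs keep the existing re-encode path.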

evamaxfield commented 1 year ago

@whargrove once this build is done, wait ~5 minutes, then update your instance to use the new release: https://github.com/CouncilDataProject/cdp-backend/actions/runs/4916882908