anotherjesse closed 1 year ago
Running ffprobe from within the cjwbw/whisper container shows:
ffprobe version 4.2.7-0ubuntu0.1 Copyright (c) 2007-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
(snip)
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'whisper.mp4':
Metadata:
major_brand : iso5
minor_version : 1
compatible_brands: isomiso5hlsf
creation_time : 2022-09-25T17:48:04.000000Z
Duration: 00:00:00.98, start: 0.000000, bitrate: 1096 kb/s
Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 1081 kb/s (default)
Metadata:
creation_time : 2022-09-25T17:48:04.000000Z
handler_name : Core Media Audio
whereas a more recent release installed via Homebrew reports a much longer duration for the same file:
$ ffprobe whisper.mp4
ffprobe version 5.1.1 Copyright (c) 2007-2022 the FFmpeg developers
built with Apple clang version 13.1.6 (clang-1316.0.21.2.5)
(snip)
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'whisper.mp4':
Metadata:
major_brand : iso5
minor_version : 1
compatible_brands: isomiso5hlsf
creation_time : 2022-09-25T17:48:04.000000Z
Duration: 00:00:05.76, start: 0.000000, bitrate: 186 kb/s
Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 184 kb/s (default)
Metadata:
creation_time : 2022-09-25T17:48:04.000000Z
handler_name : Core Media Audio
vendor_id : [0][0][0][0]
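To make the mismatch between the two outputs above easy to check on other files, here is a minimal sketch that pulls the `Duration:` value out of ffprobe's text output and converts it to seconds. The function name `parse_duration_seconds` is hypothetical and the regular expression only assumes the `HH:MM:SS.ss` layout seen in the logs above.

```python
import re

# Matches the "Duration: HH:MM:SS.ss" field as it appears in the ffprobe output above.
DURATION_RE = re.compile(r"Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def parse_duration_seconds(ffprobe_output: str) -> float:
    """Return the duration in seconds from ffprobe's human-readable output.

    Hypothetical helper for illustration; it is not part of cog or whisper.
    """
    match = DURATION_RE.search(ffprobe_output)
    if match is None:
        raise ValueError("no Duration line found in ffprobe output")
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# The two Duration lines quoted in this thread.
focal_line = "Duration: 00:00:00.98, start: 0.000000, bitrate: 1096 kb/s"
brew_line = "Duration: 00:00:05.76, start: 0.000000, bitrate: 186 kb/s"

print(parse_duration_seconds(focal_line), parse_duration_seconds(brew_line))
```

Running the same file through both ffprobe builds and comparing the two numbers would flag inputs that the older build truncates.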
Unfortunately, for the Ubuntu release that cog containers use (focal), the installed ffmpeg is already the most recent packaged version: https://launchpad.net/ubuntu/+source/ffmpeg
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
It looks like NVIDIA publishes Docker images for Ubuntu 22.04,
and https://github.com/replicate/cog/blob/main/docs/yaml.md references "the nvidia-docker base image".
It might be more valuable to test letting cog build on a more recent Ubuntu than to explore backporting ffmpeg.
https://hub.docker.com/r/nvidia/cuda/tags?page=1&name=ubuntu22.04 also seems to offer a newer version of CUDA (11.7 series vs 11.3 series)
As a local test, I added this to cuda_base_image_tags.json:
+ "11.7.1-cudnn8-devel-ubuntu22.04",
That in turn means the Dockerfile generator has to be updated:
- python-openssl \
+ python3-openssl \
and then set cuda: 11.7.1 in the whisper cog.yaml.
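For anyone reading the tag added above, this rough sketch splits an `nvidia/cuda` image tag into its parts, which shows that `11.7.1` is the CUDA version while `cudnn8` is the cuDNN major version. The `CudaTag` type and `parse_cuda_tag` function are hypothetical names, and the pattern assumes the `<cuda>[-cudnn<N>]-<flavor>-<os>` layout of the tag in this thread; it is not how cog itself reads cuda_base_image_tags.json.

```python
import re
from typing import NamedTuple, Optional


class CudaTag(NamedTuple):
    cuda: str             # CUDA toolkit version, e.g. "11.7.1"
    cudnn: Optional[str]  # cuDNN major version, or None if the tag has no cudnn part
    flavor: str           # "base", "runtime" or "devel"
    os: str               # base OS, e.g. "ubuntu22.04"


# Assumed tag layout: <cuda>[-cudnn<N>]-<flavor>-<os>
TAG_RE = re.compile(
    r"^(?P<cuda>\d+\.\d+(?:\.\d+)?)"
    r"(?:-cudnn(?P<cudnn>\d+))?"
    r"-(?P<flavor>base|runtime|devel)"
    r"-(?P<os>.+)$"
)


def parse_cuda_tag(tag: str) -> CudaTag:
    """Split an image tag into CUDA version, cuDNN major, flavor and OS."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognised tag layout: {tag!r}")
    return CudaTag(match["cuda"], match["cudnn"], match["flavor"], match["os"])


print(parse_cuda_tag("11.7.1-cudnn8-devel-ubuntu22.04"))
```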
Started a cog build to try out later.
$ cog predict cog-cog-whisper -i audio=@whisper.mp4
Starting Docker image cog-cog-whisper and running setup()...
Running prediction...
Transcribe with base model
{
"detected_language": "english",
"transcription": " I am just trying the recorder out with my mobile phone because why not?"
}
Stopping container...
Yay! At least for a small 5-second sample, it worked with the larger model.
I'm pushing a build to https://replicate.com/anotherjesse/whisper-updated to allow others to test if they wish.
This seems to be fixed with the latest fixes by @chenxwh
Transcribing iPhone voice memos directly from their native m4a format didn't work.
It transcribed about half of my 25-minute memo. (If you have it output the timestamps, you can see it tries to read later audio but only transcribes
...
If I convert it to an mp3 before sending it to cog-whisper (or the timestamp version), it succeeds.
Similarly, someone showed up in Discord with an issue where mp4 files were being truncated.
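The mp3 workaround mentioned above can be scripted. This is a minimal sketch, assuming a local `ffmpeg` binary on the PATH that decodes the file correctly; `build_mp3_command` and `convert_to_mp3` are hypothetical helper names, and the output path is simply the input path with an `.mp3` suffix.

```python
import subprocess
from pathlib import Path
from typing import List


def build_mp3_command(source: str) -> List[str]:
    """Build an ffmpeg command that re-encodes `source` to an mp3 next to it."""
    target = str(Path(source).with_suffix(".mp3"))
    # -y overwrites an existing output file, -vn drops any video stream.
    return ["ffmpeg", "-y", "-i", source, "-vn", target]


def convert_to_mp3(source: str) -> str:
    """Run the conversion and return the path of the mp3 that was written."""
    command = build_mp3_command(source)
    subprocess.run(command, check=True)  # raises CalledProcessError if ffmpeg fails
    return command[-1]


print(build_mp3_command("memo.m4a"))
```

The converted file can then be passed to the model in the same way as before, for example `cog predict cog-cog-whisper -i audio=@memo.mp3`.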