Open hobodrifterdavid opened 3 months ago
@hobodrifterdavid Thanks for bringing up the issue. The documentation is a bit outdated. Can you please try the latest main branch and this Docker command instead:
docker run --name wordcab-transcribe --gpus all --shm-size 1g --restart unless-stopped -p 5001:5001 -e WORDCAB_TRANSCRIBE_API_KEY="x" -e WHISPER_MODEL="medium" -e WHISPER_ENGINE="faster-whisper-batch" -e ALIGN_MODEL="tiny" -e DIARIZATION_BACKED="longform-diarizer" -e COMPUTE_TYPE="float16" -e DEBUG="True" -e USERNAME="admin" -e PASSWORD="password" -e OPENSSL_KEY="0123456789abcdefghijklmnopqrstuvwyz" -e WINDOW_LENGTHS="2.0,1.5,1.0,0.75,0.5" -e SHIFT_LENGTHS="1.0,0.75,0.625,0.5,0.25" -e TENSORRT_LLM_VERSION="0.9.0.dev2024032600" wordcab-transcribe
The environment variables are from the .env file, feel free to customize.
On the second machine, I'm able to build if I add ipython to requirements.txt. The 'docker run' command in the readme does start the container successfully, and I'm able to process a request, but it errors out if I try to use the VAD. It seems okay with the updated command you sent. On the first machine, I still get the illegal memory access error, but I will wipe the machine and try again.
I got a few questions. :)
Is there a preferred backend for processing a long file over multiple GPUs?
In your docs, TensorRT-LLM doesn't allow passing a prompt. The prompt is useful for nudging the model towards outputting zh-CN or zh-TW, as there is only a single supported Chinese language code for Whisper. Although, I guess machine translation as a post-processing step might be a reasonable way to handle this.
Faster-Whisper has a length_penalty parameter that, as I understand it, increases the probability of the 'end of segment' token the longer the segment gets. I think it's useful for pushing the output towards shorter segments/subs. Could it be exposed in the API? The current output often gives segments that are too long to show as subtitles. By the way, I noticed today that stable-ts has a set of functions for splitting and merging subs, although a proper sentence segmenter would also be helpful.
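For the subtitle-length problem, one post-processing option is to re-split long segments on word timestamps. Below is a minimal sketch of that idea. It assumes each word is a dict with the hypothetical keys `word`, `start`, and `end` (this is not the Wordcab or stable-ts output format), and the 42-character cap is only an illustrative default.

```python
from typing import Dict, List


def _to_chunk(words: List[Dict]) -> Dict:
    """Merge consecutive words into one subtitle chunk."""
    return {
        "text": " ".join(w["word"] for w in words),
        "start": words[0]["start"],
        "end": words[-1]["end"],
    }


def split_segment(words: List[Dict], max_chars: int = 42) -> List[Dict]:
    """Greedily pack timestamped words into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks: List[Dict] = []
    current: List[Dict] = []
    length = 0
    for w in words:
        token = w["word"].strip()
        # +1 accounts for the joining space when the chunk is not empty.
        added = len(token) + (1 if current else 0)
        if current and length + added > max_chars:
            chunks.append(_to_chunk(current))
            current, length = [], 0
            added = len(token)
        current.append({**w, "word": token})
        length += added
    if current:
        chunks.append(_to_chunk(current))
    return chunks
```

This only splits on character count. It does not look at punctuation or sentence boundaries, so a sentence segmenter would still be needed for cleaner breaks.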
@hobodrifterdavid I noticed the missing IPython as well. Check out the latest main branch that I just pushed; it should resolve a few issues.
I kind of prefer the Whisper engine I just added now, faster-whisper-batched, which adds a bunch of unmerged PRs from the faster-whisper library that make things go fast.
Use the edited docker run command above and head to the FastAPI docs, where the first audio file endpoint should have the length_penalty parameter. I recommend setting batch_size to 4 or 8 at least, and num_beams to 5, given your GPU.
FastAPI docs are a bit weird for list input, so if you want to add vocab you'll need to use curl or requests with the audio file endpoint. You can use the audio-url endpoint and add vocab and the other parameters in the JSON, but you'll need a presigned URL to test that.
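A rough sketch of what that could look like with requests. The endpoint paths, the `file` form field, and the `url` JSON key are assumptions and should be checked against the FastAPI docs page of the running container; only the port (5001) and the parameter names vocab, batch_size, num_beams, and length_penalty come from this thread. Authentication is left out.

```python
from typing import Any, Dict, List, Optional

BASE_URL = "http://localhost:5001"    # port from the docker run command above
AUDIO_PATH = "/api/v1/audio"          # ASSUMPTION: verify in the FastAPI docs
AUDIO_URL_PATH = "/api/v1/audio-url"  # ASSUMPTION: verify in the FastAPI docs


def build_options(
    vocab: List[str],
    batch_size: int = 8,
    num_beams: int = 5,
    length_penalty: Optional[float] = None,
) -> Dict[str, Any]:
    """Collect the transcription parameters mentioned in this thread."""
    options: Dict[str, Any] = {
        "vocab": list(vocab),
        "batch_size": batch_size,
        "num_beams": num_beams,
    }
    # Only send length_penalty if set, since the server may not expose it yet.
    if length_penalty is not None:
        options["length_penalty"] = length_penalty
    return options


def transcribe_file(path: str, options: Dict[str, Any]) -> Any:
    """Upload a local file as multipart form data (field name 'file' is assumed)."""
    import requests  # third-party: pip install requests

    with open(path, "rb") as f:
        # requests sends a list value as repeated form fields with the same key.
        resp = requests.post(
            BASE_URL + AUDIO_PATH, files={"file": f}, data=options, timeout=600
        )
    resp.raise_for_status()
    return resp.json()


def transcribe_url(presigned_url: str, options: Dict[str, Any]) -> Any:
    """Send a presigned URL plus options as JSON (the 'url' key is assumed)."""
    import requests  # third-party: pip install requests

    payload = {"url": presigned_url, **options}
    resp = requests.post(BASE_URL + AUDIO_URL_PATH, json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json()
```

For example, `transcribe_file("sample.wav", build_options(["Wordcab"], length_penalty=1.0))` would post a local file with a one-word vocab list, assuming the server accepts those field names.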
I wiped the first machine, it runs fine now. I didn't see the length_penalty param in the docs yet.
Is the Silero VAD used? Do you know how it compares to other VADs (NeMo etc.) in different languages?
I think you might not have pushed the length_penalty. 👀🙂
Hello. This project looks very interesting. I hit some issues building the Dockerfile as described in the readme:
On the first machine (Ubuntu Server 22 LTS, 4x 3090), the build process completed, but I got an 'illegal memory access' error, I think from a CUDA library, when starting up. This machine previously had a modified nvidia driver for P2P access, so it's possible it's not your issue. (https://github.com/tinygrad/open-gpu-kernel-modules/issues/4)
On the second machine (Ubuntu Server 22 LTS, 1x 3090), I initially had an error about the specific version of openssl not being available or compatible. I removed the version number specified in the Dockerfile, and the build continued. But the latest error is "ModuleNotFoundError: No module named 'IPython'"
Just a heads up, ideally I'd be able to help you debug.