Gadersd / whisper-burn

A Rust implementation of OpenAI's Whisper model using the burn framework
MIT License

bug: transcribing with medium model #2

Closed b0xtch closed 1 year ago

b0xtch commented 1 year ago

OS: Mac Ventura

Seems like transcription works with the tiny model, but with the medium model you get a buffer size error. Perhaps we could do chunking.

     Running `target/release/whisper audio.wav medium`
thread 'main' panicked at 'wgpu error: Validation Error

Caused by:
    In Device::create_bind_group
    Buffer binding 0 range 212439040 exceeds `max_*_buffer_binding_size` limit 134217728

', /Users/botch/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.17.0/src/backend/direct.rs:3056:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::ops::function::Fn::call
   3: <wgpu::backend::direct::Context as wgpu::context::Context>::device_create_bind_group
   4: <T as wgpu::context::DynContext>::device_create_bind_group
   5: wgpu::Device::create_bind_group
   6: burn_wgpu::context::base::Context::execute
   7: burn_wgpu::kernel::index::select::select
   8: burn_tensor::tensor::ops::modules::base::ModuleOps::embedding
   9: whisper::model::Whisper<B>::forward_decoder
  10: whisper::main

Update

Using a six-minute audio file with the tiny model produces the same issue.

Gadersd commented 1 year ago

Chunking is the next planned feature. Right now it clips audio to roughly the first 30 seconds for the encoder, but the decoder sequence length isn't limited, so it will overflow if it doesn't detect the end by the 30-second mark.
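The clipping/chunking idea above can be sketched as splitting the raw samples into fixed 30-second windows. This is only an illustration, assuming 16 kHz mono `f32` samples; `chunk_audio`, the constants, and the sample rate are assumptions, not whisper-burn's actual API:

```rust
// Sketch: split raw audio into fixed 30-second windows.
// Assumes 16 kHz mono f32 samples; names are illustrative, not the repo's code.
const SAMPLE_RATE: usize = 16_000;
const CHUNK_SECS: usize = 30;

fn chunk_audio(samples: &[f32]) -> Vec<&[f32]> {
    // `chunks` yields full windows plus one shorter tail window if needed.
    samples.chunks(SAMPLE_RATE * CHUNK_SECS).collect()
}

fn main() {
    // 95 seconds of silence -> 3 full 30 s chunks + one 5 s tail.
    let audio = vec![0.0f32; SAMPLE_RATE * 95];
    let chunks = chunk_audio(&audio);
    println!(
        "{} chunks, last has {} samples",
        chunks.len(),
        chunks.last().unwrap().len()
    );
}
```

Each window would then be fed to the encoder independently, which also keeps every intermediate buffer under the wgpu binding-size limit from the panic above.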

Gadersd commented 1 year ago

Rudimentary chunking is now implemented. Your long audio files should now work, although there is some minor transcription inaccuracy around the chunk edges. I tried incorporating the last few tokens from the previous chunk into Whisper to remedy the chunk-edge issues, but then Whisper severely repeats itself and stops predicting the end of chunks, so I had to revert that change. Any ideas why Whisper is so finicky when exposed to tokens from the previous chunk?
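One way to keep the decoder from overflowing regardless of whether it predicts end-of-text is a hard per-chunk token cap. This is a hedged sketch, not whisper-burn's implementation: `next_token` stands in for the real model, and the `END_OF_TEXT` id and `MAX_TOKENS` value are assumptions for illustration:

```rust
// Sketch: per-chunk decode loop with a hard token cap so the decoder
// can't grow without bound when it misses the end-of-text token.
const END_OF_TEXT: u32 = 50257; // illustrative token id (assumption)
const MAX_TOKENS: usize = 224;  // cap below the model context (assumption)

fn decode_chunk(mut next_token: impl FnMut(&[u32]) -> u32) -> Vec<u32> {
    let mut tokens = Vec::new();
    while tokens.len() < MAX_TOKENS {
        let t = next_token(&tokens);
        if t == END_OF_TEXT {
            break; // model signalled the end of this chunk
        }
        tokens.push(t);
    }
    tokens // either ended naturally or was truncated at the cap
}

fn main() {
    // Toy "model" that emits 10 tokens and then end-of-text.
    let mut count = 0;
    let toks = decode_chunk(|_| {
        count += 1;
        if count > 10 { END_OF_TEXT } else { count }
    });
    println!("decoded {} tokens", toks.len());
}
```

Seeding the loop with the previous chunk's tail tokens would mean pre-filling `tokens` before the loop; as noted above, in practice that seems to push the model into repetition instead of helping continuity.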

b0xtch commented 1 year ago

Mint

b0xtch commented 1 year ago

> whisper is so finicky when exposed to tokens from the previous chunk

Sounds like Whisper hallucination; it happens in other implementations as well. I would have to dig into this one...