Closed DavidGOrtega closed 11 months ago
I tried another file and depth is always 0
Loading waveform...
Loading model...
Depth: 0
Chunk 0:
Depth: 0
Chunk 1:
Depth: 0
Chunk 2:
Depth: 0
Chunk 3:
Depth: 0
Chunk 4:
Depth: 0
Depth: 1
Chunk 5:
Depth: 0
Chunk 6:
Depth: 0
Chunk 7:
Depth: 0
Chunk 8:
Depth: 0
Chunk 9:
Depth: 0
Chunk 10:
Depth: 0
Chunk 11:
Depth: 0
Chunk 12:
Depth: 0
Chunk 13:
Depth: 0
Chunk 14:
Depth: 0
Chunk 15:
Transcription finished.
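For readers following along: the `Depth` counter presumably tracks how many tokens the decoder has emitted for the current chunk, so `Depth: 0` followed by an empty chunk means the model produced an end-of-text token immediately. A toy sketch of such a decoding loop (the names, the EOT id, and the stand-in "model" below are all illustrative, not the project's actual code):

```rust
const EOT: usize = 50256; // hypothetical end-of-text token id

// Toy stand-in for a model: returns a fixed script of tokens, then EOT.
fn next_token(step: usize, script: &[usize]) -> usize {
    *script.get(step).unwrap_or(&EOT)
}

// Greedy decode loop: prints the depth each step and stops on EOT.
fn decode(script: &[usize], max_depth: usize) -> Vec<usize> {
    let mut tokens = Vec::new();
    for depth in 0..max_depth {
        println!("Depth: {}", depth);
        let tok = next_token(depth, script);
        if tok == EOT {
            // A model that hallucinates EOT at depth 0 yields an empty chunk.
            break;
        }
        tokens.push(tok);
    }
    tokens
}

fn main() {
    // A "broken" model emits EOT immediately: prints only "Depth: 0".
    assert!(decode(&[], 10).is_empty());
    // A healthy model emits some tokens before stopping.
    assert_eq!(decode(&[5, 6, 7], 10), vec![5, 6, 7]);
}
```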
Disclaimer: I changed the code to use Cpu since I do not have CUDA:
} else if #[cfg(feature = "torch-backend")] {
type Backend = TchBackend<f32>;
let device = TchDevice::Cuda(0);
}
to
} else if #[cfg(feature = "torch-backend")] {
type Backend = TchBackend<f32>;
let device = TchDevice::Cpu;
}
I also tried the wgpu backend, with the same result:
cargo run --release --features wgpu-backend --bin transcribe tiny_en audio16k.wav en transcription.txt
I have downloaded base_en, small_en and medium_en and they work; the issue only happens with tiny.
I'll check it out. It might be that the tokenizer I uploaded to my Hugging Face for tiny is incorrect.
I tested it and the issue is that the tiny models hallucinate very badly while the larger models are good enough to work correctly without as much hand-holding. I recommend using at least the small models until I can reduce the hallucinations of the tiny models.
> I tested it and the issue is that the tiny models hallucinate very badly while the larger models are good enough to work correctly
You mean your model conversion, right? The HF transformers model works fine with your audio; I have tested it.
Interestingly enough, tiny works perfectly as opposed to tiny_en:
Running `target/release/transcribe tiny audio16k.wav en transcription.txt`
Loading waveform...
Loading model...
Depth: 0
Depth: 1
Depth: 2
Depth: 3
Depth: 4
Depth: 5
Depth: 6
Depth: 7
Depth: 8
Depth: 9
Depth: 10
Depth: 11
Depth: 12
Depth: 13
Depth: 14
Depth: 15
Depth: 16
Depth: 17
Depth: 18
Depth: 19
Depth: 20
Depth: 21
Depth: 22
Depth: 23
Depth: 24
Depth: 25
Depth: 26
Depth: 27
Depth: 28
Depth: 29
Depth: 30
Depth: 31
Depth: 32
Chunk 0: Hello, I am the Whisper Machine Learning model. If you see this as text then I am working properly.
Transcription finished.
> You mean your model conversion, right? The HF transformers model works fine with your audio; I have tested it.
The models should be equivalent. I think HF transformers uses a lot of heuristics to get the Whisper models working consistently. Tiny_en in particular seems to have a difficult time: in this case it is prematurely outputting an EOT (end-of-text) token before it gets to the words. The multilingual tiny seems to be a bit more robust.
I just updated the project with yet another heuristic/hack: masking some special tokens. Tiny_en now works for me. Let me know if you encounter any issues.
It works 🥳!! However...
it does not work with the wgpu backend:
cargo run --release --features wgpu-backend --bin transcribe tiny_en audio16k.wav en transcription.txt
Running `target/release/transcribe tiny_en audio16k.wav en transcription.txt`
Loading waveform...
Loading model...
Depth: 0
Chunk 0:
Transcription finished.
why 🤔 ?
It works on WGPU for me. Are you sure that you updated to the latest version of this project? It should be impossible for the latest version to stop at a depth of 0. Also make sure that your tokenizer is the correct one for tiny_en: wget actually renames a downloaded file if there is a naming conflict rather than replacing the existing one, so a stale tokenizer may still be in use.
@Gadersd I updated the repo, but for some reason the build was still the same. After I cleaned up the folder and rebuilt, both versions worked!
The transcription file is empty, though.
If I debug it, `text` and `tokens` are empty. This is my file, generated by sox:
audio16k.wav.zip