Closed finnvoor closed 5 months ago
Amazing! Was just about to look at this. Do you think there's any impact on audio quality at the breakpoints? I wouldn't expect much, just curious.
hmm, not really sure but I doubt it would be enough to notice. I didn't test much but I got the same transcript when a file was split into ~16 chunks.
Thanks for the contrib @finnvoor! We will run full evals for 1.0.0 on all this behavior and address regressions (if any). This looks to be low risk but we might need to couple this with VAD to be double sure.
FYI there appears to be an issue with this code that places audio in the wrong position in the outputBuffer. I am working on an approach that appends to the buffer every 10 MB instead of writing directly to it.
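To illustrate the difference between appending and writing at a computed position, here is a minimal Python sketch. It is not the code from this PR, and the actual cause of the misplacement isn't stated in this thread. All names (`resample_chunk`, `resample_by_append`, `resample_by_input_offset`) are hypothetical, and the "resampler" is a plain decimation stand-in rather than a real filter. It shows one way a positional write can go wrong: the output position is derived from the input offset even though each converted chunk is shorter than its input.

```python
def resample_chunk(chunk, ratio):
    """Stand-in resampler (hypothetical): keep every `ratio`-th sample."""
    return chunk[::ratio]


def resample_by_append(samples, chunk_size, ratio):
    """Convert chunk by chunk and append each result to the output.

    Assumes chunk_size is a multiple of ratio so the decimation phase
    lines up across chunk boundaries.
    """
    out = []
    for start in range(0, len(samples), chunk_size):
        out.extend(resample_chunk(samples[start:start + chunk_size], ratio))
    return out


def resample_by_input_offset(samples, chunk_size, ratio):
    """Buggy variant for comparison: writes each converted chunk at the
    *input* offset instead of the output offset (start // ratio)."""
    out = [0] * (len(samples) // ratio)
    for start in range(0, len(samples), chunk_size):
        converted = resample_chunk(samples[start:start + chunk_size], ratio)
        for i, value in enumerate(converted):
            position = start + i  # wrong: should be start // ratio + i
            if position < len(out):
                out[position] = value
    return out


samples = list(range(12))
appended = resample_by_append(samples, chunk_size=6, ratio=3)
misplaced = resample_by_input_offset(samples, chunk_size=6, ratio=3)
```

With appending, the output position never has to be computed, so this class of offset error can't occur.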
closes #16
Resampling audio files in 10 MB chunks reduces peak memory usage and fixes some niche issues with transcribing very long audio files, or files with a very high sample rate or channel count.
10 MB is a bit arbitrary, but I chose it to roughly match the peak memory usage of the rest of the pipeline.
I expect this will have a very minor negative impact on resampling speed, but given that resampling is a small fraction of the total pipeline time, plus the memory savings, it seems like a reasonable tradeoff.
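As a rough sketch of how a fixed byte budget translates into a chunk size in frames, the arithmetic could look like the following Python. The function names are hypothetical, and treating 10 MB as 10 * 1024 * 1024 bytes and samples as 4-byte floats are assumptions for illustration, not details confirmed by this PR.

```python
def frames_per_chunk(budget_bytes, channels, bytes_per_sample):
    """Number of audio frames that fit in the byte budget (at least 1)."""
    return max(1, budget_bytes // (channels * bytes_per_sample))


def chunk_ranges(total_frames, frames):
    """Yield (start, length) pairs covering total_frames in order."""
    start = 0
    while start < total_frames:
        length = min(frames, total_frames - start)
        yield start, length
        start += length


# Assumed values: 10 MB budget, stereo, 32-bit float samples.
BUDGET_BYTES = 10 * 1024 * 1024
frames = frames_per_chunk(BUDGET_BYTES, channels=2, bytes_per_sample=4)

# A small file splits into full chunks plus one shorter final chunk.
ranges = list(chunk_ranges(total_frames=2 * frames + 5, frames=frames))
```

Because the budget is in bytes, files with more channels or wider samples get fewer frames per chunk, so the input buffer held in memory at any one time stays near the same size regardless of file length.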