elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache License 2.0

Support stream input in Whisper serving and stream ffmpeg chunks #361

Closed jonatanklosko closed 3 months ago

jonatanklosko commented 3 months ago

Closes #261.

This allows the Whisper serving to accept a stream of consecutive chunks. Importantly, this also improves the `{:file, path}` input, such that we read the file in chunks using ffmpeg, rather than loading it into memory all at once.
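As an illustrative sketch of the consumer side (the model repo and serving options here are assumptions, not part of this PR), a file input can be transcribed as a stream of result chunks, with the audio decoded incrementally via ffmpeg:

```elixir
# Illustrative setup; the checkpoint and options are examples.
{:ok, whisper} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

serving =
  Bumblebee.Audio.speech_to_text_whisper(whisper, featurizer, tokenizer, generation_config,
    chunk_num_seconds: 30,
    stream: true
  )

# With this change, the file is read in chunks using ffmpeg,
# rather than loaded into memory all at once.
for chunk <- Nx.Serving.run(serving, {:file, "/path/to/audio.mp3"}) do
  IO.write(chunk.text)
end
```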

This PR drops support for a list of inputs, such as `Nx.Serving.batched_run(MyServing, [{:file, path1}, {:file, path2}])`. This serving works at a higher level than usual, because a single chunked input already corresponds to multiple inputs to the model, so I think this is sane. Multiple inputs can still be processed concurrently by calling `batched_run` from multiple processes.
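A minimal sketch of that concurrent pattern, assuming `MyServing` is a running `Nx.Serving` process (the paths are placeholders): each `Task` is its own process, so the serving can batch the concurrent calls internally.

```elixir
# Illustrative: instead of passing a list of inputs, run each input
# from a separate process and let the serving handle batching.
paths = ["/path/to/a.mp3", "/path/to/b.mp3"]

results =
  paths
  |> Task.async_stream(
    fn path -> Nx.Serving.batched_run(MyServing, {:file, path}) end,
    timeout: :infinity
  )
  |> Enum.map(fn {:ok, result} -> result end)
```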

I also noticed a bug: when streaming with timestamps and using a `batch_size`, we would ignore small segments after every batch.


@josevalim I went with a stream of consecutive chunks and handle accumulation + overlapping internally. I think it is preferable to keep the chunking details internal, and I don't see a benefit in exposing them. If we shifted accumulation to the user, they would basically need to do exactly that themselves. For ffmpeg we would do that in Bumblebee anyway, because accumulating is better than duplicating ffmpeg decoding work.