Made 2 small updates to the tools:
For the Whisper tool, I switched to abidlabs/whisper, since the one we used before is stateful and concatenates responses across calls, which we don't need here.
For the image captioning tool, I switched to taesiri/BLIP-2, which is faster because it only uses BLIP rather than running all of the models.
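To illustrate why a stateful transcription backend is a poor fit for a tool, here is a toy sketch. It does not call either demo; `StatefulTranscriber` and `stateless_transcribe` are hypothetical stand-ins, and plain strings stand in for audio so the example stays self-contained.

```python
class StatefulTranscriber:
    """Hypothetical stand-in for a demo that keeps a running transcript."""

    def __init__(self):
        self.history = ""

    def transcribe(self, chunk: str) -> str:
        # Each call appends to the history and returns everything so far.
        self.history += chunk
        return self.history


def stateless_transcribe(chunk: str) -> str:
    """Hypothetical stand-in for a demo that only returns the current result."""
    return chunk


stateful = StatefulTranscriber()
first = stateful.transcribe("hello ")
second = stateful.transcribe("world")

# The stateful version leaks the earlier call into the later response,
# while the stateless version returns only what was asked for.
print(repr(second))
print(repr(stateless_transcribe("world")))
```

A tool is called once per input and is expected to return only that input's result, so the stateless behaviour is the one we want.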