Closed: pds2208 closed this issue 3 months ago.
Thanks! But now it's looking for mel_80_filters.npz. Where do I find this?
@pds2208
You can find the file in whisper_en.zip. If you want to package your own model, you can use whisper_en.zip as an example.
You can also download it directly from: https://resources.djl.ai/demo/pytorch/whisper/mel_80_filters.npz
Thanks guys. Working fine but very, very slow. It takes ~40s to translate a small audio file. The same file takes ~5s using the python implementation. Any way to speed this up?
Well this is interesting. Setting JniUtils.setGraphExecutorOptimize(false); as per this page:
https://djl.ai/docs/development/inference_performance_optimization.html
Reduced the time from ~40s to ~10s for the exact same file! What is going on?
jit::setGraphExecutorOptimize()
profiles your model during the first couple of runs and tries to reduce latency for the remaining inferences. Some models see a significant latency improvement, some don't.
With setGraphExecutorOptimize
turned on, you usually see much longer latency for the 2nd (and 1st) inference on each thread.
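For reference, a minimal sketch of how the flag mentioned above can be disabled in a DJL application (assuming the DJL PyTorch engine is on the classpath; the surrounding class is illustrative, not from the demo):

```java
import ai.djl.pytorch.jni.JniUtils;

public class DisableGraphOptimizer {
    public static void main(String[] args) {
        // Disable the torch::jit graph executor's profiling optimizer.
        // This is process-wide and should be set before the first inference.
        JniUtils.setGraphExecutorOptimize(false);
        // ... load the Whisper model and run inference as in the demo ...
    }
}
```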
I created a PR in example to turn off setGraphExecutorOptimize: https://github.com/deepjavalibrary/djl/pull/2341
Thanks. I’m trying to figure out why it’s half the speed of the Python version, which forks off ffmpeg to run the audio conversion. It should be at least as fast…
On my mac, once the model is warmed up, it only takes 4s to run inference.
Oh wow. The example closes the model after each use. I tried keeping it open but received an error "Native resource has been release already". How do I keep the model open to reuse it?
Please get the latest code, I just fixed the issue in the previous PR
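A sketch of the keep-the-model-open pattern, assuming the fix above is in place: load the model once, reuse one predictor for all requests, and close both only at shutdown. The class name, the `Audio`/`String` input/output types, and the warm-up comment are assumptions loosely modeled on the DJL demo, not its actual code.

```java
import ai.djl.inference.Predictor;
import ai.djl.modality.audio.Audio;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public class WhisperService implements AutoCloseable {
    private final ZooModel<Audio, String> model;
    private final Predictor<Audio, String> predictor;

    public WhisperService(Criteria<Audio, String> criteria) throws Exception {
        this.model = criteria.loadModel();      // load once, not per request
        this.predictor = model.newPredictor();  // reuse across calls
        // Optional: run one dummy inference here as a warm-up, since the
        // first inference per thread is much slower than steady state.
    }

    public String transcribe(Audio audio) throws Exception {
        return predictor.predict(audio);        // no model reload per call
    }

    @Override
    public void close() {
        predictor.close();
        model.close();  // only now is the native resource released
    }
}
```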
Thanks. Grabbed the latest code. Still 10s here.
@pds2208 I would suggest isolating the audio processing part and timing just the inference. Could you save the audio into an npz file and load it in both Java and Python (DJL has an NDList function to load an npz), then see how long inference takes? Ideally Java and Python should perform similarly.
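To feed the exact same tensor to both runtimes, the preprocessed samples can be dumped to an .npz file, which is just a zip archive of .npy entries. A stdlib-only sketch of writing a 1-D float32 array in .npy format version 1.0 (the entry name `audio` is arbitrary; on the DJL side, whether NDList can decode npz directly should be verified for your DJL version):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class NpzWriter {

    /** Encode a float[] as a little-endian float32 .npy (format version 1.0). */
    static byte[] toNpy(float[] data) {
        String header = "{'descr': '<f4', 'fortran_order': False, 'shape': ("
                + data.length + ",), }";
        // Pad with spaces so magic(6) + version(2) + hlen(2) + header
        // is a multiple of 64 bytes, and terminate the header with '\n'.
        int unpadded = 10 + header.length() + 1;
        int pad = (64 - unpadded % 64) % 64;
        StringBuilder sb = new StringBuilder(header);
        for (int i = 0; i < pad; i++) {
            sb.append(' ');
        }
        sb.append('\n');
        byte[] headerBytes = sb.toString().getBytes(StandardCharsets.US_ASCII);

        ByteBuffer buf = ByteBuffer.allocate(10 + headerBytes.length + 4 * data.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 0x93).put("NUMPY".getBytes(StandardCharsets.US_ASCII));
        buf.put((byte) 1).put((byte) 0);          // format version 1.0
        buf.putShort((short) headerBytes.length); // little-endian header length
        buf.put(headerBytes);
        for (float f : data) {
            buf.putFloat(f);
        }
        return buf.array();
    }

    /** Write one named array into an .npz archive (a zip of .npy files). */
    static void writeNpz(OutputStream out, String name, float[] data) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry(name + ".npy"));
            zip.write(toNpy(data));
            zip.closeEntry();
        }
    }
}
```

The resulting file can be read back in Python with `numpy.load(path)["audio"]`, so the same preprocessed waveform can be timed through both inference paths.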
Description
Would it be possible to provide the code you used to generate the TorchScript version of, in particular, the Whisper model? At the moment, the demo uses the small.en model, and it would be very useful to be able to use the multilingual models as well as the other model sizes.
Will this change the current API? How?
No
Who will benefit from this enhancement?
Everyone