Closed makaveli10 closed 5 months ago
I tested the docker file and ran a freshly compiled large-v3 engine. This works nicely! :+1:
Thanks for the fix regarding TensorRT 0.9.0, I actually needed this. It fixes the issue where TensorRT 0.7.1 allocated excessive VRAM (~3x the model size). This is no longer the case with 0.9.0.
One thing that I saw, but this is probably worth a separate PR: The VRAM usage is increasing by ~350mb for every new client connection. I hope this will not be the case if I use the single-model option (#223). I can try that tomorrow.
I could verify that with the single model option, there is no memory increase. So this seems to be an issue with model cleanup and re-instantiation only.
@peldszus thanks for confirming here, will look into how we instantiate a new model for every new user.
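The behavior described above (no VRAM growth with the single-model option, ~350 MB growth per connection without it) is consistent with a cached-instance vs. per-client-instantiation pattern. Below is a minimal, hypothetical sketch of that pattern; the names (`ModelFactory`, `loader`, `single_model`) are illustrative and not WhisperLive's actual API:

```python
import threading

class ModelFactory:
    """Hypothetical sketch: share one model instance across all client
    connections instead of re-instantiating per connection. Names are
    illustrative, not the project's real API."""

    def __init__(self, loader):
        self._loader = loader        # callable that builds/loads the model
        self._model = None
        self._lock = threading.Lock()

    def get(self, single_model=True):
        if not single_model:
            # Per-client instantiation: each call allocates fresh VRAM;
            # if cleanup of the old instance is incomplete, usage grows
            # with every new connection.
            return self._loader()
        with self._lock:
            if self._model is None:  # build once, then share
                self._model = self._loader()
            return self._model

# Usage: with single_model=True, every "client" receives the same object,
# so the loader runs exactly once.
calls = []
factory = ModelFactory(loader=lambda: calls.append(1) or object())
a = factory.get()
b = factory.get()
assert a is b
assert len(calls) == 1
```

With per-client instantiation, the factory would instead call the loader on every connection, which matches the observed growth if old instances are not fully released.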
`pip install tensorrt_llm` instead of building `tensorrt_llm` from scratch in the Dockerfile
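As a sketch of what this suggestion could look like, a Dockerfile could install a prebuilt wheel instead of compiling from source. The exact index URL and version pin are assumptions and should be checked against the TensorRT-LLM install docs for the target CUDA version:

```dockerfile
# Hypothetical fragment: install a prebuilt tensorrt_llm wheel rather than
# building from source. The extra index URL and pinned version are
# assumptions; verify them against the official install instructions.
RUN pip install --no-cache-dir tensorrt_llm --extra-index-url https://pypi.nvidia.com
```

This trades build-time flexibility for a much faster image build, at the cost of depending on a wheel that matches the base image's CUDA toolkit.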