Closed: daggs1 closed this issue 2 weeks ago

Greetings,

I'm trying to run the gradio_app demo as described in the README, and I'm getting this error:

Any idea what is wrong? I ran the setup as described in the README.

I understand now: your small example requires 5 GB of VRAM, but my GPU has only 4 GB. Shame. Is there any way to reduce memory consumption?

Nowadays we do have more techniques for reducing inference-time memory, including model offloading, autotune compilation, quantization, and possibly others. We don't have code for these ready yet, but you are welcome to contribute. Offloading usually incurs no time overhead; compilation takes some time before the run starts; quantization would need some extra code and more tuning.
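For the offloading technique mentioned in this thread, here is a minimal sketch of the idea, assuming a PyTorch model that can be split into sequential blocks. The function name and structure are illustrative only, not this project's actual API:

```python
import torch
import torch.nn as nn

def forward_with_offload(blocks, x, device):
    """Run a sequence of blocks, keeping only the active one on `device`.

    Sketch of model (CPU) offloading: all weights live in host RAM, and
    each block is moved to the accelerator only for its own forward pass,
    so peak VRAM is roughly one block's worth instead of the whole model.
    """
    for block in blocks:
        block.to(device)       # upload this block's weights
        x = x.to(device)
        with torch.no_grad():
            x = block(x)
        block.to("cpu")        # free accelerator memory before the next block
    return x.to("cpu")
```

On a 4 GB card you would pass `device="cuda"`; the trade-off is the extra host-to-device weight copies on every forward pass, which is why offloading usually costs little wall-clock time only when the per-block compute dominates the transfers.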