Hi, great project. I'm trying to run openai_api_demo.py on two NVIDIA 3090 graphics cards. Unfortunately, the current script works well, but only on one card with --quant 4.
I would like to run the model at full precision across both GPUs.
Thank you in advance for your help.