I ran MiniGPT-4 on an Nvidia T4, which has 16 GB of memory. I could upload a picture, but when I asked a question about the picture it reported CUDA out of memory.
I want to use DeepSpeed inference to run MiniGPT-4, because DeepSpeed inference can offload the parameters to CPU memory and swap them back to GPU memory when necessary. I wrote a shell script to start inference with DeepSpeed, a DeepSpeed configuration file, and an initialization method to invoke DeepSpeed, but it failed to start correctly. Can anyone tell me how to do this?
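For reference, this is a minimal sketch of the kind of DeepSpeed configuration I am trying to use — ZeRO stage 3 with parameter offload to CPU. The values here are illustrative assumptions, not my exact file:

```json
{
  "train_batch_size": 1,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3,
    "offload_param": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```

My understanding is that this config is then passed to `deepspeed.initialize(model=model, config=...)` in the initialization method, and the script is launched with something like `deepspeed --num_gpus 1 demo.py`, though I may be misusing one of these steps.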