Closed by libai-lab 2 weeks ago
Yes, since our process uses multiple models to complete different predictions, the total model size is large. Running it takes about 18GB of GPU memory, and if you don't mind the memory leak, calling `pipeline.enable_model_cpu_offload()` can reduce that to about 10GB with essentially no degradation in inference quality. We haven't yet found the root cause of this memory leak.
I used the Hugging Face demo and found that the full set of model files takes up about 70GB. Is there any way to reduce the download size? How much GPU memory is needed to run it, and is there a way to reduce the VRAM usage? My 16GB GPU reported a CUDA out-of-memory error.