saiful9379 opened this issue 3 months ago
Describe the bug
In the provided example, loading the model requires close to 5 GB of RAM, while VRAM usage is only 2.1 GB. How can I reduce RAM usage when loading the model at inference time? Basically, I am trying to figure out what causes the high RAM consumption. I found that simply initializing the GPT block already uses close to 5 GB of RAM; this is system memory, not GPU memory.
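For reference, a minimal sketch of one way to attribute the resident set size (RSS) to each loading step with `psutil`; the `nn.Sequential` below is only a stand-in for the repo's actual GPT block, not its real API:

```python
import gc
import psutil
import torch
import torch.nn as nn

def rss_mb() -> float:
    # Resident set size of the current process, in MB.
    return psutil.Process().memory_info().rss / 1024 ** 2

print(f"RSS before init    : {rss_mb():.1f} MB")

# Stand-in for the repo's GPT block; substitute the real class here.
model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])
print(f"RSS after init     : {rss_mb():.1f} MB")

model = model.to("cuda")
gc.collect()
print(f"RSS after .to(cuda): {rss_mb():.1f} MB")
```

If RSS jumps at construction time rather than at checkpoint loading, then it is the randomly initialized weights themselves that occupy host memory before being moved to the GPU.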
To Reproduce
Inference used RAM: 4634.79 MB (~4.6 GB)
Expected behavior
Low RAM usage during inference.
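For what it's worth, a common plain-PyTorch pattern that avoids materializing the weights in host RAM at all; whether it applies depends on how this repo wires up model construction, and `build_model` plus `checkpoint.pth` are placeholders, not the repo's real names:

```python
import gc
import torch
import torch.nn as nn

# Stand-in for the repo's GPT block; substitute the real class here.
def build_model() -> nn.Module:
    return nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])

# PyTorch >= 2.0: constructing the module inside a device context
# allocates its parameters directly on the GPU, so the randomly
# initialized weights never occupy host RAM.
with torch.device("cuda"):
    model = build_model()

# Map the checkpoint tensors straight onto the GPU as well, then drop
# the host-side reference so its buffers can be reclaimed.
state_dict = torch.load("checkpoint.pth", map_location="cuda")  # placeholder path
model.load_state_dict(state_dict)
del state_dict
gc.collect()
torch.cuda.empty_cache()
```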
Logs
No response
Environment
Additional context
No response