FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

Example commands for OPT-30B and OPT-66B on machines with 32 GB of system RAM and 24 GB of VRAM. #48

Closed Meatfucker closed 1 year ago

Meatfucker commented 1 year ago

Very basic batch scripts for OPT-30B and OPT-66B on machines with 32 GB of system RAM and 24 GB of VRAM.

In order to get 66B to run, you'll need to increase the size of the Windows page file considerably. I set it to 400 GB, but you could likely get away with a smaller increase; I didn't do the math on exactly how much is needed to convert the 66B model.
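The scripts themselves aren't inlined in this thread. As a rough sketch of what such a batch script might look like, assuming the `flexgen.flex_opt` entry point and the `--percent`/`--offload-dir`/`--compress-weight` flags from the FlexGen README (the split percentages, model choice, and offload path below are illustrative placeholders, not tuned values from the original scripts):

```shell
:: Hypothetical Windows batch script for OPT-30B on 24 GB VRAM / 32 GB RAM.
:: --percent takes six numbers: weight GPU/CPU split, KV-cache GPU/CPU split,
:: and activation GPU/CPU split; anything not covered spills to --offload-dir.
python -m flexgen.flex_opt ^
  --model facebook/opt-30b ^
  --percent 20 50 0 100 100 0 ^
  --offload-dir D:\flexgen-offload ^
  --compress-weight
```

For 66B the weight percentages would need to shift further toward CPU and disk, which is why the page-file headroom matters during conversion.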

Ying1123 commented 1 year ago

Thanks for contributing this. #69 should make the weight download more memory-efficient, so you should no longer need to increase the Windows page file.