dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

Swap memory not working #372

Closed · raj-khare closed this 7 months ago

raj-khare commented 7 months ago

I'm trying to load a 70B 4-bit quantized model on a 32 GB Jetson dev kit, but my process gets killed even though I have a lot of swap configured. Does MLC support swap out of the box? Any advice is highly appreciated.


```
root@tegra-ubuntu:/# python3 /opt/mlc-llm/benchmark.py --model /data/models/mlc/dist/models/q4f16_ft/params/ --prompt /data/prompts/completion_16.json --max-new-tokens 128
Namespace(model='/data/models/mlc/dist/models/q4f16_ft/params/', prompt=['/data/prompts/completion_16.json'], chat=False, streaming=False, max_new_tokens=128, max_num_prompts=None, save='')
-- loading /data/models/mlc/dist/models/q4f16_ft/params/
Killed
root@tegra-ubuntu:/#
```
dusty-nv commented 7 months ago

@raj-khare yes it does, but GPU/CUDA memory can't be swapped, and a 4-bit 70B model takes ~35 GB of GPU memory for the weights alone. It might fit in 32 GB if you use llama.cpp and don't offload all of the layers to the GPU, but for 70B models I would really recommend using Jetson AGX Orin 64GB.
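
To see where the ~35 GB figure comes from, here is a back-of-the-envelope sketch (the exact footprint also depends on the quantization scheme, group size, and KV cache, which only add to it):

```python
# Rough weight footprint for a 4-bit 70B model.
# On Jetson the GPU shares physical RAM with the CPU, and CUDA
# allocations are pinned, so none of this can spill to swap.
params = 70e9                  # 70B parameters
bits_per_param = 4             # 4-bit quantization
weights_gb = params * bits_per_param / 8 / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")  # ~35 GB, before KV cache/activations
```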
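
The reason the llama.cpp route can help is that layers kept on the CPU side live in ordinary pageable memory, which the kernel *can* swap. A minimal sketch using the llama-cpp-python bindings (the model path and layer count are illustrative, and you'd need a GGUF-quantized copy of the model, not the MLC q4f16_ft weights):

```python
from llama_cpp import Llama

# Offload only part of the model's decoder layers to the GPU;
# the remaining layers stay in CPU RAM, which can page to swap.
llm = Llama(
    model_path="/data/models/llama-2-70b.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=40,  # tune down until the process no longer gets killed
    n_ctx=2048,
)

out = llm("Once upon a time", max_tokens=128)
print(out["choices"][0]["text"])
```

Expect a significant throughput hit relative to full GPU offload, since the CPU-resident layers run far slower, and heavy swapping will slow things further.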

raj-khare commented 7 months ago

Got it.