dusty-nv / NanoLLM

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
https://dusty-nv.github.io/NanoLLM/
MIT License

Stuck at Quantization? or just taking a long time to run? #4

Open P15V opened 6 months ago

P15V commented 6 months ago

Hello all,

I hope whoever reads this is doing well!! :)

So I'm trying to get this going on my Jetson Nano 8GB. I'm getting stuck (maybe?) at Quantization. I run this command, and I get the terminal output that it's quantizing the model and that this will take a while. And it seems to lock up/get stuck there? I've had it going for the past 1-1.5 hours with no further outputs or such, and the entire Jetson Nano is locked up. I can't interact with it, can't SSH into it.

Do you know if this is normal? or is something going wrong? Am I doing something wrong? I'm going to let it run for a few hours to see if it accomplishes anything.

Thanks for everyone's time!! :) My run command inside the container:

```
python3 -m nano_llm.chat --api=mlc \
    --model Efficient-Large-Model/VILA-2.7b \
    --max-context-len 128 \
    --max-new-tokens 32
```

dusty-nv commented 6 months ago

@P15V an hour and a half is too long, it has probably frozen up. Try rebooting it, then mounting more SWAP memory, disabling ZRAM, and if needed disabling the desktop GUI, like here:

https://github.com/dusty-nv/jetson-containers/blob/master/docs/setup.md#mounting-swap
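For reference, the swap/ZRAM steps on that page look roughly like the following sketch (the swap file size and path here are examples, not requirements — pick what fits your storage and workload):

```shell
# Disable ZRAM (compressed in-RAM swap), so it stops competing for physical memory
sudo systemctl disable nvzramconfig

# Allocate, format, and enable a disk-backed swap file (size/path are examples)
sudo fallocate -l 16G /mnt/16GB.swap
sudo mkswap /mnt/16GB.swap
sudo swapon /mnt/16GB.swap

# Optionally persist the swap file across reboots
echo "/mnt/16GB.swap  none  swap  sw 0  0" | sudo tee -a /etc/fstab

# Boot to console (no desktop GUI) to free additional memory; revert with: sudo init 5
sudo init 3
```

After a reboot, `free -h` should show the new swap total and no ZRAM devices in `swapon --show`.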

Also, try testing --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT first (this is the base model for VILA-2.7B) and see if you can get that going for text-only chat.
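A text-only sanity check could reuse the same flags from the original post, swapping in the base model (a sketch, assuming the same container and context/token limits):

```shell
# Same invocation as before, but with the text-only base model of VILA-2.7B
python3 -m nano_llm.chat --api=mlc \
    --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT \
    --max-context-len 128 \
    --max-new-tokens 32
```

If this quantizes and chats successfully, the problem is likely memory pressure from the vision components of VILA rather than the MLC quantization step itself.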