dusty-nv / NanoLLM

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
https://dusty-nv.github.io/NanoLLM/
MIT License

Stuck at Quantization? or just taking a long time to run? #4

Open P15V opened 6 months ago

P15V commented 6 months ago

Hello all,

I hope whoever reads this is doing well!! :)

So I'm trying to get this going on my Jetson Nano 8GB. I'm getting stuck (maybe?) at Quantization. I run this command, and I get the terminal output that it's quantizing the model and that this will take a while. And it seems to lock up/get stuck there? I've had it going for the past 1-1.5 hours with no further outputs or such, and the entire Jetson Nano is locked up. I can't interact with it, can't SSH into it.

Do you know if this is normal? or is something going wrong? Am I doing something wrong? I'm going to let it run for a few hours to see if it accomplishes anything.

Thanks for everyone's time!! :) My run command inside the container:

```
python3 -m nano_llm.chat --api=mlc \
    --model Efficient-Large-Model/VILA-2.7b \
    --max-context-len 128 \
    --max-new-tokens 32
```

dusty-nv commented 6 months ago

@P15V an hour and a half is too long, it has probably frozen up. Try rebooting it, then mounting more SWAP memory, disabling ZRAM, and if needed disabling the desktop GUI, like here:

https://github.com/dusty-nv/jetson-containers/blob/master/docs/setup.md#mounting-swap
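For reference, the swap/ZRAM steps on that page look roughly like the following sketch (the swap file size and path here are examples, not requirements — pick what fits your storage and workload):

```shell
# Disable ZRAM (compressed in-RAM swap), so it stops competing for physical memory
sudo systemctl disable nvzramconfig

# Allocate, format, and enable a disk-backed swap file (size/path are examples)
sudo fallocate -l 16G /mnt/16GB.swap
sudo mkswap /mnt/16GB.swap
sudo swapon /mnt/16GB.swap

# Optionally persist the swap file across reboots
echo "/mnt/16GB.swap  none  swap  sw 0  0" | sudo tee -a /etc/fstab

# Boot to console (no desktop GUI) to free additional memory; revert with: sudo init 5
sudo init 3
```

After a reboot, `free -h` should show the new swap total and no ZRAM devices in `swapon --show`.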

Also, try testing --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT first (this is the base model for VILA-2.7B) and see if you can get that going for text-only chat.
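A text-only sanity check could reuse the same flags from the original post, swapping in the base model (a sketch, assuming the same container and context/token limits):

```shell
# Same invocation as before, but with the text-only base model of VILA-2.7B
python3 -m nano_llm.chat --api=mlc \
    --model princeton-nlp/Sheared-LLaMA-2.7B-ShareGPT \
    --max-context-len 128 \
    --max-new-tokens 32
```

If this quantizes and chats successfully, the problem is likely memory pressure from the vision components of VILA rather than the MLC quantization step itself.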