dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

Text generation web ui container hangs when trying to load model with Jetson Orin Nano developer kit #331

Open ddtch opened 1 year ago

ddtch commented 1 year ago

Hello folks, I've run into a problem and can't tell whether it's a limitation of the Jetson Orin or something I did wrong.

I cloned the repo, installed all the requirements, and ran the container. Everything worked and I could access the GUI, but it had no models, so I downloaded one with the command below.

./run.sh --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) /bin/bash -c \
  'python3 download-model.py --output=/data/models/text-generation-webui TheBloke/Llama-2-7b-Chat-GPTQ'

Then, to load this model, I ran:

./run.sh $(./autotag text-generation-webui) /bin/bash -c \
  "cd /opt/text-generation-webui && python3 server.py \
    --model-dir=/data/models/text-generation-webui \
    --model=TheBloke_Llama-2-7b-Chat-GPTQ \
    --loader=llamacpp \
    --n-gpu-layers=128 \
    --listen --chat --verbose"

It starts up fine and I don't see any errors, but after some time it just hangs on this screen:

[screenshot: IMG_0650]

My assumption is that I'm running out of RAM and it stops. Or maybe I should change something in the config? In general, is this model suitable for the Jetson Orin Nano developer kit or not?

dusty-nv commented 1 year ago

@ddtch yes, your memory and swap are full. Try mounting more swap like here:

https://github.com/dusty-nv/jetson-containers/blob/master/docs/setup.md#mounting-swap
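For reference, the linked doc amounts to disabling ZRAM and mounting a disk-backed swap file; a typical sequence looks roughly like this (the 16GB size and /mnt path are examples, adjust for your storage):

# disable ZRAM (the default compressed-RAM swap on Jetson); takes effect after a reboot
sudo systemctl disable nvzramconfig

# allocate, format, and enable a 16GB disk-backed swap file
sudo fallocate -l 16G /mnt/16GB.swap
sudo mkswap /mnt/16GB.swap
sudo swapon /mnt/16GB.swap

# optionally add it to /etc/fstab so it persists across reboots
echo "/mnt/16GB.swap  none  swap  sw 0  0" | sudo tee -a /etc/fstab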

And if that still doesn't work, you can try the llama.cpp loader with the TheBloke/Llama-2-7B-Chat-GGUF/llama-2-7b-chat.Q4_K_M.gguf model instead.
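Downloading just that file would look much like your earlier download command, something along these lines (note: --specific-file is a download-model.py option for fetching a single file rather than the whole repo; if your version of the script doesn't support it, it will pull every quantization in the repo):

./run.sh --workdir=/opt/text-generation-webui $(./autotag text-generation-webui) /bin/bash -c \
  'python3 download-model.py --output=/data/models/text-generation-webui \
    --specific-file llama-2-7b-chat.Q4_K_M.gguf TheBloke/Llama-2-7B-Chat-GGUF'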

johnnynunez commented 1 year ago

When will JetPack 6 (JP6) be available? At the end of this month?

dusty-nv commented 1 year ago

@johnnynunez your question is unrelated to this issue, but yes, it's expected to be released at the end of this month.

ddtch commented 1 year ago

@dusty-nv is there a way to specify max_memory in the config? I guess that should also work, it would just take longer.

dusty-nv commented 1 year ago

@ddtch the command-line arguments for AutoGPTQ in text-generation-webui are documented in the oobabooga GitHub README:

https://github.com/oobabooga/text-generation-webui#autogptq

There doesn't appear to be a CLI option for max_memory (at least it isn't listed). You could try loading the model through the web UI instead and setting it there. Also, you're specifying llama.cpp as the loader but pointing it at an AutoGPTQ model. You could try a GGUF model with the llama.cpp loader instead and see if that helps your memory issues.
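Assuming the GGUF file from above has been downloaded into your model directory, the invocation would be the same as your earlier command, just pointing --model at the .gguf file (adjust the filename/path to wherever download-model.py actually put it):

./run.sh $(./autotag text-generation-webui) /bin/bash -c \
  "cd /opt/text-generation-webui && python3 server.py \
    --model-dir=/data/models/text-generation-webui \
    --model=llama-2-7b-chat.Q4_K_M.gguf \
    --loader=llamacpp \
    --n-gpu-layers=128 \
    --listen --chat --verbose"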

If you still only have 3GB of swap, I would disable ZRAM and mount a larger swap file as described in the docs I linked to above.
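A quick way to confirm how much swap is actually active before loading the model:

# list active swap devices/files, then show overall memory usage
swapon --show
free -h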