dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

LlamaSpeak cannot run with Llama-3.1-70B-Instruct #646

Open SuWeipeng opened 2 days ago

SuWeipeng commented 2 days ago

I'm trying to run a 70B model on my Jetson AGX Orin (64GB), but the process is killed as soon as I swap the 70B model in for the 8B one. How can I get the 70B model to run?

When I run the command below, the process gets interrupted automatically.

jetson-containers run --env HUGGINGFACE_TOKEN=hf_xxxxx  \
  dustynv/nano_llm:r36.3.0   \
  python3 -m nano_llm.agents.web_chat --api=mlc  --debug   \
    --model meta-llama/Meta-Llama-3.1-70B-Instruct     \
    --asr=whisper --tts=piper

(screenshot attached: 2024-09-26 163215)

If I run with the 8B model, it works very well, for example:

jetson-containers run --env HUGGINGFACE_TOKEN=hf_xxxxx  \
  dustynv/nano_llm:r36.3.0   \
  python3 -m nano_llm.agents.web_chat --api=mlc  --debug   \
    --model meta-llama/Meta-Llama-3.1-8B-Instruct     \
    --asr=whisper --tts=piper
dusty-nv commented 2 days ago

@SuWeipeng can you test Llama-3.1-70B with the baseline nano_llm.chat first? How much memory is it using? I can't recall explicitly testing Llama-3.1-70B, but have done so with Llama-2-70B
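
As a rough sanity check on the memory question, a back-of-envelope estimate can be sketched. This is an editorial aside, not from the thread: it assumes ~70e9 parameters and 4-bit weight quantization (MLC-style), and it ignores the KV cache, activations, and MLC's compilation workspace, which add several more GB on top.

```python
# Back-of-envelope weight-memory estimate for a 70B-parameter model.
# Ignores KV cache, activations, and compile-time workspace.

def weight_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight memory in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9

fp16 = weight_gb(70e9, 16)  # ~140 GB: cannot fit in 64 GB unified memory
q4   = weight_gb(70e9, 4)   # ~35 GB: fits, but needs headroom for KV cache
print(f"fp16: ~{fp16:.0f} GB, 4-bit: ~{q4:.0f} GB")  # → fp16: ~140 GB, 4-bit: ~35 GB
```

So 4-bit weights alone should fit on a 64GB Orin, but a long context plus the runtime overhead of the web/ASR/TTS agents can push total usage much higher, which is consistent with the suggestion to try the bare chat path first.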

SuWeipeng commented 1 day ago

> @SuWeipeng can you test Llama-3.1-70B with the baseline nano_llm.chat first? How much memory is it using? I can't recall explicitly testing Llama-3.1-70B, but have done so with Llama-2-70B

@dusty-nv I'm brand new to this. Could you tell me how to do that?
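
(Editorial note: a sketch of the baseline test being suggested, assuming `nano_llm.chat` accepts the same `--api`/`--model` flags as the `web_chat` agent used above; memory can be watched from a second terminal on the host with `tegrastats`.)

```shell
# Baseline chat test, without the web/ASR/TTS agents
# (flags mirror the web_chat invocation above; verify against the NanoLLM docs):
jetson-containers run --env HUGGINGFACE_TOKEN=hf_xxxxx \
  dustynv/nano_llm:r36.3.0 \
  python3 -m nano_llm.chat --api=mlc --debug \
    --model meta-llama/Meta-Llama-3.1-70B-Instruct

# In a second terminal on the host, watch RAM/GPU usage while the model loads:
sudo tegrastats
```

If the bare chat agent loads and reports its memory use, that isolates whether the 70B failure is in model loading itself or in the extra web_chat pipeline components.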