dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

Llama factory #566

Closed yuyoujiang closed 1 week ago

yuyoujiang commented 1 week ago

Hi Dusty,

The GPU on Jetson can be used to fine-tune large models. I have successfully loaded and fine-tuned the Phi-1.5 (1.3B) model with LLaMA Factory on the Orin NX 16GB, and the results are very good. [screenshot]

According to LLaMA Factory's memory estimates, we could even fine-tune a 70B model on the AGX Orin 64GB device.
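
For reference, a minimal sketch of the kind of LoRA SFT run described above. The dataset, template, and hyperparameters below are illustrative placeholders rather than the exact settings from my run, and the flags assume a recent LLaMA Factory release:

```bash
# Hedged sketch: LoRA supervised fine-tuning of Phi-1.5 with LLaMA Factory.
# Dataset, template, and hyperparameters are placeholders, not the exact
# settings from the run above.
llamafactory-cli train \
  --stage sft \
  --do_train \
  --model_name_or_path microsoft/phi-1_5 \
  --dataset alpaca_en \
  --template default \
  --finetuning_type lora \
  --output_dir saves/phi-1_5-lora-sft \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --learning_rate 1e-4 \
  --num_train_epochs 3.0 \
  --fp16
```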

dusty-nv commented 1 week ago

This is cool @yuyoujiang, thanks. Can you try dustynv/llama-factory:r36.3.0? I added flash-attention to it. I saw the web UI come up but did not run a training job.
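
For anyone trying it, a hedged sketch of how the image is typically launched on JetPack 6 / L4T r36.x (the flags are the usual jetson-containers defaults, nothing specific to this thread):

```bash
# Hedged sketch: launch the prebuilt container on L4T r36.3
jetson-containers run dustynv/llama-factory:r36.3.0

# or with plain docker, using the usual Jetson runtime flags
docker run --runtime nvidia -it --rm --network host \
  dustynv/llama-factory:r36.3.0
```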

yuyoujiang commented 6 days ago

Thank you for your interest in this container. I tested the container you provided and ran into problems. There seem to be two issues:

  1. Cannot access huggingface.co. This problem may be related to the Python version (the default Python version on JetPack 6 is 3.10). Following the fix referenced here, I downgraded to requests==2.27.1, after which the training program runs normally.

  2. No permission to access gated models. If we need to download a gated model like Llama 3 from Hugging Face, we need to do some extra work:

    • Request access to Llama 3 on Hugging Face.
    • Configure the Hugging Face token with huggingface-cli login on the Jetson. I tried setting a global variable when starting the container (--env HUGGINGFACE_TOKEN=<YOUR-ACCESS-TOKEN>), but that does not work. A command sketch of both workarounds follows this list.
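
A hedged sketch of the two workarounds as shell commands, run inside the container (the token value is a placeholder):

```bash
# Workaround for issue 1: downgrade requests so huggingface.co is reachable
pip install requests==2.27.1

# Workaround for issue 2: authenticate for gated models such as Llama 3
# (access to the model repo must already have been granted on huggingface.co)
huggingface-cli login
# or non-interactively, with a placeholder token:
# huggingface-cli login --token <YOUR-ACCESS-TOKEN>
```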

I use the following steps to start the training program:

[Screenshot from 2024-07-01: the steps used to start the training program]
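
The screenshot is not reproduced here; as a rough, hedged reconstruction based only on what is mentioned in this thread (the Gradio port is an assumption), the final step is to launch the web UI inside the container and start the run from the browser:

```bash
# Hedged reconstruction, not the original screenshot: after applying the
# workarounds above, launch the LLaMA Factory web UI and configure the
# fine-tuning run from the browser.
llamafactory-cli webui
# then open http://<jetson-ip>:7860 (assumed default Gradio port)
```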