dusty-nv / jetson-containers

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
MIT License

Llama factory #566

Closed yuyoujiang closed 1 week ago

yuyoujiang commented 1 week ago

Hi Dusty,

The GPU on Jetson can be used to fine-tune large models. I have successfully loaded and fine-tuned the Phi-1.5 (1.3B) model with LLaMA Factory on the Orin NX 16GB, and the results are very good. [screenshot]

According to LLaMA Factory's memory estimates, we could even fine-tune a 70B model on the AGX Orin 64GB device.
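
For reference, a minimal sketch of the kind of LoRA SFT run described above. The dataset, template, and hyperparameters below are illustrative placeholders rather than the exact settings from my run, and the flags assume a recent LLaMA Factory release:

```bash
# Hedged sketch: LoRA supervised fine-tuning of Phi-1.5 with LLaMA Factory.
# Dataset, template, and hyperparameters are placeholders, not the exact
# settings from the run above.
llamafactory-cli train \
  --stage sft \
  --do_train \
  --model_name_or_path microsoft/phi-1_5 \
  --dataset alpaca_en \
  --template default \
  --finetuning_type lora \
  --output_dir saves/phi-1_5-lora-sft \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --learning_rate 1e-4 \
  --num_train_epochs 3.0 \
  --fp16
```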

dusty-nv commented 1 week ago

This is cool @yuyoujiang, thanks. Can you try dustynv/llama-factory:r36.3.0? I added flash-attention to it. I saw the web UI come up but did not run a training job.
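
For anyone trying it, a hedged sketch of how the image is typically launched on JetPack 6 / L4T r36.x (the flags are the usual jetson-containers defaults, nothing specific to this thread):

```bash
# Hedged sketch: launch the prebuilt container on L4T r36.3
jetson-containers run dustynv/llama-factory:r36.3.0

# or with plain docker, using the usual Jetson runtime flags
docker run --runtime nvidia -it --rm --network host \
  dustynv/llama-factory:r36.3.0
```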

yuyoujiang commented 6 days ago

Thank you for your interest in this container. I tested the container you provided and ran into problems. There seem to be two issues:

  1. Cannot access huggingface.co. This problem may be related to the Python version (the default Python version on JetPack 6 is 3.10). Following the fix referenced here, I downgraded to requests==2.27.1, after which the training program runs normally.

  2. No permission to access gated models. If we need to download a gated model like Llama 3 from Hugging Face, we need to do some extra work:

    • Request access to Llama 3 on Hugging Face.
    • Configure the Hugging Face token with huggingface-cli login on the Jetson. I tried setting a global variable when starting the container (--env HUGGINGFACE_TOKEN=<YOUR-ACCESS-TOKEN>), but that does not work. A command sketch of both workarounds follows this list.
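
A hedged sketch of the two workarounds as shell commands, run inside the container (the token value is a placeholder):

```bash
# Workaround for issue 1: downgrade requests so huggingface.co is reachable
pip install requests==2.27.1

# Workaround for issue 2: authenticate for gated models such as Llama 3
# (access to the model repo must already have been granted on huggingface.co)
huggingface-cli login
# or non-interactively, with a placeholder token:
# huggingface-cli login --token <YOUR-ACCESS-TOKEN>
```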

I use the following steps to start the training program:

[Screenshot from 2024-07-01: the steps used to start the training program]
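
The screenshot is not reproduced here; as a rough, hedged reconstruction based only on what is mentioned in this thread (the Gradio port is an assumption), the final step is to launch the web UI inside the container and start the run from the browser:

```bash
# Hedged reconstruction, not the original screenshot: after applying the
# workarounds above, launch the LLaMA Factory web UI and configure the
# fine-tuning run from the browser.
llamafactory-cli webui
# then open http://<jetson-ip>:7860 (assumed default Gradio port)
```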