Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
https://lightning.ai
Apache License 2.0
6.85k stars 726 forks

--checkpoint-dir 'xx' is missing the files: ['model_config.yaml'] #1352

Closed zhaosheng-thu closed 3 weeks ago

zhaosheng-thu commented 3 weeks ago

When I fine-tune Llama 3 on my own dataset with:

python finetune/lora.py \
  --checkpoint_dir checkpoints/lit-llama3 \
  --data JSON \
  --data.json_path /root/szhao/ES-Lora/litllama/ExTES/ExTES.json \
  --data.val_split_fraction 0.1 \
  --out_dir out/custom-model-test

This error occurred:

--checkpoint_dir '/root/szhao/ES-Lora/litgpt/litgpt/checkpoints/lit-llama3' is missing the files: ['model_config.yaml'].
Find download instructions at https://github.com/Lightning-AI/litgpt/blob/main/tutorials

You have downloaded locally:
 --checkpoint_dir '/root/szhao/ES-Lora/litgpt/litgpt/checkpoints/lit-llama3/tokenizer.model'
 --checkpoint_dir '/root/szhao/ES-Lora/litgpt/litgpt/checkpoints/lit-llama3/lit_model.pth'
 --checkpoint_dir '/root/szhao/ES-Lora/litgpt/litgpt/checkpoints/lit-llama3/tokenizer_config.json'
 --checkpoint_dir '/root/szhao/ES-Lora/litgpt/litgpt/checkpoints/lit-llama3/config.json'
 --checkpoint_dir '/root/szhao/ES-Lora/litgpt/litgpt/checkpoints/lit-llama3/generation_config.json'

See all download options by running:
 litgpt download

But I can't find the file model_config.yaml in Meta-Llama-3-8B. When I previously fine-tuned Llama 2 with the lit-llama repository, the only files I seemed to need were lit_model.pth and tokenizer.model. What should I do? Thanks.

My conda env:

# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 2.1.0 pypi_0 pypi
accelerate 0.29.3 pypi_0 pypi
aiohttp 3.9.5 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
annotated-types 0.6.0 pypi_0 pypi
anyio 4.3.0 pypi_0 pypi
async-timeout 4.0.3 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
awscrt 0.20.9 pypi_0 pypi
bitsandbytes 0.42.0 pypi_0 pypi
boto3 1.34.90 pypi_0 pypi
botocore 1.34.90 pypi_0 pypi
bzip2 1.0.8 h5eee18b_5
ca-certificates 2024.3.11 h06a4308_0
certifi 2024.2.2 pypi_0 pypi
chardet 5.2.0 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
click 8.1.7 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
dataproperty 1.0.1 pypi_0 pypi
datasets 2.19.0 pypi_0 pypi
dill 0.3.8 pypi_0 pypi
docstring-parser 0.16 pypi_0 pypi
evaluate 0.4.1 pypi_0 pypi
exceptiongroup 1.2.1 pypi_0 pypi
fastapi 0.110.2 pypi_0 pypi
filelock 3.13.4 pypi_0 pypi
frozenlist 1.4.1 pypi_0 pypi
fsspec 2024.3.1 pypi_0 pypi
grpcio 1.62.2 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
hf-transfer 0.1.6 pypi_0 pypi
httpcore 1.0.5 pypi_0 pypi
httpx 0.27.0 pypi_0 pypi
huggingface-hub 0.22.2 pypi_0 pypi
idna 3.7 pypi_0 pypi
importlib-resources 6.4.0 pypi_0 pypi
jinja2 3.1.3 pypi_0 pypi
jmespath 1.0.1 pypi_0 pypi
joblib 1.4.0 pypi_0 pypi
jsonargparse 4.28.0 pypi_0 pypi
jsonlines 4.0.0 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
lightning 2.3.0.dev20240328 pypi_0 pypi
lightning-utilities 0.11.2 pypi_0 pypi
litdata 0.2.5 pypi_0 pypi
litgpt 0.4.0.dev0 pypi_0 pypi
litserve 0.1.0 pypi_0 pypi
lm-eval 0.4.2 pypi_0 pypi
lxml 5.2.1 pypi_0 pypi
markdown 3.6 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
mbstrdecoder 1.1.3 pypi_0 pypi
more-itertools 10.2.0 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
multidict 6.0.5 pypi_0 pypi
multiprocess 0.70.16 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.3 pypi_0 pypi
nltk 3.8.1 pypi_0 pypi
numexpr 2.10.0 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 8.9.2.26 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-nccl-cu12 2.20.5 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
openssl 3.0.13 h7f8727e_0
packaging 24.0 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
pathvalidate 3.2.0 pypi_0 pypi
peft 0.10.0 pypi_0 pypi
pip 23.3.1 py310h06a4308_0
portalocker 2.8.2 pypi_0 pypi
protobuf 5.26.1 pypi_0 pypi
psutil 5.9.8 pypi_0 pypi
pyarrow 16.0.0 pypi_0 pypi
pyarrow-hotfix 0.6 pypi_0 pypi
pybind11 2.12.0 pypi_0 pypi
pydantic 2.7.1 pypi_0 pypi
pydantic-core 2.18.2 pypi_0 pypi
pytablewriter 1.2.0 pypi_0 pypi
python 3.10.14 h955ad1f_0
python-dateutil 2.9.0.post0 pypi_0 pypi
pytorch-lightning 2.2.3 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0
regex 2024.4.16 pypi_0 pypi
requests 2.31.0 pypi_0 pypi
responses 0.18.0 pypi_0 pypi
rouge-score 0.1.2 pypi_0 pypi
s3transfer 0.10.1 pypi_0 pypi
sacrebleu 2.4.2 pypi_0 pypi
safetensors 0.4.3 pypi_0 pypi
scikit-learn 1.4.2 pypi_0 pypi
scipy 1.13.0 pypi_0 pypi
sentencepiece 0.2.0 pypi_0 pypi
setuptools 68.2.2 py310h06a4308_0
six 1.16.0 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
sqlitedict 2.1.0 pypi_0 pypi
starlette 0.37.2 pypi_0 pypi
sympy 1.12 pypi_0 pypi
tabledata 1.3.3 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tcolorpy 0.1.4 pypi_0 pypi
tensorboard 2.16.2 pypi_0 pypi
tensorboard-data-server 0.7.2 pypi_0 pypi
threadpoolctl 3.4.0 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.19.1 pypi_0 pypi
torch 2.3.0 pypi_0 pypi
torchmetrics 1.3.2 pypi_0 pypi
tqdm 4.66.2 pypi_0 pypi
tqdm-multiprocess 0.0.11 pypi_0 pypi
transformers 4.40.1 pypi_0 pypi
triton 2.3.0 pypi_0 pypi
typepy 1.3.2 pypi_0 pypi
typeshed-client 2.5.1 pypi_0 pypi
typing-extensions 4.11.0 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
urllib3 2.2.1 pypi_0 pypi
uvicorn 0.29.0 pypi_0 pypi
werkzeug 3.0.2 pypi_0 pypi
wheel 0.41.2 py310h06a4308_0
word2number 1.1 pypi_0 pypi
xxhash 3.4.1 pypi_0 pypi
xz 5.4.6 h5eee18b_0
yarl 1.9.4 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
zstandard 0.22.0 pypi_0 pypi

carmocca commented 3 weeks ago

Did you run litgpt download --repo_id meta-llama/Meta-Llama-3-8B first?

After doing that, you would pass --checkpoint_dir checkpoints/meta-llama/Meta-Llama-3-8B

zhaosheng-thu commented 3 weeks ago

Thanks for your help! Everything is working smoothly now. (Initially, I just used the script convert_hf_checkpoint.py from the lit-llama repository, which caused the issue.)
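
For anyone who lands here with a checkpoint converted by an older script, a quick pre-flight check can show which files are absent before launching a fine-tuning run. The following is only a minimal sketch, not litgpt's own validation code: the helper name missing_checkpoint_files and the REQUIRED_FILES tuple are hypothetical, and the file set is an assumption inferred from the error message in this thread (litgpt may require additional files, such as tokenizer files).

```python
from pathlib import Path
import tempfile

# Assumed file set, inferred from the error message in this thread.
# litgpt's real check may require more files than these two.
REQUIRED_FILES = ("lit_model.pth", "model_config.yaml")


def missing_checkpoint_files(checkpoint_dir, required=REQUIRED_FILES):
    """Return the names in `required` that are not regular files in `checkpoint_dir`."""
    checkpoint_dir = Path(checkpoint_dir)
    return [name for name in required if not (checkpoint_dir / name).is_file()]


if __name__ == "__main__":
    # Recreate the directory layout reported in this thread inside a temp dir.
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "lit-llama3"
        ckpt.mkdir()
        for name in (
            "tokenizer.model",
            "lit_model.pth",
            "tokenizer_config.json",
            "config.json",
            "generation_config.json",
        ):
            (ckpt / name).touch()
        print(missing_checkpoint_files(ckpt))  # ['model_config.yaml']
```

If the check reports model_config.yaml as missing, the fix given above applies: run litgpt download --repo_id meta-llama/Meta-Llama-3-8B and point --checkpoint_dir at checkpoints/meta-llama/Meta-Llama-3-8B.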