chaoyi-wu / Finetune_LLAMA

An easy-to-understand guide to fine-tuning LLaMA.

convert_to_ds_params.py doesn't generate tokenizer #4

Open tammypi opened 1 year ago

tammypi commented 1 year ago

convert_to_ds_params.py only generates the llama-7b folder and the .pt files inside it; it does not generate a tokenizer. However, the tokenizer_path parameter of tokenize_dataset.py requires one. How can I get the tokenizer?

chaoyi-wu commented 1 year ago

You can download the tokenizer from here. That link also provides the model files produced after running convert_to_ds_params.py.

jingyeyang95 commented 1 year ago

I had a similar issue to @tammypi's when I tried to run finetune_pp_peft.py. The script only generates .pt files (e.g. layer_00-model_states.pt), so when I run python finetune_pp_peft.py --model_path ../llama-7b/, it reports that no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack was found in directory ../llama-7b/.
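Before launching a long fine-tuning run, it can help to check that the model directory actually contains a weight file transformers can load. A minimal sketch (the helper name is hypothetical; the file names come from the error message above):

```python
from pathlib import Path

# Weight file names that transformers searches for when loading a local
# model directory (taken from the error message above).
WEIGHT_FILES = (
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
)

def has_loadable_weights(model_dir):
    """Return True if model_dir contains at least one weight file
    that from_pretrained() could pick up."""
    d = Path(model_dir)
    return any((d / name).is_file() for name in WEIGHT_FILES)
```

A directory holding only DeepSpeed-style layer_XX-model_states.pt shards would fail this check, which matches the error reported above.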

Alternatively, I can use src/transformers/models/llama/convert_llama_weights_to_hf.py to convert the model into the HF format and then run finetune_pp_peft.py without any problem. Do you think it is a good idea to use convert_llama_weights_to_hf.py from the transformers package instead of your script? What is the difference? Thanks!
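For anyone hitting the same error, a sketch of the conversion route described above, assuming a local transformers checkout and the original Meta checkpoint (the paths and the 7B size are placeholders):

```shell
# Convert the original LLaMA checkpoint to the Hugging Face format.
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/llama \
    --model_size 7B \
    --output_dir ../llama-7b-hf

# finetune_pp_peft.py can then point at the converted directory,
# which contains pytorch_model.bin plus the tokenizer files.
python finetune_pp_peft.py --model_path ../llama-7b-hf/
```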

chaoyi-wu commented 1 year ago

Sorry for the mistake. I meant to reference convert_llama_weights_to_hf.py in this project but added convert_to_ds_params.py by accident. Thanks for reporting this; I have fixed the bug.