Open ohhan777 opened 2 months ago
@ohhan777 Can you send me the logs after commenting it out?
@federico1-creator This is not solved even after commenting that line. Can you look into it?
Hi everyone, thank you for your interest in our project!
We have run some tests to better understand the tokenization mismatch issue you mentioned. The problem is the Llama 3.1 tokenizer, which was updated by the Meta team. This update creates a mismatch between the version we used during development and the one you are currently using.
To fix this issue you can use our tokenizer, which is included in the LLaVA-MORE weights. Specifically, I have already updated the training scripts to use the new `TOKENIZER_PATH`:
https://github.com/aimagelab/LLaVA-MORE/blob/main/scripts/more/11_pretrain_llama_31_acc_st_1.sh
https://github.com/aimagelab/LLaVA-MORE/blob/main/scripts/more/12_finetuning_llama_31_acc_st_1.sh
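For anyone adapting their own copy of the scripts, the change amounts to pointing the tokenizer variable at the directory that ships with the LLaVA-MORE weights instead of the Meta Llama 3.1 repo. A minimal sketch (the local checkpoint path below is a placeholder, not taken from this thread):

```shell
# Hypothetical paths for illustration only -- substitute the directory where
# you downloaded the LLaVA-MORE weights, which bundle the matching tokenizer.
MODEL_PATH="./checkpoints/llava-more-8B"
TOKENIZER_PATH="$MODEL_PATH"   # tokenizer files live alongside the weights

echo "training with tokenizer from: $TOKENIZER_PATH"
```

This way the tokenizer version is pinned to the one used during development, rather than whatever the upstream Meta repo currently serves.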
@ohhan777 @sahilqure @sahil02235
@federico1-creator Thanks for this, will check it.
If I train everything from scratch, could I get this error too?
Thank you for sharing the great source code. I have been trying to pretrain and fine-tune with LLaMA 3.1. While the pretraining works fine, I noticed that the following warnings occur during the fine-tuning process, preventing the model from training properly:
After checking the source code, I found that in the `train.py` file, within the `preprocess_llama_3_1()` function, the `cur_len` value becomes 4 more than it should be due to the following line of code:

As a result, all targets are treated as `IGNORE_INDEX`, and the model does not train. When I commented out this line, the issue seemed to disappear, and the training worked properly. Was this line intentionally included?
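To make the failure mode concrete: during preprocessing, every token position before the running offset is masked with `IGNORE_INDEX` so the model only trains on answer tokens. If the offset overshoots (here, by 4), the mask swallows the answer too and the loss has nothing to learn from. A minimal sketch of that masking logic (toy function and token ids, not the actual LLaVA-MORE code):

```python
IGNORE_INDEX = -100  # same sentinel used by PyTorch's CrossEntropyLoss ignore_index

def mask_targets(input_ids, cur_len):
    """Mask all positions before cur_len so only answer tokens contribute to loss."""
    targets = list(input_ids)
    for i in range(min(cur_len, len(targets))):
        targets[i] = IGNORE_INDEX
    return targets

seq = [1, 2, 3, 4, 5, 6]  # toy token ids; suppose the answer starts at index 4

# Correct offset: the answer tokens (5, 6) survive and are trained on.
print(mask_targets(seq, 4))  # [-100, -100, -100, -100, 5, 6]

# Offset overshooting by 4, as described in the report: everything is ignored,
# so the model receives no training signal at all.
print(mask_targets(seq, 8))  # [-100, -100, -100, -100, -100, -100]
```

This matches the reported symptom: training runs without crashing, but no target token ever contributes to the loss.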