Open · Richar-Du opened this issue 1 year ago
@Richar-Du Thanks a lot for the feedback! I think you are correct. I remember I set the 1 for some reason (probably some annoying tokenizer mismatch problem). It wasn't causing trouble when training the first version of LongChat, so I left it there.
You are right; I am actually hitting a potential bug in data processing while upgrading LongChat to Llama-2 and need to debug more. Let me know if you find the correct way to preprocess it.
@DachengLi1 Thanks for your reply, and I'm glad to give some feedback :) Honestly, I am still trying to debug this because I'm not very familiar with the code. I added `target[:cur_len] = IGNORE_TOKEN_ID` to change the 1 to -100, but the training result is still abnormal. I am going to compare the FastChat and LongChat code and try to solve it.
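For context on why the stray `1` matters: PyTorch's cross-entropy skips positions whose target equals `ignore_index` (-100 by default, matching `IGNORE_TOKEN_ID`), but an unmasked token id at position 0 still contributes loss on a prompt token that should not be trained. A minimal sketch with toy logits (illustrative numbers, not the actual training data):

```python
import torch
import torch.nn.functional as F

IGNORE_TOKEN_ID = -100  # matches cross_entropy's default ignore_index

# Toy per-token logits for a 3-token sequence over a 10-token vocabulary.
logits = torch.randn(3, 10)

# Buggy target: position 0 keeps the BOS id 1 instead of being masked.
buggy_target = torch.tensor([1, IGNORE_TOKEN_ID, 7])
# Fixed target: the whole prompt prefix is masked with IGNORE_TOKEN_ID.
fixed_target = torch.tensor([IGNORE_TOKEN_ID, IGNORE_TOKEN_ID, 7])

# reduction="none" exposes per-position losses: ignored positions give 0,
# while the stray `1` produces a real loss on a position that should be
# masked out.
per_buggy = F.cross_entropy(logits, buggy_target, reduction="none")
per_fixed = F.cross_entropy(logits, fixed_target, reduction="none")
```

With the fixed target, position 0 contributes exactly zero loss; with the buggy one, the model is also being trained to predict the BOS token there.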
@DachengLi1 I find that the only difference between FastChat and LongChat is that LongChat uses condensed RoPE. So if I add `replace_llama_with_condense(ratio=8)` to `train_mem.py` in FastChat, will FastChat be the same as LongChat?
As far as I have tried, adding `replace_llama_with_condense(ratio=8)` makes FastChat support long-context fine-tuning.
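For readers following along: `replace_llama_with_condense` monkey-patches Llama's rotary embedding so that positions are divided by the ratio before the rotary angles are computed (position interpolation). The real implementation lives in the LongChat repo and patches transformers' `LlamaRotaryEmbedding`; the module below is only an illustrative sketch of the condensing idea, with hypothetical names:

```python
import torch


class CondensedRotaryEmbedding(torch.nn.Module):
    """Sketch of interpolated ("condensed") RoPE.

    With ratio=8, position t is mapped to t/8 before the rotary angles
    are computed, so a model pretrained on 2k positions can address a
    16k context without the angles exceeding their pretrained range.
    (Class and argument names here are illustrative, not LongChat's.)
    """

    def __init__(self, dim, base=10000, ratio=8):
        super().__init__()
        self.ratio = ratio
        # Standard RoPE inverse frequencies for a head dimension `dim`.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq)

    def forward(self, seq_len, device=None):
        # Condense: divide positions by the ratio before building angles.
        t = torch.arange(seq_len, device=device).float() / self.ratio
        freqs = torch.outer(t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        return emb.cos(), emb.sin()
```

The monkey-patch simply installs a class like this in place of the stock rotary embedding before the model is constructed, which is why adding the one call to `train_mem.py` is enough.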
Thanks for your awesome work enabling the community to train LLMs on very long contexts! However, I find that in the `preprocess` function, line https://github.com/DachengLi1/LongChat/blob/a824bda25c0082e60973c35c79b0f35d69c6be2d/longchat/train/fine_tune/train.py#L125 and line https://github.com/DachengLi1/LongChat/blob/a824bda25c0082e60973c35c79b0f35d69c6be2d/longchat/train/fine_tune/train.py#L137 set `target` to `[1, -100, -100, ...]`, so the first element is not ignored. I think FastChat has the correct code, which first sets `target[:cur_len] = IGNORE_TOKEN_ID` so that the target becomes `[-100, -100, -100, ...]`. Am I right? @DachengLi1
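The FastChat-style masking can be sketched as follows (a minimal illustration with a hypothetical helper, not the actual `preprocess` function):

```python
import torch

IGNORE_TOKEN_ID = -100  # ignored by PyTorch's cross-entropy loss by default


def mask_prompt_prefix(input_ids: torch.Tensor, cur_len: int) -> torch.Tensor:
    """Clone the inputs and mask out the first cur_len positions.

    Masking the whole prefix, including position 0 (the BOS token id 1),
    yields [-100, -100, ...] rather than [1, -100, ...], so no prompt
    token leaks into the training loss.
    """
    target = input_ids.clone()
    target[:cur_len] = IGNORE_TOKEN_ID
    return target
```

For example, `mask_prompt_prefix(torch.tensor([1, 306, 626, 263]), 2)` returns `tensor([-100, -100, 626, 263])`.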