OpenBMB / ToolBench

[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
https://openbmb.github.io/ToolBench/
Apache License 2.0

Question about training detail #260

Closed · zhiyuanc2001 closed this issue 4 months ago

zhiyuanc2001 commented 5 months ago

Hi, thanks for your great work! I have a question about the training labels. The preprocess function in toolbench/train/train.py simply copies the input_ids as the target and then masks the target. However, there does not seem to be any position-shift operation on the target, even though the LLM is supposed to perform next-token prediction.

    # Tokenize conversations
    input_ids = tokenizer(
        conversations,
        return_tensors="pt",
        padding="max_length",
        max_length=tokenizer.model_max_length,
        truncation=True,
    ).input_ids
    targets = input_ids.clone()  # labels start as an unshifted copy of input_ids

Could you explain the reason for this, or is there a detail I might have overlooked? Thank you very much.
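
For context, the masking mentioned above (continuing from the snippet) looks roughly like the sketch below. This is only an illustration, not ToolBench's exact code: the real preprocess function masks everything except the assistant's responses, while padding is shown here as a minimal example. Either way, the labels remain an unshifted copy of input_ids.

    IGNORE_TOKEN_ID = -100  # the default ignore index of torch.nn.CrossEntropyLoss

    # Mask tokens that should not contribute to the loss. Note that the
    # targets themselves are still aligned position-for-position with the
    # inputs; no shift happens here.
    targets[input_ids == tokenizer.pad_token_id] = IGNORE_TOKEN_ID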

caixd-220529 commented 4 months ago

I have the same confusion. Did you solve it?

caixd-220529 commented 4 months ago

I looked through the transformers source code; it seems the framework performs the shift operation for me.
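
For reference, the shift happens inside the model's forward pass rather than in preprocessing. The snippet below is a simplified sketch of the label-shifting logic used by Hugging Face causal-LM implementations (e.g. LlamaForCausalLM); the function name and variable names here are illustrative, not the library's exact code.

    import torch
    from torch.nn import CrossEntropyLoss

    def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Shift so that tokens < n predict token n: the logit at position i
        # is scored against the label at position i + 1.
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()
        # Flatten and compute cross-entropy; labels equal to -100 are ignored.
        loss_fct = CrossEntropyLoss()
        return loss_fct(
            shift_logits.view(-1, shift_logits.size(-1)),
            shift_labels.view(-1),
        )

Because this shift is applied inside the model, the preprocessing code can pass labels that are a direct (masked) copy of input_ids.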