OpenBMB / ToolBench

[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
https://openbmb.github.io/ToolBench/
Apache License 2.0

Question about training detail #260

Closed · zhiyuanc2001 closed this issue 4 months ago

zhiyuanc2001 commented 5 months ago

Hi, thanks for your great work! I have a question about the training labels. The preprocess function in toolbench/train/train.py simply copies the input_ids as the target and then masks the target. However, there does not seem to be any position-shift operation on the target, even though the LLM is supposed to perform next-token prediction.

    # Tokenize conversations
    input_ids = tokenizer(
        conversations,
        return_tensors="pt",
        padding="max_length",
        max_length=tokenizer.model_max_length,
        truncation=True,
    ).input_ids
    targets = input_ids.clone()  # labels start as an unshifted copy of input_ids

Could you explain the reason for this, or is there a detail I might have overlooked? Thank you very much.
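
For context, the masking mentioned above (continuing from the snippet) looks roughly like the sketch below. This is only an illustration, not ToolBench's exact code: the real preprocess function masks everything except the assistant's responses, while padding is shown here as a minimal example. Either way, the labels remain an unshifted copy of input_ids.

    IGNORE_TOKEN_ID = -100  # the default ignore index of torch.nn.CrossEntropyLoss

    # Mask tokens that should not contribute to the loss. Note that the
    # targets themselves are still aligned position-for-position with the
    # inputs; no shift happens here.
    targets[input_ids == tokenizer.pad_token_id] = IGNORE_TOKEN_ID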

caixd-220529 commented 4 months ago

I have the same confusion. Did you solve it?

caixd-220529 commented 4 months ago

I looked through the transformers source code; it seems the framework performs the shift operation for me.
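
For reference, the shift happens inside the model's forward pass rather than in preprocessing. The snippet below is a simplified sketch of the label-shifting logic used by Hugging Face causal-LM implementations (e.g. LlamaForCausalLM); the function name and variable names here are illustrative, not the library's exact code.

    import torch
    from torch.nn import CrossEntropyLoss

    def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Shift so that tokens < n predict token n: the logit at position i
        # is scored against the label at position i + 1.
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()
        # Flatten and compute cross-entropy; labels equal to -100 are ignored.
        loss_fct = CrossEntropyLoss()
        return loss_fct(
            shift_logits.view(-1, shift_logits.size(-1)),
            shift_labels.view(-1),
        )

Because this shift is applied inside the model, the preprocessing code can pass labels that are a direct (masked) copy of input_ids.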