bigcode-project / starcoder

Home of StarCoder: fine-tuning & inference!
Apache License 2.0

Some concerns about "mask_user_labels" #129

Open DoffeBupt opened 1 year ago

DoffeBupt commented 1 year ago
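  1. In chat/dialogues.py:239, should `while labels[current_idx] != assistant_token_id and current_idx < len(labels):` be `while current_idx < len(labels) and labels[current_idx] != assistant_token_id:`? As written, `labels[current_idx]` is evaluated before the bounds check, so it can raise an IndexError when a user turn runs to the end of the sequence without a following assistant token.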

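  2. In chat/train.py:204, should `mask_user_labels(tokenizer, dialogue_template, labels)` be `for _ in labels: mask_user_labels(tokenizer, dialogue_template, _)`? That is, should the function be called once per sequence in `labels` rather than once on the whole batch?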
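To make point 1 concrete, here is a minimal sketch of the two condition orders. It is not the repository's code: the token ids, the `IGNORE_INDEX` value, and the function names `mask_unsafe` and `mask_safe` are assumptions made for illustration, and the tokenizer lookup is replaced by plain integers.

```python
IGNORE_INDEX = -100  # assumed value: the usual ignore label for PyTorch cross-entropy
USER, ASSISTANT = 1, 2  # hypothetical token ids, for illustration only


def mask_unsafe(labels):
    """Condition order quoted in point 1: indexes labels before checking the bound."""
    for idx, label_id in enumerate(labels):
        if label_id == USER:
            current_idx = idx
            while labels[current_idx] != ASSISTANT and current_idx < len(labels):
                labels[current_idx] = IGNORE_INDEX
                current_idx += 1


def mask_safe(labels):
    """Swapped order: the bounds check short-circuits before the lookup."""
    for idx, label_id in enumerate(labels):
        if label_id == USER:
            current_idx = idx
            while current_idx < len(labels) and labels[current_idx] != ASSISTANT:
                labels[current_idx] = IGNORE_INDEX
                current_idx += 1


# The last user turn has no assistant token after it (e.g. a truncated chunk).
sequence = [USER, 10, 11, ASSISTANT, 20, USER, 12]

try:
    mask_unsafe(list(sequence))
    unsafe_raised = False
except IndexError:
    unsafe_raised = True

safe_labels = list(sequence)
mask_safe(safe_labels)
print(unsafe_raised, safe_labels)
```

With a sequence that ends inside a user turn, the first version walks past the end of the list, while the second stops at the bound and masks the trailing user tokens.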

DoffeBupt commented 1 year ago

Otherwise, it seems that mask_user_labels has a bug of its own (1) and is also not being called correctly (2)?
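A small sketch of why point 2 matters, assuming `labels` at that call site is a batch, i.e. a list of token-id lists (this is an assumption about the training script, not something confirmed in this thread). The helper `mask_sequence`, the token ids, and `IGNORE_INDEX` are hypothetical stand-ins, with the tokenizer lookup replaced by plain integers.

```python
IGNORE_INDEX = -100  # assumed ignore label
USER, ASSISTANT = 1, 2  # hypothetical token ids, for illustration only


def mask_sequence(labels):
    """Mask each user turn in place, up to (not including) the assistant token."""
    for idx, label_id in enumerate(labels):
        if label_id == USER:
            current_idx = idx
            while current_idx < len(labels) and labels[current_idx] != ASSISTANT:
                labels[current_idx] = IGNORE_INDEX
                current_idx += 1


batch = [[USER, 10, ASSISTANT, 20], [USER, 11, 12, ASSISTANT, 21]]

# Called on the whole batch: every element is a list, which never equals
# the integer USER id, so nothing is masked.
whole_batch = [list(seq) for seq in batch]
mask_sequence(whole_batch)

# Called once per sequence: user turns are masked as intended.
per_sequence = [list(seq) for seq in batch]
for seq in per_sequence:
    mask_sequence(seq)

print(whole_batch == batch)
print(per_sequence)
```

Under that assumption the single call is silently a no-op, so user tokens would still contribute to the loss; looping over the batch is what actually applies the mask.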