RLHFlow / RLHF-Reward-Modeling

Recipes to train reward models for RLHF.
https://rlhflow.github.io/
Apache License 2.0

"Token pattern not found in the list" error #24

Open nshen7 opened 1 month ago

nshen7 commented 1 month ago

Hi there,

I got this "Token pattern not found in the list" error when I ran the model under torch.no_grad(). Would you take a look at this, please? Many thanks! See below for the code and error message:

import torch

# TRAIN_DATASET yields batches of (input_ids_a, input_ids_b, labels);
# model is the LoRA-wrapped reward model set up further down in this comment.
input_ids_a, input_ids_b, labels = next(TRAIN_DATASET)

with torch.no_grad():
    outputs_a = model(input_ids=input_ids_a)
    outputs_b = model(input_ids=input_ids_b)
ValueError Traceback (most recent call last)
File :3

File /usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File /usr/local/lib/python3.10/site-packages/peft/peft_model.py:1238, in PeftModelForSequenceClassification.forward(self, input_ids, attention_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict, task_ids, **kwargs)
1236 if peft_config.peft_type == PeftType.POLY:
1237 kwargs["task_ids"] = task_ids
-> 1238 return self.base_model(
1239 input_ids=input_ids,
1240 attention_mask=attention_mask,
1241 inputs_embeds=inputs_embeds,
1242 labels=labels,
1243 output_attentions=output_attentions,
1244 output_hidden_states=output_hidden_states,
1245 return_dict=return_dict,
1246 **kwargs,
1247 )
1249 batch_size = _get_batch_size(input_ids, inputs_embeds)
1250 if attention_mask is not None:
1251 # concat prompt attention mask

File /usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
1530 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1531 else:
-> 1532 return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
1536 # If we don't have any hooks, we want to skip the rest of the logic in
1537 # this function, and just call forward.
1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1539 or _global_backward_pre_hooks or _global_backward_hooks
1540 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541 return forward_call(*args, **kwargs)
1543 try:
1544 result = None

File /usr/local/lib/python3.10/site-packages/peft/tuners/tuners_utils.py:179, in BaseTuner.forward(self, *args, **kwargs)
178 def forward(self, *args: Any, **kwargs: Any):
--> 179 return self.model.forward(*args, **kwargs)

File ~/.cache/huggingface/modules/transformers_modules/RLHFlow/ArmoRM-Llama3-8B-v0.1/97bc38d5bc709b850e236ef5f03589f6098552c0/modeling_custom.py:152, in LlamaForRewardModelWithGating.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
149 assert hidden_states.shape == (batch_size, self.config.hidden_size)
150 rewards = self.regression_layer(hidden_states)
--> 152 gating_token_positions = [find_token_for_gating(ids.tolist()) for ids in input_ids]
153 prompt_embedding = tokens_hidden_states[dummy_iterator, gating_token_positions, :]
154 gating_output = self.gating(prompt_embedding)

File ~/.cache/huggingface/modules/transformers_modules/RLHFlow/ArmoRM-Llama3-8B-v0.1/97bc38d5bc709b850e236ef5f03589f6098552c0/modeling_custom.py:152, in <listcomp>(.0)
149 assert hidden_states.shape == (batch_size, self.config.hidden_size)
150 rewards = self.regression_layer(hidden_states)
--> 152 gating_token_positions = [find_token_for_gating(ids.tolist()) for ids in input_ids]
153 prompt_embedding = tokens_hidden_states[dummy_iterator, gating_token_positions, :]
154 gating_output = self.gating(prompt_embedding)

File ~/.cache/huggingface/modules/transformers_modules/RLHFlow/ArmoRM-Llama3-8B-v0.1/97bc38d5bc709b850e236ef5f03589f6098552c0/modeling_custom.py:47, in find_token_for_gating(lst)
45 if lst[j:j + token_pattern_len] == token_pattern:
46 return j
---> 47 raise ValueError("Token pattern not found in the list.")

ValueError: Token pattern not found in the list.
Haoxiang-Wang commented 1 month ago

Have you updated transformers to the newest version? @nshen7

nshen7 commented 1 month ago

I was using Version 4.41.2, but the error persists with the newest version.

To add some context, I was using the default tokenizer from RLHFlow/ArmoRM-Llama3-8B-v0.1:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(CFG.MODEL_NAME, trust_remote_code=CFG.TRUEST_REMOTE)

# Tokenize both sides of each preference pair with the model's chat template.
INPUT_IDS_A = tokenizer.apply_chat_template(
    train['message_a'].tolist(),
    tokenize=True,
    padding=True,
    truncation=True,
    max_length=CFG.MAX_LENGTH,
    return_tensors='np'
)
INPUT_IDS_B = tokenizer.apply_chat_template(
    train['message_b'].tolist(),
    tokenize=True,
    padding=True,
    truncation=True,
    max_length=CFG.MAX_LENGTH,
    return_tensors='np'
)

And I was trying to fine-tune the gating layer of the model with LoRA:

from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    r=CFG.LORA_RANK,               # rank of the low-rank update matrices
    lora_alpha=CFG.LORA_ALPHA,     # scaling factor for LoRA activations vs pre-trained weights
    lora_dropout=CFG.DROPOUT,
    bias='none',
    inference_mode=False,
    task_type=TaskType.SEQ_CLS,
    target_modules=CFG.LORA_MODULES,  # only the gating-network layers (see CFG below)
)
model = get_peft_model(base_model, lora_config)

Configs:

class CFG:
    NUM_EPOCHS = 1
    BATCH_SIZE = 16
    DROPOUT = 0.05 
    MODEL_NAME = "RLHFlow/ArmoRM-Llama3-8B-v0.1"
    TRUEST_REMOTE = True
    SEED = 2024 
    MAX_LENGTH = 1024 
    NUM_WARMUP_STEPS = 128
    LR_MAX = 5e-5 
    NUM_LABELS = 3 
    GATING_TEMP = 10
    LORA_RANK = 4
    LORA_ALPHA = 8
    LORA_MODULES = ["gating.layers.0", "gating.layers.1", "gating.layers.2", "gating.layers.3"]
glorgao commented 1 month ago

Same error.

I believe the cause is the find_token_for_gating() function defined in modeling_custom.py, which searches the input ids for the token pattern <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n and raises this error when that pattern is missing.
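From the traceback, the function is essentially a linear scan for that token subsequence. Here is a minimal sketch, reconstructed from the three lines visible in the traceback rather than copied from modeling_custom.py, so details such as how token_pattern is built and the search direction are assumptions:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("RLHFlow/ArmoRM-Llama3-8B-v0.1")

# Token ids of the assistant header that gating is keyed on. Derived here with
# the tokenizer; the real modeling_custom.py may instead hard-code the id list.
token_pattern = tokenizer.encode(
    "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
    add_special_tokens=False,
)

def find_token_for_gating(lst):
    """Return the start index of token_pattern inside the token-id list lst."""
    token_pattern_len = len(token_pattern)
    # Only lines 45-47 of the real function appear in the traceback, so the
    # search order here (scanning from the end of the sequence) is a guess.
    for j in range(len(lst) - token_pattern_len, -1, -1):
        if lst[j:j + token_pattern_len] == token_pattern:
            return j
    raise ValueError("Token pattern not found in the list.")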

The deeper reason lies on the dataset side: I find the code always fails on samples whose "response" is empty. For example, the sample with prompt_id c0dfe114bb80a25990c193539a0e8e43557ba7236fd00e71731b852b4e7849a9 in the ultrafeedback_binarized dataset: https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized/viewer/default/train_prefs?q=Maybe+using+FILTER+or+VLOOKUP&row=24707
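A quick, hypothetical diagnostic (not from the repo) to spot such rows in a tokenized batch, reusing token_pattern from the sketch above and the INPUT_IDS_A array built earlier in this thread:

def contains_pattern(ids, pattern):
    """Return True if pattern occurs as a contiguous subsequence of ids."""
    ids, n = list(ids), len(pattern)
    return any(ids[j:j + n] == pattern for j in range(len(ids) - n + 1))

# Rows that will make find_token_for_gating raise the ValueError above.
bad_rows = [i for i, row in enumerate(INPUT_IDS_A) if not contains_pattern(row, token_pattern)]
print(f"{len(bad_rows)} rows are missing the gating pattern, e.g. {bad_rows[:5]}")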

I believe the solution is to remove these bad samples before training.
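For example, a filter along these lines would drop pairs with an empty assistant turn. This is only a sketch and assumes the ultrafeedback_binarized schema, where "chosen" and "rejected" are lists of {"role", "content"} messages:

from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

def has_nonempty_responses(example):
    # Drop samples in which an assistant turn has empty content; per the
    # discussion above, these are the samples that trigger the
    # "Token pattern not found" error.
    for key in ("chosen", "rejected"):
        for msg in example[key]:
            if msg["role"] == "assistant" and not msg["content"].strip():
                return False
    return True

filtered = ds.filter(has_nonempty_responses)
print(f"Kept {len(filtered)} of {len(ds)} samples.")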