allenai / open-instruct


Something strange with Instruct model tokenization #132

Closed: y12uc231 closed this issue 3 months ago

y12uc231 commented 3 months ago

🐛 Describe the bug

Here is the code I am running. The goal is to get the logprob for each token generated by the chat model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-7B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-7B-Instruct")

# Example chat; the actual prompt content does not matter for the bug.
chat = [{"role": "user", "content": "What is the capital of France?"}]

prompt = tokenizer.apply_chat_template(chat, tokenize=False,
                                       add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=True,
                          return_tensors="pt").to(device)
output = olmo.generate(input_ids=inputs.to(olmo.device),
                       max_new_tokens=10,
                       do_sample=True,
                       top_k=50,
                       top_p=0.95,
                       return_dict_in_generate=True,
                       output_scores=True)
# Fails here with the RuntimeError shown below.
transition_scores = olmo.compute_transition_scores(
            output.sequences, output.scores, normalize_logits=True)

Here is the error when I run the code above.

Traceback (most recent call last):
  File "/n/holylabs/LABS/doshi-velez_lab/Users/skrishna/w2s/self_loop_llm/src/olma.py", line 307, in <module>
    api_loop_call(args, start_prompts, prefix_prompts[args.data_name][args.prefix],  self_correct_prompt, get_test_data(args.data_name, dataset), few_shot_prompt)
  File "/n/holylabs/LABS/doshi-velez_lab/Users/skrishna/w2s/self_loop_llm/src/olma.py", line 181, in api_loop_call
    response = get_llm_prediction_with_logits(prompt, temperature = args.temperature, large_model=args.llm)
  File "/n/holylabs/LABS/doshi-velez_lab/Users/skrishna/w2s/self_loop_llm/src/olma.py", line 88, in get_llm_prediction_with_logits
    transition_scores = olmo.compute_transition_scores(
  File "/n/home02/skrishna/.conda/envs/pt2.1.0_cuda12.1/lib/python3.10/site-packages/transformers/generation/utils.py", line 1235, in compute_transition_scores
    scores = scores.reshape(-1, self.config.vocab_size, scores.shape[-1])
RuntimeError: shape '[-1, 50280, 10]' is invalid for input of size 503040

Here is the weird part: the size of output.scores[0] should be [1, vocab_size], where for OLMo vocab_size = 50280, but the size of output.scores[0] is [1, 50304]. How come the output is not aligned with the vocab_size? Also, the values in output.scores are mostly -inf.
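For reference, this is roughly how I see the mismatch (olmo, tokenizer, and output as in the snippet above; the 50280 figure is what compute_transition_scores reads from the config in the traceback):

# Quick shape check illustrating the mismatch described above.
print(olmo.config.vocab_size)    # 50280
print(output.scores[0].shape)    # torch.Size([1, 50304])
print(len(output.scores))        # 10, one score tensor per generated token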

Versions

Python 3.10.13

hamishivi commented 3 months ago

Hi, I believe that OLMo's vocab size != its embedding size (see https://huggingface.co/allenai/OLMo-7B/blob/main/config.json). In general, this can happen when model makers want to leave extra space for new tokens, or to pad the embedding matrix out to a size that is slightly more efficient for training.
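If you just need the transition scores to line up, one possible workaround (an untested sketch, assuming the extra logit columns beyond config.vocab_size are unused padding at the end of the vocabulary dimension) is to truncate each score tensor before calling compute_transition_scores:

# Untested sketch: drop the padded logit columns so the reshape inside
# compute_transition_scores matches config.vocab_size (50280 here).
trimmed_scores = tuple(s[:, :olmo.config.vocab_size] for s in output.scores)
transition_scores = olmo.compute_transition_scores(
    output.sequences, trimmed_scores, normalize_logits=True)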

As for why the scores are mostly -inf, I'm not sure. It'd probably be a good idea to open an issue in the transformers repository or the core OLMo repository, since this repo is about instruction-tuning models rather than the specifics of generating from OLMo.