InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Apache License 2.0

piece id is out of range #234

Open tianrengao opened 6 months ago

tianrengao commented 6 months ago

I'm hitting this error when decoding. Any thoughts?


  File "/home/ubuntu/wbc/captioning/InternLM-XComposer/projects/ShareGPT4V/run_captioning.py", line 102, in gen_json
    captions = eval_model(args, model_name, tokenizer, model, image_processor, image_batch, qs=prompt, device="cuda")
  File "/home/ubuntu/wbc/captioning/InternLM-XComposer/projects/ShareGPT4V/run_captioning.py", line 93, in eval_model
    outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
  File "/home/ubuntu/wbc/captioning/InternLM-XComposer/projects/ShareGPT4V/caption/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3485, in batch_decode
    return [
  File "/home/ubuntu/wbc/captioning/InternLM-XComposer/projects/ShareGPT4V/caption/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3486, in <listcomp>
    self.decode(
  File "/home/ubuntu/wbc/captioning/InternLM-XComposer/projects/ShareGPT4V/caption/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3525, in decode
    return self._decode(
  File "/home/ubuntu/wbc/captioning/InternLM-XComposer/projects/ShareGPT4V/caption/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 931, in _decode
    filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
  File "/home/ubuntu/wbc/captioning/InternLM-XComposer/projects/ShareGPT4V/caption/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 912, in convert_ids_to_tokens
    tokens.append(self._convert_id_to_token(index))
  File "/home/ubuntu/wbc/captioning/InternLM-XComposer/projects/ShareGPT4V/caption/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 204, in _convert_id_to_token
    token = self.sp_model.IdToPiece(index)
  File "/home/ubuntu/wbc/captioning/InternLM-XComposer/projects/ShareGPT4V/caption/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1045, in _batched_func
    return _func(self, arg)
  File "/home/ubuntu/wbc/captioning/InternLM-XComposer/projects/ShareGPT4V/caption/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1038, in _func
    raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.

CAOANJIA commented 4 months ago

I encountered the same problem. Did you solve it?
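
For anyone trying to narrow this down: the traceback ends in `self.sp_model.IdToPiece(index)`, and SentencePiece raises this IndexError when the id it receives is negative or not smaller than the vocabulary size. So `output_ids` probably contains at least one id the SentencePiece model does not know, for example a negative placeholder id or an id beyond the SentencePiece vocabulary. That cause is an assumption, not something confirmed in this thread. Below is a minimal diagnostic sketch in plain Python. The helper names `find_out_of_range_ids` and `drop_out_of_range_ids` are hypothetical, not part of transformers or sentencepiece, and the sample ids are toy values.

```python
from typing import List, Sequence, Tuple


def find_out_of_range_ids(
    batch_ids: Sequence[Sequence[int]], vocab_size: int
) -> List[Tuple[int, int, int]]:
    """Return (row, column, token_id) for every id outside [0, vocab_size)."""
    bad = []
    for row, ids in enumerate(batch_ids):
        for col, token_id in enumerate(ids):
            if token_id < 0 or token_id >= vocab_size:
                bad.append((row, col, int(token_id)))
    return bad


def drop_out_of_range_ids(
    batch_ids: Sequence[Sequence[int]], vocab_size: int
) -> List[List[int]]:
    """Return a copy of the batch without out-of-range ids.

    This only hides the symptom so decoding can proceed; the offending
    ids still need to be traced back to where they were produced.
    """
    return [[int(t) for t in ids if 0 <= t < vocab_size] for ids in batch_ids]


if __name__ == "__main__":
    # Toy data (not taken from the issue): a vocabulary of 10 pieces,
    # one negative id and one id equal to the vocabulary size.
    example_ids = [[1, 5, -200, 3], [2, 10, 4]]
    print(find_out_of_range_ids(example_ids, vocab_size=10))
    print(drop_out_of_range_ids(example_ids, vocab_size=10))
```

In the script from the traceback, the inputs would presumably be `output_ids.tolist()` and the vocabulary size reported by the tokenizer's SentencePiece model (for instance `tokenizer.sp_model.GetPieceSize()`). Both are assumptions about that setup. If the check reports ids, the next step is to find where they enter `output_ids` rather than to filter them permanently.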