PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Bug]: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` #8603

Closed: sanbuphy closed this issue 3 weeks ago

sanbuphy commented 3 months ago

Software environment

- paddlepaddle:develop
- paddlepaddle-gpu: develop 11.8
- paddlenlp: latest (commit 4609d07a54ab97974b962b536dde7164ab15db93)

Duplicate issues

Error description

Inference with meta-llama/Meta-Llama-3-8B-Instruct fails with the error below:

(…)nstruct/model-00004-of-00004.safetensors: 100%|█| 1.17G/1.17G [00:14<00:00, 8
Downloading shards: 100%|█████████████████████████| 4/4 [03:24<00:00, 51.14s/it]
W0613 23:29:27.245162 141364 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W0613 23:29:27.246907 141364 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
Loading checkpoint shards: 100%|██████████████████| 4/4 [03:39<00:00, 54.87s/it]
[2024-06-13 23:33:27,358] [    INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM.

[2024-06-13 23:33:27,359] [    INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Meta-Llama-3-8B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
(…)ama-3-8B-Instruct/generation_config.json: 100%|█| 126/126 [00:00<00:00, 489kB
[2024-06-13 23:33:27,486] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/meta-llama/Meta-Llama-3-8B-Instruct/generation_config.json
[2024-06-13 23:33:27,487] [    INFO] - We are using <class 'paddlenlp.transformers.llama.configuration.LlamaConfig'> to load 'meta-llama/Meta-Llama-3-8B-Instruct'.
[2024-06-13 23:33:27,487] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/meta-llama/Meta-Llama-3-8B-Instruct/config.json
[2024-06-13 23:33:27,488] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/meta-llama/Meta-Llama-3-8B-Instruct/generation_config.json
[2024-06-13 23:33:27,490] [    INFO] - Start predict
[2024-06-13 23:33:27,491] [   ERROR] - Using pad_token, but it is not set yet.
Traceback (most recent call last):
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 1651, in <module>
    predict()
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 1596, in predict
    outputs = predictor.predict(batch_source_text)
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 251, in predict
    tokenized_source = self._preprocess(input_texts)
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 226, in _preprocess
    tokenized_source = self.tokenizer(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 2248, in __call__
    return self.batch_encode(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 2523, in batch_encode
    padding_strategy, truncation_strategy, max_length, kwargs = self._get_padding_truncation_strategies(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 2004, in _get_padding_truncation_strategies
    raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.
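Not part of the original report — a minimal, self-contained sketch of the padding check this traceback hits, and the workaround the error message itself suggests (reusing an existing special token as `pad_token`). `ToyTokenizer` is a hypothetical stand-in, not PaddleNLP's actual tokenizer class:

```python
class ToyTokenizer:
    """Hypothetical tokenizer whose pad_token starts out unset."""

    def __init__(self, eos_token="</s>", pad_token=None):
        self.eos_token = eos_token
        self.pad_token = pad_token

    def batch_encode(self, batch, padding=False):
        # Fake "token ids": one id per whitespace-separated word.
        ids = [[len(w) for w in text.split()] for text in batch]
        if padding:
            if self.pad_token is None:
                # Mirrors the check that raises in the real
                # _get_padding_truncation_strategies.
                raise ValueError(
                    "Asking to pad but the tokenizer does not have a padding token."
                )
            longest = max(len(seq) for seq in ids)
            ids = [seq + [0] * (longest - len(seq)) for seq in ids]
        return ids


tok = ToyTokenizer()
try:
    tok.batch_encode(["a bb", "a bb ccc"], padding=True)
except ValueError as e:
    print("raised:", e)

# Workaround from the error message: reuse eos_token as pad_token.
tok.pad_token = tok.eos_token
print(tok.batch_encode(["a bb", "a bb ccc"], padding=True))
```

With `pad_token` assigned, the shorter sequence is right-padded to the batch's longest length instead of raising.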

### Steps to reproduce & code

!pip install tiktoken
!python predictor.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --dtype=float16

sanbuphy commented 3 months ago

The same issue occurs with Qwen2 inference, as `Using unk_token, but it is not set yet.`:

[2024-06-13 23:21:55,506] [ WARNING] - if you run ring_flash_attention.py, please ensure you install the paddlenlp_ops by following the instructions provided at https://github.com/PaddlePaddle/PaddleNLP/blob/develop/csrc/README.md
[2024-06-13 23:21:56,948] [    INFO] - We are using <class 'paddlenlp.transformers.qwen2.tokenizer.Qwen2Tokenizer'> to load 'Qwen/Qwen2-7B-Instruct'.
[2024-06-13 23:21:57,310] [   ERROR] - Using unk_token, but it is not set yet.
(previous line repeated 7 more times)
[2024-06-13 23:21:57,311] [    INFO] - We are using <class 'paddlenlp.transformers.qwen2.configuration.Qwen2Config'> to load 'Qwen/Qwen2-7B-Instruct'.
[2024-06-13 23:21:57,311] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/Qwen/Qwen2-7B-Instruct/config.json
[2024-06-13 23:21:57,312] [    INFO] - We are using <class 'paddlenlp.transformers.qwen2.modeling.Qwen2ForCausalLM'> to load 'Qwen/Qwen2-7B-Instruct'.
[2024-06-13 23:21:57,312] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/Qwen/Qwen2-7B-Instruct/config.json
[2024-06-13 23:21:57,313] [    INFO] - Loading weights file from cache at /home/aistudio/.paddlenlp/models/Qwen/Qwen2-7B-Instruct/model.safetensors.index.json
Downloading shards: 100%|██████████████████████| 4/4 [00:00<00:00, 26255.42it/s]
W0613 23:21:57.318940 134458 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W0613 23:21:57.320240 134458 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
Loading checkpoint shards: 100%|██████████████████| 4/4 [03:18<00:00, 49.74s/it]
[2024-06-13 23:25:34,697] [    INFO] - All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[2024-06-13 23:25:34,697] [    INFO] - All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2-7B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
[2024-06-13 23:25:34,700] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/Qwen/Qwen2-7B-Instruct/generation_config.json
[2024-06-13 23:25:34,700] [    INFO] - Generation config file not found, using a generation config created from the model config.
[2024-06-13 23:25:34,701] [    INFO] - We are using <class 'paddlenlp.transformers.qwen2.configuration.Qwen2Config'> to load 'Qwen/Qwen2-7B-Instruct'.
[2024-06-13 23:25:34,701] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/Qwen/Qwen2-7B-Instruct/config.json
[2024-06-13 23:25:34,701] [    INFO] - Loading configuration file /home/aistudio/.paddlenlp/models/Qwen/Qwen2-7B-Instruct/generation_config.json
[2024-06-13 23:25:34,702] [ WARNING] - Can't find generation config, so it will not use generation_config field in the model config
[2024-06-13 23:25:34,703] [    INFO] - Start predict
[2024-06-13 23:25:48,343] [   ERROR] - Using unk_token, but it is not set yet.
(previous line repeated 197 more times)
Traceback (most recent call last):
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 1651, in <module>
    predict()
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 1596, in predict
    outputs = predictor.predict(batch_source_text)
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 253, in predict
    decoded_predictions = self._postprocess(predictions)
  File "/home/aistudio/work/PaddleNLP/llm/predictor.py", line 245, in _postprocess
    decoded_predictions = self.tokenizer.batch_decode(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 3200, in batch_decode
    return [
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 3201, in <listcomp>
    self.decode(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils_base.py", line 3239, in decode
    return self._decode(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/qwen2/tokenizer.py", line 294, in _decode
    return super()._decode(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/tokenizer_utils.py", line 1842, in _decode
    sub_texts.append(self.convert_tokens_to_string(current_sub_text))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddlenlp/transformers/qwen2/tokenizer.py", line 280, in convert_tokens_to_string
    text = "".join(tokens)
TypeError: sequence item 196: expected str instance, NoneType found
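Not from the thread — a minimal sketch of why the `"".join(tokens)` in `convert_tokens_to_string` crashes here: when an id maps back through an unset special token, the id-to-token lookup can yield `None`, and `str.join` rejects non-string items. Skipping `None` entries before joining avoids the crash; the function names below are illustrative only, not PaddleNLP's actual fix:

```python
def convert_tokens_to_string_buggy(tokens):
    # TypeError if any token is None (e.g. an unset special token).
    return "".join(tokens)


def convert_tokens_to_string_fixed(tokens):
    # Drop entries that could not be mapped back to a string token.
    return "".join(t for t in tokens if t is not None)


# None stands in for an id whose special token is unset.
tokens = ["Hello", ",", None, " world"]
try:
    convert_tokens_to_string_buggy(tokens)
except TypeError as e:
    print("buggy:", e)

print(convert_tokens_to_string_fixed(tokens))  # Hello, world
```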
DrownFish19 commented 3 months ago

Qwen fix PR: https://github.com/PaddlePaddle/PaddleNLP/pull/8601; LLaMA fix PR: https://github.com/PaddlePaddle/PaddleNLP/pull/8630

yuanlehome commented 3 months ago

Qwen fix PR: #8601; LLaMA fix PR: #8630

https://github.com/PaddlePaddle/PaddleNLP/pull/8630 did not actually fix the LLaMA issue, and I don't know how to fix it.

DrownFish19 commented 3 months ago

Qwen fix PR: #8601; LLaMA fix PR: #8630

#8630 did not actually fix the LLaMA issue, and I don't know how to fix it.

The LLaMA 3 tokenizer is missing a pad_token; the fix is the code below, which I see has already been added:

if isinstance(tokenizer, (LlamaTokenizer, Llama3Tokenizer)) and not tokenizer.pad_token:
    tokenizer.pad_token = tokenizer.unk_token
yuanlehome commented 3 months ago

Qwen fix PR: #8601; LLaMA fix PR: #8630

#8630 did not actually fix the LLaMA issue, and I don't know how to fix it.

The LLaMA 3 tokenizer is missing a pad_token; the fix is the code below, which I see has already been added:

if isinstance(tokenizer, (LlamaTokenizer, Llama3Tokenizer)) and not tokenizer.pad_token:
    tokenizer.pad_token = tokenizer.unk_token

That doesn't help: for Llama 3 even unk_token cannot be set, so this code change can be ignored for now. Fundamentally we need to fix setting both pad_token and unk_token, and I suspect other special tokens cannot be set either.
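A hedged sketch of the more defensive fallback this comment argues for (not PaddleNLP's actual fix): try several existing special tokens, in order, before giving up, so that a tokenizer like Llama 3's, which has no unk_token, still ends up with a usable pad_token. `ensure_pad_token` and `FakeLlama3Tokenizer` are hypothetical names:

```python
def ensure_pad_token(tokenizer):
    """If pad_token is unset, fall back to the first available special token."""
    if getattr(tokenizer, "pad_token", None):
        return tokenizer.pad_token
    for name in ("unk_token", "eos_token", "bos_token"):
        candidate = getattr(tokenizer, name, None)
        if candidate:
            tokenizer.pad_token = candidate
            return candidate
    raise ValueError("no special token available to use as pad_token")


class FakeLlama3Tokenizer:
    # Llama 3 style: no pad_token, no unk_token, only eos/bos.
    pad_token = None
    unk_token = None
    eos_token = "<|end_of_text|>"
    bos_token = "<|begin_of_text|>"


tok = FakeLlama3Tokenizer()
print(ensure_pad_token(tok))  # unk_token is None, so it falls through to eos_token
```

Because `unk_token` is `None` here, assigning `tokenizer.pad_token = tokenizer.unk_token` alone (as in the snippet above) would leave padding broken; the ordered fallback sidesteps that.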

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 3 weeks ago

This issue was closed because it has been inactive for 14 days since being marked as stale.