Closed: deepLwcg closed this issue 6 months ago.
Please describe your question
System environment:
- python 3.9.16
- paddlenlp 2.5.2
- paddle-pipelines 0.5.0
- paddlepaddle-gpu 2.4.2.post117
- CUDA Version 11.7
- NVIDIA Driver Version 515.43.04
- Ubuntu 22.04 LTS
Following the steps up to 3.4.2 (writing document data into the ANN index), an error occurred:
    # Build an ANN index using the DuReader-Robust dataset as an example
    python utils/offline_ann.py --index_name dureader_robust_query_encoder \
        --doc_dir data/dureader_dev \
        --search_engine elastic \
        --embed_title True \
        --delete_index
    [2023-03-09 12:38:24,769] [ INFO] - Special tokens file saved in /home/nullht/.paddlenlp/models/rocketqa-zh-nano-para-encoder/special_tokens_map.json
    INFO - pipelines.utils.logger - Logged parameters: {'processor': 'TextSimilarityProcessor', 'tokenizer': 'NoneType', 'max_seq_len': '0', 'dev_split': '0.1'}
    INFO - pipelines.document_stores.elasticsearch - Updating embeddings for all 1398 docs ...
    Updating embeddings: 0%| | 0/1398 [00:00<?, ? Docs/s
    Exception in thread Thread-3:
    | 0/1408 [00:00<?, ? Docs/s]
    TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/home/nullht/anaconda3/envs/nlp/lib/python3.9/threading.py", line 980, in _bootstrap_inner
        self.run()
      File "/home/nullht/anaconda3/envs/nlp/lib/python3.9/threading.py", line 917, in run
        self._target(*self._args, **self._kwargs)
      File "/home/nullht/anaconda3/envs/nlp/lib/python3.9/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 217, in _thread_loop
        batch = self._dataset_fetcher.fetch(indices,
      File "/home/nullht/anaconda3/envs/nlp/lib/python3.9/site-packages/paddle/fluid/dataloader/fetcher.py", line 138, in fetch
        data = self.collate_fn(data)
      File "/home/nullht/anaconda3/envs/nlp/lib/python3.9/site-packages/pipelines/nodes/retriever/dense.py", line 280, in token_padding_inputs
        input_ids = Pad(axis=0, pad_val=self.passage_tokenizer.pad_token_id, dtype="int64")(input_ids)
      File "/home/nullht/anaconda3/envs/nlp/lib/python3.9/site-packages/paddlenlp/data/collate.py", line 150, in __call__
        ret[i] = arr
    ValueError: setting an array element with a sequence.
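The ValueError is raised inside Pad: with dtype="int64", every element of each sample's input_ids has to be a scalar token id, and the chained TypeError (int() ... not 'list') suggests that at least one sample's input_ids was nested one level deeper than expected. A minimal pure-Python sketch of that padding step, assuming flat lists of int token ids, shows which shape is accepted and fails with a readable message on a nested sample. The helper pad_batch is hypothetical and is not PaddleNLP's Pad implementation.

```python
from typing import List, Sequence


def pad_batch(seqs: Sequence[Sequence[int]], pad_val: int = 0) -> List[List[int]]:
    """Right-pad token-id lists to the longest length in the batch.

    Hypothetical stand-in for Pad(axis=0, pad_val=..., dtype="int64");
    not PaddleNLP's actual implementation.
    """
    max_len = max((len(seq) for seq in seqs), default=0)
    padded: List[List[int]] = []
    for i, seq in enumerate(seqs):
        # An int64 array can only hold scalar ids; a nested list inside a
        # sample is what triggers "setting an array element with a sequence".
        bad = [tok for tok in seq if not isinstance(tok, int)]
        if bad:
            raise ValueError(
                f"sample {i} holds {type(bad[0]).__name__} values, expected int token ids"
            )
        padded.append(list(seq) + [pad_val] * (max_len - len(seq)))
    return padded


print(pad_batch([[1, 2, 3], [4]]))  # [[1, 2, 3], [4, 0, 0]]

try:
    pad_batch([[1, 2, 3], [[4, 5], [6]]])  # second sample is nested one level too deep
except ValueError as err:
    print(err)  # sample 1 holds list values, expected int token ids
```

Running a similar shape check over the batch that reaches token_padding_inputs is one way to confirm which samples are nested before choosing between the two workarounds.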
Please install the latest develop version, or set the embedding_title parameter (--embed_title in the command above) to False.
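One caveat when turning the option off on the command line: if offline_ann.py declares --embed_title with argparse type=bool (an assumption here; check the script), then passing --embed_title False would still yield True, because bool() of any non-empty string is True. A small sketch reproduces that pitfall and shows a hypothetical str2bool converter that parses the text instead; neither helper is part of PaddleNLP.

```python
import argparse


def str2bool(value: str) -> bool:
    """Hypothetical converter that maps common true/false spellings to a bool."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
# Pitfall: argparse calls bool("False"), and any non-empty string is truthy.
parser.add_argument("--naive_flag", type=bool, default=False)
# Safer: parse the string explicitly.
parser.add_argument("--embed_title", type=str2bool, default=False)

args = parser.parse_args(["--naive_flag", "False", "--embed_title", "False"])
print(args.naive_flag, args.embed_title)  # True False
```

If the script's default for the option is already False, leaving --embed_title out of the command entirely sidesteps the parsing question.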