PaddlePaddle / RocketQA

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.
Apache License 2.0

ValueError: `type` to initialized an Operator can not be None. #41

Open hyunsir opened 2 years ago

hyunsir commented 2 years ago

I am trying to use RocketQA together with the Milvus document store in Haystack. Following the tutorial, I use the DualEncoder model, and my deployed project hits a bug when I serialize the query vector that RocketQA generates. Here, query_emb = model.encode_query(query).

    File "/*****************************************/******.py", line 177, in retrieve
      query_emb = np.array(list(query_emb))
    File "/usr/local/lib/python3.8/site-packages/rocketqa/encoder/dual_encoder.py", line 168, in encode_query
      q_rep = self.exe.run(program=self.test_prog,
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1299, in run
      six.reraise(*sys.exc_info())
    File "/usr/local/lib/python3.8/site-packages/six.py", line 719, in reraise
      raise value
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1285, in run
      res = self._run_impl(
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1431, in _run_impl
      program = self._add_feed_fetch_ops(
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/executor.py", line 767, in _add_feed_fetch_ops
      tmp_program = program.clone()
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/framework.py", line 5443, in clone
      p._sync_with_cpp()
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/framework.py", line 5983, in _sync_with_cpp
      block._sync_with_cpp()
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/framework.py", line 3771, in _sync_with_cpp
      op = Operator(self, op_desc)
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/framework.py", line 2594, in __init__
      raise ValueError(
    ValueError: `type` to initialized an Operator can not be None.

Also, this bug only shows up when the project is deployed remotely. I am puzzled. Can anyone give me some help?

hyunsir commented 2 years ago

If the error above is hard to read, see this version:

    File "/*****/**.py", line 177, in retrieve
      query_emb = np.array(list(query_emb))
    File "/usr/local/lib/python3.8/site-packages/rocketqa/encoder/dual_encoder.py", line 168, in encode_query
      q_rep = self.exe.run(program=self.test_prog,
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1299, in run
      six.reraise(*sys.exc_info())
    File "/usr/local/lib/python3.8/site-packages/six.py", line 719, in reraise
      raise value
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1285, in run
      res = self._run_impl(
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/executor.py", line 1431, in _run_impl
      program = self._add_feed_fetch_ops(
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/executor.py", line 767, in _add_feed_fetch_ops
      tmp_program = program.clone()
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/framework.py", line 5443, in clone
      p._sync_with_cpp()
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/framework.py", line 5983, in _sync_with_cpp
      block._sync_with_cpp()
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/framework.py", line 3771, in _sync_with_cpp
      op = Operator(self, op_desc)
    File "/usr/local/lib/python3.8/site-packages/paddle/fluid/framework.py", line 2594, in __init__
      raise ValueError(
    ValueError: `type` to initialized an Operator can not be None.
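A note on reading this traceback: the frame for `np.array(list(query_emb))` in `retrieve` sits directly above the frame inside `encode_query`. That ordering is what one would expect if `encode_query` is a generator whose body only runs once `list(...)` consumes it, so the failure surfaces at the conversion line and not at the call itself. The sketch below illustrates that deferred execution with a hypothetical stand-in; `encode_query_stub` and the `calls` list are invented for illustration and are not RocketQA code.

```python
calls = []  # records when the generator body actually starts running


def encode_query_stub(queries):
    """Hypothetical stand-in for a generator-style encoder (not RocketQA code)."""
    calls.append("started")
    for q in queries:
        # Yield a toy one-dimensional "embedding" per query.
        yield [float(len(q))]


gen = encode_query_stub(["hello"])  # creates the generator; the body has not run yet
started_before = list(calls)        # snapshot taken before consumption

vectors = list(gen)                 # the body runs here, so any error would surface here
```

If the real encoder behaves this way, wrapping the `list(...)` call (not the `encode_query(...)` call) in a `try` block is where the exception would be caught.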

sfwydyc commented 2 years ago

Could you paste the code snippet that raises the error?

hyunsir commented 2 years ago

This is where I use the RocketQA service:

    class _RocketQADul(_BaseEmbeddingEncoder):
        def __init__(self, retriever: "RocketQARetriever"):
            # pretrained embedding models coming from: https://github.com/UKPLab/sentence-transformers#pretrained-models
            # e.g. 'roberta-base-nli-stsb-mean-tokens'
            self.batch_size = retriever.batch_size
            self.rocketqa_embedding_model = rocketqa.load_model(
                retriever.rocket_qa_embedding_model,
                use_cuda=False,
                device_id=0,
                batch_size=self.batch_size,
            )
            # self.embedding_model.max_seq_length = retriever.max_seq_len
            self.show_progress_bar = retriever.progress_bar
            document_store = retriever.document_store
            if document_store.similarity != "cosine":
                logger.warning(
                    f"You are using a Sentence Transformer with the {document_store.similarity} function. "
                    f"We recommend using cosine instead. "
                    f"This can be set when initializing the DocumentStore"
                )

        def embed(self, texts: List[List[str]]) -> List[np.ndarray]:
            # texts can be a list of [title, text]
            # get back list of numpy embedding vectors
            # rocketqa change
            title = []
            para = []
            for entity in texts:
                title.append(entity[0])
                para.append(entity[1])
            emb = self.rocketqa_embedding_model.encode_para(para, title)
            emb = [r for r in emb]
            # emb = np.array(list(emb))
            return emb

        def embed_queries(self, texts: List[str]) -> List[np.ndarray]:
            return self.rocketqa_embedding_model.encode_query(texts)

        def embed_documents(self, docs: List[Document]) -> List[np.ndarray]:
            passages = [[d.meta["name"] if d.meta and "name" in d.meta else "", d.content] for d in docs]  # type: ignore
            # passage=[[name,content],[name2,content2],...]
            return self.embed(passages)

This is where I process the returned result, and where the error is raised:

    def retrieve(
        self,
        query: str,
        filters: dict = None,
        top_k: Optional[int] = None,
        index: str = None,
        headers: Optional[Dict[str, str]] = None,
    ) -> List[Document]:
        if top_k is None:
            top_k = self.top_k
        if index is None:
            index = self.document_store.index
        query_emb = self.embed_queries(texts=[query])
        print(query_emb)
        query_emb = np.array(list(list(query_emb)[0]))
        print(type(query_emb))  # <class 'numpy.ndarray'>
        # print(query_emb)  # [-1.1523590,..., 1.43632686e+00]
        documents = self.document_store.query_by_embedding(
            query_emb=query_emb, filters=filters, top_k=top_k, index=index, headers=headers
        )
        return documents
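For reference, the conversion step in `retrieve` can be isolated and checked without Paddle or Milvus. The following is a minimal sketch, assuming the encoder yields one vector per query. `fake_encode_query` and `first_query_vector` are hypothetical names introduced only for this illustration and are not part of RocketQA or Haystack.

```python
import numpy as np


def fake_encode_query(queries, dim=4):
    """Hypothetical stand-in that yields one float32 vector per query."""
    for i, _ in enumerate(queries):
        yield np.full(dim, float(i), dtype=np.float32)


def first_query_vector(embeddings):
    """Materialize an iterable of vectors and return the first one as a 1-D array."""
    matrix = np.array(list(embeddings))  # expected shape: (num_queries, dim)
    if matrix.ndim != 2 or matrix.shape[0] == 0:
        raise ValueError("expected at least one embedding vector")
    return matrix[0]


query_emb = first_query_vector(fake_encode_query(["what is dense retrieval?"]))
```

Checking the shape explicitly makes an empty or malformed encoder result fail with a clear message before it reaches `query_by_embedding`. It does not address the `Operator` error itself, which is raised inside Paddle's executor.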