OSU-NLP-Group / HippoRAG

[NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across external documents. RAG + Knowledge Graphs + Personalized PageRank.
https://arxiv.org/abs/2405.14831
MIT License

Query NER exception #3

Closed bonadio closed 3 months ago

bonadio commented 5 months ago

Hi

First, congratulations to the team; the results seem very promising.

I don't have a local GPU so I created this Colab Notebook to run the code.

https://colab.research.google.com/drive/1Su1nnxed7r1UUJIa-sSstVLz2YJrKItU?usp=sharing

When I import HippoRAG in a Python file and try to run it, I get a "Query NER exception" error from the function named_entity_recognition in hipporag.py. It looks like client should be self.client.

This is the original code

    def named_entity_recognition(self, text: str):
        query_ner_prompts = ChatPromptTemplate.from_messages([SystemMessage("You're a very effective entity extraction system."),
                                                              HumanMessage(query_prompt_one_shot_input),
                                                              AIMessage(query_prompt_one_shot_output),
                                                              HumanMessage(query_prompt_template.format(text))])
        query_ner_messages = query_ner_prompts.format_prompt()
        json_mode = False
        if isinstance(client, ChatOpenAI):  # JSON mode
            chat_completion = client.invoke(query_ner_messages.to_messages(), temperature=0, max_tokens=300, stop=['\n\n'], response_format={"type": "json_object"})
            response_content = chat_completion.content
            total_tokens = chat_completion.response_metadata['token_usage']['total_tokens']
            json_mode = True
        elif isinstance(client, ChatOllama):
            response_content = client.invoke(query_ner_messages.to_messages())
        else:  # no JSON mode
            chat_completion = client.invoke(query_ner_messages.to_messages(), temperature=0, max_tokens=300, stop=['\n\n'])
            response_content = chat_completion.content
            response_content = extract_json_dict(response_content)
            total_tokens = chat_completion.response_metadata['token_usage']['total_tokens']

        if not json_mode:
            try:
                assert 'named_entities' in response_content
                response_content = str(response_content)
            except Exception as e:
                print('Query NER exception', e)
                response_content = {'named_entities': []}

        return response_content, total_tokens

This is the updated code that works for me

    def named_entity_recognition(self, text: str):
        query_ner_prompts = ChatPromptTemplate.from_messages([SystemMessage("You're a very effective entity extraction system."),
                                                              HumanMessage(query_prompt_one_shot_input),
                                                              AIMessage(query_prompt_one_shot_output),
                                                              HumanMessage(query_prompt_template.format(text))])
        query_ner_messages = query_ner_prompts.format_prompt()
        json_mode = False
        if isinstance(self.client, ChatOpenAI):  # JSON mode
            chat_completion = self.client.invoke(query_ner_messages.to_messages(), temperature=0, max_tokens=300, stop=['\n\n'], response_format={"type": "json_object"})
            response_content = chat_completion.content
            total_tokens = chat_completion.response_metadata['token_usage']['total_tokens']
            json_mode = True
        elif isinstance(self.client, ChatOllama):
            response_content = self.client.invoke(query_ner_messages.to_messages())
        else:  # no JSON mode
            chat_completion = self.client.invoke(query_ner_messages.to_messages(), temperature=0, max_tokens=300, stop=['\n\n'])
            response_content = chat_completion.content
            response_content = extract_json_dict(response_content)
            total_tokens = chat_completion.response_metadata['token_usage']['total_tokens']

        if not json_mode:
            try:
                assert 'named_entities' in response_content
                response_content = str(response_content)
            except Exception as e:
                print('Query NER exception', e)
                response_content = {'named_entities': []}

        return response_content, total_tokens
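To illustrate why the bare name fails, here is a minimal sketch that is independent of HippoRAG; the class `Extractor` and its string client are hypothetical stand-ins. Inside a method, a value stored in `__init__` is only reachable through `self`, so a bare `client` is looked up as a local or global name and raises `NameError` when no such name exists.

```python
class Extractor:
    """Hypothetical stand-in for a class that stores an LLM client."""

    def __init__(self, client):
        self.client = client  # stored as an instance attribute

    def broken(self):
        # Bare name: Python looks for a local or global `client`, not the attribute.
        return isinstance(client, str)

    def fixed(self):
        # Attribute access through self finds the value set in __init__.
        return isinstance(self.client, str)


extractor = Extractor("dummy-client")

try:
    extractor.broken()
    broken_error = None
except NameError as e:
    broken_error = e

fixed_result = extractor.fixed()
print("broken:", repr(broken_error))
print("fixed:", fixed_result)
```
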

yhshu commented 5 months ago

Thank you for your response! I've submitted a PR.

sravanjosh07 commented 5 months ago

@bonadio I encountered a similar error when I tried to run the script with ollama and mistral. Have you tried it?

Passage NER exception expected string or bytes-like object
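For context, Python's `re` module raises a `TypeError` with this wording when it is given an object that is not a string. Whether that is the cause here is an assumption, not something confirmed in this thread. The sketch below uses a hypothetical `FakeMessage` class standing in for a chat message object that carries its text in a `.content` attribute.

```python
import re


class FakeMessage:
    """Hypothetical stand-in for a chat message object with a .content string."""

    def __init__(self, content):
        self.content = content


message = FakeMessage('{"named_entities": ["HippoRAG"]}')
pattern = r'\{.*\}'

try:
    re.search(pattern, message)  # passing the object itself, not its text
    type_error = None
except TypeError as e:
    type_error = e

match = re.search(pattern, message.content)  # passing the string works
print(type_error)
print(match.group(0))
```

If this is what happens in the ollama branch, extracting the text from the returned message before parsing would be one thing to try.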

bonadio commented 5 months ago

Hi @sravanjosh07, I did not use ollama. You can import traceback and print the stack trace inside the exception handler. That should show you exactly where the problem is.
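A minimal sketch of that suggestion, using Python's standard `traceback` module. The helper `parse_entities` is hypothetical and only mirrors the try/except pattern from the code above; it is not the HippoRAG implementation.

```python
import traceback


def parse_entities(response_content):
    """Hypothetical helper mirroring the try/except pattern in the issue."""
    try:
        assert 'named_entities' in response_content
        return response_content
    except Exception as e:
        print('Query NER exception', repr(e))
        traceback.print_exc()  # prints the full stack trace to stderr
        return {'named_entities': []}


result = parse_entities(None)  # `in` on None raises TypeError
print(result)
```

The stack trace names the file and line that raised the exception, which the short message printed by `print('Query NER exception', e)` does not.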