IDEA-FinAI / ToG

This is the official GitHub repo of Think-on-Graph. If you are interested in our work or would like to join our research team in Shenzhen, please feel free to contact us by email (xuchengjin@idea.edu.cn).

Question about multi-hop questions #8

Closed: kkk-an closed this issue 6 months ago

kkk-an commented 6 months ago

Thank you for your work. I have one question about multi-hop (or multi-entity) questions:

For example, consider the case in Table 17. Question: Who influenced Arthur Miller that was influenced by Lucian? Reasoning path: Arthur Miller → influence.influence_node.influenced_by → William Shakespeare → influence.influence_node.influenced_by → Lucian (Path 1, Score: 0.75). Why do you expect that your Entity Prune can directly pick "William Shakespeare" in the first step? If there actually exist many other candidate entities, how do you ensure that you choose the correct answer or linked entity?

A similar question appears in your code prompt: The movie featured Miley Cyrus and was produced by Tobin Armbrust?

I have noticed that your method never includes the already explored paths during pruning. How can you ensure that the movie produced by Tobin Armbrust is correctly selected if your start entity is Miley Cyrus?

In fact, there is no interaction between the paths if the question contains multiple entities.

GasolSun36 commented 6 months ago

(1) Entity pruning is handed over to ChatGPT because we expect it to have enough intrinsic knowledge to correctly analyze which entities are relevant to the current relations and the question. (2) We create the few-shot examples based on the ground-truth answers. (3) Our current search and pruning steps do not include historical information, because our earlier tests showed that adding it has very little impact on the final result (probably because the prompt becomes very long and has to be truncated), so we focus on local information. That is enough for ToG to answer the question, because the reasoning step sees the complete paths. We are also exploring the 16K turbo model to verify the effect of introducing historical information.
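Not the official ToG code, just a minimal sketch of what point (3) describes: pruning is local, so the scorer only sees the question, the current relation, and the candidate entities, never the already-explored path history. The `entity_prune` and `score_candidate` names below are hypothetical placeholders for the actual ChatGPT prompt-and-parse step.

```python
from typing import Callable, List, Tuple

def entity_prune(
    question: str,
    relation: str,
    candidates: List[str],
    score_candidate: Callable[[str, str, str], float],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Keep the top-N candidate entities for one relation.

    Only local information (question, current relation, candidates) is
    passed to the scorer -- no history of previously explored paths.
    """
    scored = [(c, score_candidate(question, relation, c)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Toy usage with a dummy scorer standing in for the ChatGPT call:
if __name__ == "__main__":
    dummy = lambda q, r, c: 0.75 if c == "William Shakespeare" else 0.1
    print(entity_prune(
        "Who influenced Arthur Miller that was influenced by Lucian?",
        "influence.influence_node.influenced_by",
        ["William Shakespeare", "Henrik Ibsen", "Eugene O'Neill"],
        dummy,
    ))
```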

kkk-an commented 6 months ago

Thank you for your quick reply; I now understand your entity-pruning mechanism.

However, your starting point is to solve hallucinations in LLMs by incorporating a knowledge graph into the reasoning procedure. Yet, as you wrote, "Entity pruning is handed over to ChatGPT because we expect it to have enough intrinsic knowledge to correctly analyze which entities are relevant to the current relations and the question." So it is the intrinsic knowledge of the LLM that does the work of choosing the right entity, which seems to conflict with the claim that your method mitigates hallucination. In that sense, Think-on-Graph appears somewhat inappropriate: you list the possible candidates and simply choose the most probable one.

Further, I am confused about queries with more than one entity. Your search paths start from each topic entity, but since the paths do not interact with each other, how do you find the right entity at each relation? For example: Who influenced Arthur Miller that was influenced by Lucian? Arthur Miller must have been influenced by many people, so when you choose entities along the influenced_by relation, how do you make sure you keep the one who was influenced by Lucian, given that the search space is huge? And if the choice is made directly by the LLM's intrinsic knowledge, why not answer the question directly?

GasolSun36 commented 6 months ago

First of all, solving hallucination is only one contribution. There are others, listed in the last paragraph of the paper's introduction: deep reasoning, responsible reasoning, and flexibility and efficiency. Secondly, we also discussed the hallucination issue of ToG in Appendix B.2 of the paper and admitted that hallucination still exists; we simply use the correctness of the answers to judge whether it has been mitigated. Third, we do not choose only the single most probable candidate: we maintain top-N reasoning paths, which the LLM uses as references when generating the answer. This is not the same as ordinary beam search (see Appendix B.1 in the paper for details). Take this question as an example: Who influenced Arthur Miller that was influenced by Lucian? We first select the top three most likely related entities along the influenced_by relation (pruned by the LLM), and then iterate the search from those three entities. Since the exploration starts from Arthur Miller and Lucian at the same time, and entities are very likely to be retrieved along the influenced_by relation from both sides, the paths overlap in the reasoning stage.
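A minimal sketch (not the ToG implementation; names are illustrative) of the top-N path maintenance described above: paths are expanded from both topic entities at once, rescored, and only the best N are kept, so the surviving paths from Arthur Miller and from Lucian can overlap on an intermediate entity such as William Shakespeare.

```python
from typing import Callable, Dict, List, Tuple

# A path alternates entity and relation names, e.g.
# ["Arthur Miller", "influence.influence_node.influenced_by", "William Shakespeare"]
Path = List[str]

def expand_top_n(
    beams: List[Tuple[Path, float]],
    neighbors: Dict[str, List[Tuple[str, str]]],  # entity -> [(relation, entity), ...]
    score_path: Callable[[Path], float],          # LLM-backed path scorer (placeholder)
    top_n: int = 3,
) -> List[Tuple[Path, float]]:
    """One search iteration: extend every kept path by one hop and keep the top-N."""
    expanded = []
    for path, _ in beams:
        tail = path[-1]
        for relation, entity in neighbors.get(tail, []):
            new_path = path + [relation, entity]
            expanded.append((new_path, score_path(new_path)))
    expanded.sort(key=lambda pair: pair[1], reverse=True)
    return expanded[:top_n]

# The search starts from both topic entities at the same time:
#   beams = [(["Arthur Miller"], 1.0), (["Lucian"], 1.0)]
# After a few iterations, paths from both sides can meet on the same entity,
# and the final reasoning step composes the answer from those overlapping paths.
```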

kkk-an commented 6 months ago

Alright, your work inspires me a lot. Thank you again for the timely reply.