Closed sitaocheng closed 8 months ago
Hi @StauskasCST , Thanks for your interest in our work. In Table 4, we conduct the plug-and-play experiments where we integrate the planning module of RoG with different LLMs. Specifically, we use RoG to generate the relation paths and feed the retrieved reasoning paths as context into different LLMs for reasoning. These reasoning LLMs are not fine-tuned.
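The plug-and-play setup described above can be sketched roughly as follows. This is a minimal illustration of the described flow, not the repo's actual API: the class names, method names, and prompt wording are all hypothetical stand-ins.

```python
# Illustrative sketch of the plug-and-play experiment in Table 4.
# All names here (plan, retrieve, generate) are hypothetical, not RoG's real API.

def answer_with_plugin_llm(question, kg, rog_planner, reasoning_llm):
    """RoG plans relation paths; a separate, non-fine-tuned LLM does the reasoning."""
    # 1. The fine-tuned RoG planning module generates candidate relation paths.
    relation_paths = rog_planner.plan(question)

    # 2. Ground each relation path in the knowledge graph to retrieve
    #    concrete reasoning paths (entity-relation-entity chains).
    reasoning_paths = [p for rp in relation_paths for p in kg.retrieve(question, rp)]

    # 3. Feed the retrieved reasoning paths as context into the reasoning LLM,
    #    which is used as-is (frozen, not fine-tuned).
    prompt = (
        "Based on the reasoning paths, please answer the question.\n"
        "Reasoning paths:\n" + "\n".join(reasoning_paths) + "\n"
        "Question: " + question + "\nAnswer:"
    )
    return reasoning_llm.generate(prompt)
```

Swapping in different `reasoning_llm` backends (e.g. LLaMA2-Chat-7B or ChatGPT) while keeping the same RoG planner is what produces the different rows of Table 4.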
Thank you so much, and sorry for the late reply.
As you mentioned, based on Table 4, a few things can be inferred:
- LLaMA2-Chat-7B (frozen) performs better than ChatGPT when given your predicted relation paths.
- The relation paths in Table 4 are generated by the planning module of LLaMA2-Chat-7B (fine-tuned on both the planning and retrieval-reasoning objectives, i.e., Equation (7)).
- Your fine-tuned version (using the same model for planning first, then retrieval, then reasoning) achieves the SOTA result.
Are they correct?
I think your understanding is correct!
Hi there, I am trying to reproduce your results. Here are some questions I am curious about:
In Table 2, on the CWQ dataset, RoG achieves 62.6 Hit@1 and 56.2 F1, which is great. But in Table 4, 'LLaMA2-Chat-7B + RoG Planning' gets 56.41 Hit@1 on CWQ (even better than ChatGPT?). Did you fine-tune this model for the reasoning setting? If so, what is the difference between this setting and the original RoG, given that the results differ (62.6 vs. 56.41 Hit@1)?
Thank you for your precious time!