Closed sitaocheng closed 8 months ago
Hi @StauskasCST , Thanks for your interest in our work. In Table 4, we conduct the plug-and-play experiments where we integrate the planning module of RoG with different LLMs. Specifically, we use RoG to generate the relation paths and feed the retrieved reasoning paths as context into different LLMs for reasoning. These reasoning LLMs are not fine-tuned.
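The plug-and-play setup described above can be sketched roughly as follows. This is a minimal illustration of the described flow, not the repo's actual API: the class names, method names, and prompt wording are all hypothetical stand-ins.

```python
# Illustrative sketch of the plug-and-play experiment in Table 4.
# All names here (plan, retrieve, generate) are hypothetical, not RoG's real API.

def answer_with_plugin_llm(question, kg, rog_planner, reasoning_llm):
    """RoG plans relation paths; a separate, non-fine-tuned LLM does the reasoning."""
    # 1. The fine-tuned RoG planning module generates candidate relation paths.
    relation_paths = rog_planner.plan(question)

    # 2. Ground each relation path in the knowledge graph to retrieve
    #    concrete reasoning paths (entity-relation-entity chains).
    reasoning_paths = [p for rp in relation_paths for p in kg.retrieve(question, rp)]

    # 3. Feed the retrieved reasoning paths as context into the reasoning LLM,
    #    which is used as-is (frozen, not fine-tuned).
    prompt = (
        "Based on the reasoning paths, please answer the question.\n"
        "Reasoning paths:\n" + "\n".join(reasoning_paths) + "\n"
        "Question: " + question + "\nAnswer:"
    )
    return reasoning_llm.generate(prompt)
```

Swapping in different `reasoning_llm` backends (e.g. LLaMA2-Chat-7B or ChatGPT) while keeping the same RoG planner is what produces the different rows of Table 4.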
Thank you so much, and sorry for the late reply.
As you mentioned, based on Table 4, a few things can be inferred:
- LLaMA2-Chat-7B (frozen) performs better than ChatGPT when given your predicted relation paths.
- The relation paths in Table 4 are generated by the planning module of LLaMA2-Chat-7B (fine-tuned on both the planning and retrieval-reasoning objectives, i.e., Equation (7)).
- Your fine-tuned version (using the same model for planning first, then retrieval, then reasoning) achieves the SOTA result.
Are they correct?
I think your understanding is correct!
Hi there, I am trying to reproduce your results. Here are some questions I am curious about:
In Table 2, on the CWQ dataset, RoG achieves 62.6 Hit@1 and 56.2 F1, which is great. But in Table 4, 'LLaMA2-Chat-7B + RoG Planning' gets 56.41 Hit@1 on CWQ (even better than ChatGPT?). Did you fine-tune this model for the reasoning setting? If so, what is the difference between this setting and the original RoG, given that the results differ (62.6 vs. 56.41 Hit@1)?
Thank you for your precious time!