Closed — yflyzhang closed this issue 7 months ago
Thanks for your interest in our work. We do not explicitly optimize the KL divergence shown in Eq. 2. The final objective in Eq. 2 can be decomposed into two instruction-tuning tasks, as shown in Eqs. 7, 8, and 10. As a result, it can be optimized directly with the standard LLM training objective (next-token prediction), which also improves training speed. The detailed derivation can be found in Section A.1.
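To make the decomposition concrete, here is a minimal sketch (not the repository's actual code) of how each instruction-tuning task reduces to ordinary next-token cross-entropy with a causal LM. The model name, prompt formats, helper function, and the way the two losses are combined are illustrative assumptions, not the paper's exact implementation:

```python
# Minimal sketch, assuming the KL objective (Eq. 2) decomposes into two
# instruction-tuning tasks (Eqs. 7, 8, 10), each trained with standard
# next-token prediction. All names and prompt formats below are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder backbone; substitute the model the paper actually uses
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def instruction_tuning_loss(prompt: str, target: str) -> torch.Tensor:
    """Cross-entropy over the target tokens only, i.e. ordinary causal LM
    training where the prompt positions are masked out of the loss."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    target_ids = tokenizer(target, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore prompt tokens in the loss
    return model(input_ids=input_ids, labels=labels).loss

# Hypothetical instances of the two tasks; the real prompt/target templates
# come from the paper's Eqs. 7, 8, and 10.
planning_loss = instruction_tuning_loss("Question: ...\nPlan:", " step 1 ...")
reasoning_loss = instruction_tuning_loss("Plan and retrieved context: ...\nAnswer:", " ...")
total_loss = planning_loss + reasoning_loss  # combine and backpropagate as usual
```

In other words, no explicit KL term appears in the training loop; each sub-task is just a labeled sequence fed through the usual next-token-prediction loss.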
Thank you very much for the clarification and the additional implementation details; that resolves my initial confusion about how the optimization is carried out.
Once again, thanks for your work!
Thanks for sharing this work!
The paper mentions using KL divergence as a loss function for planning optimization. However, I couldn't locate the code that implements this KL-divergence loss, or the retrieval-reasoning loss.
Could you or someone else please point me to the relevant files or provide more information on where these components are implemented?