GFNOrg / torchgfn

GFlowNet library
https://torchgfn.readthedocs.io/en/latest/

Guidelines on finetuning LLMs as policy models #170

Open Saltychtao opened 6 months ago

Saltychtao commented 6 months ago

Hello,

Thank you for your effort in releasing such a great implementation of GFNs! I am working on using GFN to finetune an LLM as a policy model (which I believe will be a popular use case) and would like to ask for some suggestions.

The main problem in this scenario is sampling efficiently from language models in parallel, which requires storing the Transformer's key-value cache to avoid re-computation. Do you have any suggestions on how to implement the State class so that we can store the partial token sequence and the key-value cache at the same time?
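For reference, the kind of incremental decoding I have in mind looks roughly like this with Hugging Face transformers (a minimal sketch; the model name, the number of steps, and the multinomial sampling are just placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is only a placeholder; any causal LM exposes the same past_key_values mechanism.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # gpt2 defines no pad token by default
tokenizer.padding_side = "left"             # left-pad so the last position is always a real token
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompts = ["The reward is", "A GFlowNet samples"]
batch = tokenizer(prompts, return_tensors="pt", padding=True)
attention_mask = batch["attention_mask"]
# Explicit position ids so left padding does not shift token positions.
position_ids = (attention_mask.cumsum(dim=1) - 1).clamp(min=0)

with torch.no_grad():
    # First pass encodes the full prompts and returns the key-value cache.
    out = model(
        input_ids=batch["input_ids"],
        attention_mask=attention_mask,
        position_ids=position_ids,
        use_cache=True,
    )
    past, next_pos = out.past_key_values, position_ids[:, -1:] + 1
    next_tokens = torch.multinomial(torch.softmax(out.logits[:, -1, :], dim=-1), 1)

    sampled = []
    for _ in range(10):
        sampled.append(next_tokens)
        attention_mask = torch.cat([attention_mask, torch.ones_like(next_tokens)], dim=1)
        # Each step feeds only the newest token plus the cached keys/values,
        # so the prefix is never re-encoded.
        out = model(
            input_ids=next_tokens,
            attention_mask=attention_mask,
            position_ids=next_pos,
            past_key_values=past,
            use_cache=True,
        )
        past, next_pos = out.past_key_values, next_pos + 1
        next_tokens = torch.multinomial(torch.softmax(out.logits[:, -1, :], dim=-1), 1)

print(tokenizer.batch_decode(torch.cat(sampled, dim=1)))
```

The question is how to carry `past` along with the partial token sequences inside the state representation, so that trajectories can be batched and resumed without recomputing the prefix.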

saleml commented 6 months ago

Hello,

Thanks for raising the issue. This is an important question we're trying to address these days: how to allow for more flexible state spaces, such as graphs.

As of now, states need to be represented as tensors, so the natural approach would be to use long tensors that contain all the information you need to transition from one state to another. In this case, you could perhaps use some dimensions of the state to store the key-value cache, and others to store the decoded token indices.
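To make that concrete, here is a rough sketch of one possible packing scheme (not part of torchgfn's API; the shape constants and the `pack_state` / `unpack_state` helpers are hypothetical, and the packed tensor is float here, with token indices cast to float): the first `MAX_LEN` entries of each state vector hold the token indices, and the remaining entries hold the flattened key-value cache.

```python
import torch

# Hypothetical shape constants for the sketch.
MAX_LEN = 32                                # maximum sequence length per trajectory
N_LAYERS, N_HEADS, HEAD_DIM = 12, 12, 64    # e.g. GPT-2 small
CACHE_NUMEL = N_LAYERS * 2 * N_HEADS * MAX_LEN * HEAD_DIM  # keys + values, flattened
STATE_DIM = MAX_LEN + CACHE_NUMEL


def pack_state(token_ids: torch.Tensor, past_key_values) -> torch.Tensor:
    """Pack token prefixes and their KV caches into one (batch, STATE_DIM) float tensor.

    token_ids: (batch, MAX_LEN) long tensor, padded with -1 beyond the prefix.
    past_key_values: tuple of (key, value) pairs per layer, each of shape
        (batch, N_HEADS, seq_len, HEAD_DIM) with seq_len <= MAX_LEN.
    """
    batch = token_ids.shape[0]
    cache = torch.zeros(batch, N_LAYERS, 2, N_HEADS, MAX_LEN, HEAD_DIM)
    for layer, (k, v) in enumerate(past_key_values):
        seq_len = k.shape[2]
        cache[:, layer, 0, :, :seq_len] = k
        cache[:, layer, 1, :, :seq_len] = v
    return torch.cat([token_ids.float(), cache.reshape(batch, -1)], dim=1)


def unpack_state(states: torch.Tensor):
    """Inverse of pack_state: recover token ids and the KV cache tuple."""
    batch = states.shape[0]
    token_ids = states[:, :MAX_LEN].long()
    cache = states[:, MAX_LEN:].reshape(batch, N_LAYERS, 2, N_HEADS, MAX_LEN, HEAD_DIM)
    # The cache comes back padded to MAX_LEN; the true prefix length can be
    # recovered from the -1 padding in token_ids and masked accordingly.
    past_key_values = tuple(
        (cache[:, layer, 0], cache[:, layer, 1]) for layer in range(N_LAYERS)
    )
    return token_ids, past_key_values
```

Be aware that the cache dominates the size of such a state (on the order of `N_LAYERS * 2 * N_HEADS * MAX_LEN * HEAD_DIM` floats per trajectory), so depending on memory constraints it may be preferable to keep only the token indices in the state tensor and manage the cache outside of it.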