GFNOrg / torchgfn

GFlowNet library
https://torchgfn.readthedocs.io/en/latest/

Guidelines on finetuning LLMs as policy models #170

Open Saltychtao opened 6 months ago

Saltychtao commented 6 months ago

Hello,

Thank you for your effort in releasing such a great implementation of GFNs! I am working on using GFN to finetune an LLM as a policy model (which I believe will be a popular use case) and would like to ask for some suggestions.

The main problem in this scenario is sampling efficiently from language models in parallel, which requires storing the Transformer's key-value cache to avoid re-computation. Do you have any suggestions on how to implement the State class so that we can store the partial token sequence and the key-value cache at the same time?
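For reference, the kind of incremental decoding I have in mind looks roughly like this with Hugging Face transformers (a minimal sketch; the model name, the number of steps, and the multinomial sampling are just placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is only a placeholder; any causal LM exposes the same past_key_values mechanism.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # gpt2 defines no pad token by default
tokenizer.padding_side = "left"             # left-pad so the last position is always a real token
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompts = ["The reward is", "A GFlowNet samples"]
batch = tokenizer(prompts, return_tensors="pt", padding=True)
attention_mask = batch["attention_mask"]
# Explicit position ids so left padding does not shift token positions.
position_ids = (attention_mask.cumsum(dim=1) - 1).clamp(min=0)

with torch.no_grad():
    # First pass encodes the full prompts and returns the key-value cache.
    out = model(
        input_ids=batch["input_ids"],
        attention_mask=attention_mask,
        position_ids=position_ids,
        use_cache=True,
    )
    past, next_pos = out.past_key_values, position_ids[:, -1:] + 1
    next_tokens = torch.multinomial(torch.softmax(out.logits[:, -1, :], dim=-1), 1)

    sampled = []
    for _ in range(10):
        sampled.append(next_tokens)
        attention_mask = torch.cat([attention_mask, torch.ones_like(next_tokens)], dim=1)
        # Each step feeds only the newest token plus the cached keys/values,
        # so the prefix is never re-encoded.
        out = model(
            input_ids=next_tokens,
            attention_mask=attention_mask,
            position_ids=next_pos,
            past_key_values=past,
            use_cache=True,
        )
        past, next_pos = out.past_key_values, next_pos + 1
        next_tokens = torch.multinomial(torch.softmax(out.logits[:, -1, :], dim=-1), 1)

print(tokenizer.batch_decode(torch.cat(sampled, dim=1)))
```

The question is how to carry `past` along with the partial token sequences inside the state representation, so that trajectories can be batched and resumed without recomputing the prefix.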

saleml commented 6 months ago

Hello,

Thanks for raising the issue. This is an important question we're trying to address these days: how to allow for more flexible state spaces, such as graphs.

As of now, states need to be represented as tensors, so the natural approach would be to use long tensors that contain all the information you need to transition from one state to another. In this case, you could perhaps use some dimensions of the state to store the key-value cache, and others to store the decoded token indices.
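To make that concrete, here is a rough sketch of one possible packing scheme (not part of torchgfn's API; the shape constants and the `pack_state` / `unpack_state` helpers are hypothetical, and the packed tensor is float here, with token indices cast to float): the first `MAX_LEN` entries of each state vector hold the token indices, and the remaining entries hold the flattened key-value cache.

```python
import torch

# Hypothetical shape constants for the sketch.
MAX_LEN = 32                                # maximum sequence length per trajectory
N_LAYERS, N_HEADS, HEAD_DIM = 12, 12, 64    # e.g. GPT-2 small
CACHE_NUMEL = N_LAYERS * 2 * N_HEADS * MAX_LEN * HEAD_DIM  # keys + values, flattened
STATE_DIM = MAX_LEN + CACHE_NUMEL


def pack_state(token_ids: torch.Tensor, past_key_values) -> torch.Tensor:
    """Pack token prefixes and their KV caches into one (batch, STATE_DIM) float tensor.

    token_ids: (batch, MAX_LEN) long tensor, padded with -1 beyond the prefix.
    past_key_values: tuple of (key, value) pairs per layer, each of shape
        (batch, N_HEADS, seq_len, HEAD_DIM) with seq_len <= MAX_LEN.
    """
    batch = token_ids.shape[0]
    cache = torch.zeros(batch, N_LAYERS, 2, N_HEADS, MAX_LEN, HEAD_DIM)
    for layer, (k, v) in enumerate(past_key_values):
        seq_len = k.shape[2]
        cache[:, layer, 0, :, :seq_len] = k
        cache[:, layer, 1, :, :seq_len] = v
    return torch.cat([token_ids.float(), cache.reshape(batch, -1)], dim=1)


def unpack_state(states: torch.Tensor):
    """Inverse of pack_state: recover token ids and the KV cache tuple."""
    batch = states.shape[0]
    token_ids = states[:, :MAX_LEN].long()
    cache = states[:, MAX_LEN:].reshape(batch, N_LAYERS, 2, N_HEADS, MAX_LEN, HEAD_DIM)
    # The cache comes back padded to MAX_LEN; the true prefix length can be
    # recovered from the -1 padding in token_ids and masked accordingly.
    past_key_values = tuple(
        (cache[:, layer, 0], cache[:, layer, 1]) for layer in range(N_LAYERS)
    )
    return token_ids, past_key_values
```

Be aware that the cache dominates the size of such a state (on the order of `N_LAYERS * 2 * N_HEADS * MAX_LEN * HEAD_DIM` floats per trajectory), so depending on memory constraints it may be preferable to keep only the token indices in the state tensor and manage the cache outside of it.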