venuraja79 closed this issue 8 months ago
Hi @venuraja79. What you are describing sounds really interesting! Although we suppose that it's less useful for extractive QA, there could be interesting use cases for generative QA and summarization.
Sadly, it isn't currently on our roadmap and at this point we would like to focus on other things. However, if you would like to take a closer look at this, we would be happy to discuss your ideas.
Hi @venuraja79, I agree with @MichelBartels that it's an exciting topic! Could you share some more details on how you think this could be applied to (generative?) QA? I understand that a key component of the approach is to train and use a reward model. Could you please describe in more detail how you would imagine that for QA? I think I have a rough idea, but I haven't heard anything about Proximal Policy Optimization (PPO), so that would be new to me. Overall it seems like a complex topic, but having seen your various great contributions to Haystack, I am confident you would be up to it. 👍
Thank you @MichelBartels and @julian-risch for your views. I agree that it's a challenging framework to box into a solution because it involves multiple steps. For now, I also feel that it's applicable to generative QA because of its subjective nature. PPO is new to me too :) especially applying RL in the NLP domain. I'll try to share a detailed approach for QA soon.
Is your feature request related to a problem? Please describe.
Haystack already has a feature to collect human feedback on the generated answers. This feedback can be used to improve the transformer-based QA models by fine-tuning them further.

Describe the solution you'd like
The framework proposed in this paper (https://openai.com/blog/learning-to-summarize-with-human-feedback/) helps to align the model towards human-preferred answers.

Describe alternatives you've considered
None

Additional context
I hope this will be a great addition to Haystack. If the team feels the same way, I am willing to spend more time drilling into the solution to adapt it for Haystack QA. Currently, the paper focuses on summarization.
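
To make the reward-model step a bit more concrete, here is a minimal sketch of the pairwise preference loss used to train a reward model from human comparisons, as I understand it from the linked work: the loss is -log(sigmoid(r_preferred - r_rejected)). The function names and the toy scores below are hypothetical and only for illustration. This is not Haystack code, and a real setup would compute the scores with a trained model rather than hard-code them.

```python
import math


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Pairwise loss for one comparison: -log(sigmoid(r_preferred - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    answer higher than the rejected one, and large otherwise.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def mean_preference_loss(score_pairs: list[tuple[float, float]]) -> float:
    """Average the pairwise loss over (preferred, rejected) score pairs."""
    return sum(preference_loss(p, r) for p, r in score_pairs) / len(score_pairs)


# Hypothetical reward scores for three answer pairs; in each tuple the
# first score belongs to the answer the human annotator preferred.
toy_pairs = [(1.5, 0.2), (0.1, 0.4), (2.0, -1.0)]
toy_loss = mean_preference_loss(toy_pairs)
```

Minimizing this loss over the collected feedback pushes the reward model to rank preferred answers higher. The resulting scores could then serve as the reward signal for an RL step such as PPO.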