deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Align QA models with human feedback #1955

Closed · venuraja79 closed this 6 months ago

venuraja79 commented 2 years ago

Is your feature request related to a problem? Please describe. Haystack already has a feature to collect human feedback on the generated answers. This feedback can be used to improve the transformer-based QA models by fine-tuning them further, roughly as in the sketch below.
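For context, this is roughly how the collected feedback can already feed a supervised fine-tuning step for an extractive reader. A minimal sketch, assuming the Haystack 1.x Python API and that feedback labels were written to the document store (e.g. via the REST feedback endpoint); the index names, file paths, and label field access below are illustrative, not a prescribed workflow:

```python
import json

from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import FARMReader

# Document store holding both documents and the user feedback labels.
document_store = ElasticsearchDocumentStore(host="localhost", index="document", label_index="label")

# Convert the positively rated feedback into SQuAD-style training data.
squad = {"data": []}
for label in document_store.get_all_labels():
    if not label.is_correct_answer or label.answer is None:
        continue
    start = label.answer.offsets_in_document[0].start
    squad["data"].append({
        "title": label.document.id,
        "paragraphs": [{
            "context": label.document.content,
            "qas": [{
                "id": label.id,
                "question": label.query,
                "answers": [{"text": label.answer.answer, "answer_start": start}],
                "is_impossible": False,
            }],
        }],
    })

with open("feedback_squad.json", "w") as f:
    json.dump(squad, f)

# Fine-tune the reader on the exported feedback.
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
reader.train(data_dir=".", train_filename="feedback_squad.json", n_epochs=1, save_dir="reader_finetuned")
```

The RLHF proposal below goes one step further than this supervised loop by learning a reward model from preferences instead of training directly on labeled spans.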

Describe the solution you'd like The framework proposed in this paper (https://openai.com/blog/learning-to-summarize-with-human-feedback/) helps align the model with the answers humans prefer. Its core is a reward model trained on pairwise human preferences; see the sketch below.
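A minimal sketch of the pairwise ranking objective used to train such a reward model, assuming PyTorch and Hugging Face transformers; the backbone model name and the helper function are placeholders, not part of Haystack:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# The reward model scores (question, answer) pairs; it is trained so that the
# human-preferred answer receives a higher score than the rejected one.
model_name = "distilroberta-base"  # placeholder backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

def pairwise_loss(question: str, chosen_answer: str, rejected_answer: str) -> torch.Tensor:
    """Loss = -log(sigmoid(r_chosen - r_rejected)), as in the learning-from-human-feedback setup."""
    chosen = tokenizer(question, chosen_answer, return_tensors="pt", truncation=True)
    rejected = tokenizer(question, rejected_answer, return_tensors="pt", truncation=True)
    r_chosen = reward_model(**chosen).logits.squeeze(-1)
    r_rejected = reward_model(**rejected).logits.squeeze(-1)
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Training would iterate over preference pairs collected via Haystack's feedback mechanism,
# e.g. optimizer steps on pairwise_loss(question, better_answer, worse_answer).
```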

Describe alternatives you've considered None

Additional context I hope this will be a great addition to Haystack. If the team feels the same way, I'm willing to spend more time drilling into the solution to adapt it to Haystack QA. The paper itself currently focuses on summarization.

MichelBartels commented 2 years ago

Hi @venuraja79. What you are describing sounds really interesting! Although we suspect it's less useful for extractive QA, there could be interesting use cases for generative QA and summarization.

Sadly, it isn't currently on our roadmap and at this point we would like to focus on other things. However, if you would like to take a closer look at this, we would be happy to discuss your ideas.

julian-risch commented 2 years ago

Hi @venuraja79 I agree with @MichelBartels that it's an exciting topic! Could you already share some more details on how you think this could be applied to (generative?) QA? I understand that a key component of the approach is to train and use a reward model. Could you please describe in more detail how you would imagine that for QA? I think I have a rough idea about that, but I haven't heard anything about Proximal Policy Optimization (PPO) before, so that would be new to me. Overall it seems like a complex topic, but after having seen your various great contributions to Haystack, I am confident you would be up to it. 👍
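For the PPO part, one way to picture it for generative QA is below. This is purely a sketch assuming the third-party trl library (its PPOTrainer interface differs between versions); the base model, generation settings, and the reward_fn placeholder are illustrative, and in practice the reward would come from a reward model trained on Haystack feedback:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Policy = a generative QA model; a frozen copy serves as the reference for the KL penalty.
config = PPOConfig(model_name="gpt2", learning_rate=1.41e-5, batch_size=1, mini_batch_size=1)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

def reward_fn(question: str, answer: str) -> torch.Tensor:
    # Placeholder: score the generated answer with the trained reward model.
    return torch.tensor(0.0)

for question in ["Who wrote Faust?"]:  # in practice, a dataset of user questions
    query_tensor = tokenizer.encode(question, return_tensors="pt")[0]
    # Sample an answer from the current policy.
    response_tensor = ppo_trainer.generate(query_tensor, return_prompt=False, max_new_tokens=32)[0]
    answer = tokenizer.decode(response_tensor, skip_special_tokens=True)
    reward = reward_fn(question, answer)
    # One PPO update: push the policy toward higher reward while staying close to the reference model.
    ppo_trainer.step([query_tensor], [response_tensor], [reward])
```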

venuraja79 commented 2 years ago

Thank you @MichelBartels and @julian-risch for your views. I agree that it's a challenging framework to box into a solution because it involves multiple steps. For now, I also feel that it's mainly applicable to generative QA because of its subjective nature. PPO is new to me too :) especially applying RL in the NLP domain. I'll try to share a detailed approach for QA soon.