I want to create an RLHF backend/frontend for a labelling<=>training<=>error-correction loop.

HumanSignal / label-studio-ml-backend

Configs and boilerplates for Label Studio's Machine Learning backend

Apache License 2.0

Issue #233

Open hemangjoshi37a opened 1 year ago

hemangjoshi37a commented 1 year ago

If anyone has any leads on this, please let me know. Also, if anyone wants to collaborate in this direction, please let me know.

makseq commented 1 year ago

Have you checked active learning?

https://docs.heartex.com/guide/active_learning.html

https://www.youtube.com/watch?v=8EO4vOw1MZc

hemangjoshi37a commented 1 year ago

@makseq Active learning is good, but RLHF is quite different because it uses reinforcement learning to optimize the model. All in all, if you know what RLHF is, you will see that it is quite different from active learning.

makseq commented 1 year ago

Yes, I know, but I would like to see the workflow you have in mind in Label Studio (LS) to achieve it. It seems you need Accept/Reject actions for your annotations, or ranking?

hemangjoshi37a commented 1 year ago

Yes, RLHF can be done in multiple ways. The feedback can be of the yes/no type or of the ranking type.
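To make the two feedback types concrete, here is a minimal Python sketch of how each could be reduced to the same (preferred, rejected) pairs that preference-based training typically consumes. The function names and the data layout are assumptions for illustration only; they are not part of Label Studio or of any existing backend.

```python
from itertools import combinations


def pairs_from_ranking(ranked_outputs):
    """Turn a best-first ranking of model outputs into (preferred, rejected) pairs.

    Hypothetical helper: assumes the annotator ranked all outputs for one prompt.
    """
    # combinations() keeps input order, so the first item of each pair ranks higher.
    return list(combinations(ranked_outputs, 2))


def pairs_from_accept_reject(labeled_outputs):
    """Turn yes/no feedback into (preferred, rejected) pairs.

    Hypothetical helper: labeled_outputs is a list of (output, accepted) tuples
    that all belong to the same prompt.
    """
    accepted = [out for out, ok in labeled_outputs if ok]
    rejected = [out for out, ok in labeled_outputs if not ok]
    # Every accepted output is treated as preferred over every rejected one.
    return [(a, r) for a in accepted for r in rejected]


ranking_pairs = pairs_from_ranking(["A", "B", "C"])
binary_pairs = pairs_from_accept_reject([("x", True), ("y", False), ("z", False)])
```

Under these assumptions, both annotation styles end up in one common format, so the training side does not need to know which kind of feedback the annotator gave.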

hemangjoshi37a commented 1 year ago

Basically, what I propose is to have a generalized RLHF model that sits at the output side of any model, so that instead of supervised training we can have unsupervised training that is supervised by the reinforcement model.
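One possible reading of this proposal, sketched here under assumptions, is a reward model that is fitted on human preference pairs and then scores the outputs of any upstream model. The sketch below uses a linear reward over hypothetical feature vectors and the Bradley-Terry pairwise loss; the feature representation, the function names, and the hyperparameters are all illustrative and do not come from this thread.

```python
import math


def reward(weights, features):
    """Linear reward: dot product of weights and an output's feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit reward weights on (preferred_features, rejected_features) pairs.

    Assumption: each model output is already encoded as a list of `dim` floats.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for good, bad in pairs:
            margin = reward(weights, good) - reward(weights, bad)
            # Bradley-Terry: probability that the preferred output wins.
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient step on -log(p): push the preferred output's reward up.
            for i in range(dim):
                weights[i] += lr * (1.0 - p) * (good[i] - bad[i])
    return weights


# Toy preference data: annotators preferred outputs with a high first feature.
toy_pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.9, 0.1], [0.2, 0.8])]
toy_weights = train_reward_model(toy_pairs, dim=2)
```

In a full RLHF setup, a reward model like this would supply the reward signal for a reinforcement learning step on the upstream model; that step is omitted here.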

makseq commented 1 year ago

Maybe this repo will be helpful for you: https://github.com/heartexlabs/label-studio-RLHF/

hemangjoshi37a commented 1 year ago

@makseq Maybe it is a private repo. It is giving me a 404 error.

makseq commented 1 year ago

@hemangjoshi37a Sorry, could you please check this one? https://github.com/heartexlabs/RLHF