google-research-datasets / richhf-18k

The RichHF-18K dataset contains the rich human feedback labels we collected for our CVPR'24 paper (https://arxiv.org/pdf/2312.10240), along with the file names of the associated labeled images (no URLs or images are included in this dataset).
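
To make the filename-keyed format concrete, here is a minimal loading sketch. It assumes the labels ship as TFRecord files; the file name and feature keys below are hypothetical placeholders, so check the repository README for the actual schema.

```python
import tensorflow as tf

# Hypothetical feature spec; the real field names are defined in the
# repository README, not here.
feature_spec = {
    'filename': tf.io.FixedLenFeature([], tf.string),
    'overall_score': tf.io.FixedLenFeature([], tf.float32),
}

def parse_example(raw):
    return tf.io.parse_single_example(raw, feature_spec)

dataset = tf.data.TFRecordDataset('train.tfrecord').map(parse_example)
for example in dataset.take(1):
    # Join on `filename` against a local copy of the source images,
    # since the dataset itself contains no image bytes or URLs.
    print(example['filename'].numpy().decode())
```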

About the rich feedback model release #3

Open srymaker opened 1 month ago

srymaker commented 1 month ago

Thanks for your great work! Will the rich feedback model be released? I'd love to try out the model and apply it to my own tasks!

leebird commented 4 weeks ago

Hello, we currently don't have plans to release the model.

densechen commented 3 weeks ago

@leebird Looking forward to the rich feedback model...

udrs commented 3 weeks ago

Looking forward

leebird commented 3 weeks ago

Thanks for all the interest in our work! Due to company policies (related to productization, etc.), we cannot open-source the model. We have included details on how to reproduce the results in our paper. If you have further questions, please email the corresponding authors, and we'd be happy to help you reproduce the results.

srymaker commented 3 weeks ago

Hello, could you tell me how you trained the reward model, e.g., which layers were frozen and which tuning method was used?

leebird commented 3 weeks ago

Hi @srymaker, we finetuned all the layers in the model, including the ViT component. We tried freezing the ViT component, but it didn't work well, especially for the heatmap tasks. Experiment details, including hyperparameters and the optimizer, can be found in Section 9 of the paper.
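
Concretely, the difference between the two settings can be sketched like this (toy Keras placeholder layers, not our actual ViT/T5 implementation):

```python
import tensorflow as tf

# Toy stand-in for the two-component model; placeholder Dense layers,
# not the actual ViT/T5 architecture from the paper.
class RewardModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.vit = tf.keras.layers.Dense(64, name='vit_stub')
        self.t5 = tf.keras.layers.Dense(1, name='t5_stub')

    def call(self, x):
        return self.t5(self.vit(x))

model = RewardModel()
# The setting we used: finetune everything, ViT included.
model.vit.trainable = True
# The ablation that underperformed, especially on the heatmap tasks:
# model.vit.trainable = False  # freeze the image encoder
```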

srymaker commented 2 weeks ago

Thank you for your answer. Does "all layers" refer to the encoder and decoder in T5?

leebird commented 1 week ago

@srymaker Yes, all the layers come from the ViT and the T5 encoder/decoder. Note that there is a pretraining stage for the ViT and T5 layers on multimodal data, as they were originally pretrained only on unimodal data.
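
Schematically, the two stages look like this (toy model and random placeholder data; the optimizer and loss here are illustrative only, see Section 9 of the paper for the actual settings):

```python
import numpy as np
import tensorflow as tf

# Placeholder model and data; the real pipeline uses a ViT + T5 model
# on image/text inputs.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer='adam', loss='mse')

# Stage 1: multimodal pretraining, adapting the unimodally pretrained
# ViT and T5 components to joint image/text data.
x_mm, y_mm = np.random.rand(32, 8), np.random.rand(32, 1)
model.fit(x_mm, y_mm, epochs=1, verbose=0)

# Stage 2: finetune all layers on the RichHF-18K feedback labels.
x_rh, y_rh = np.random.rand(32, 8), np.random.rand(32, 1)
model.fit(x_rh, y_rh, epochs=1, verbose=0)
```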