Open Sohojoe opened 1 year ago
Hi there,
thank you for the dataset.
I've implemented a CLIP benchmark of the dataset -> CLIP_visual-spatial-reasoning
I found I was able to go from 50% to ~55% accuracy in a true zero-shot setting (i.e. no retraining at all) through prompt engineering alone. I'm implementing retraining now and will keep updating this thread with the results.
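For anyone curious what the prompt-engineering step looks like: the idea is to wrap each caption (and a negated version of it) in a template CLIP scores more reliably than the bare sentence, then predict true/false by comparing similarities. The sketch below stubs out the model call — `image_text_score` is a hypothetical helper standing in for cosine similarity between CLIP image and text embeddings, and the templates are illustrative, not the exact ones used in my benchmark.

```python
def make_prompts(caption: str) -> tuple[str, str]:
    # Prompt engineering: wrap the raw caption and a negated version in
    # templates, rather than scoring the bare sentence against the image.
    body = caption.rstrip(".").lower()
    positive = f"a photo in which {body}"
    negative = f"a photo in which it is false that {body}"
    return positive, negative

def zero_shot_label(caption, image, image_text_score) -> bool:
    # Predict True when the positive phrasing matches the image better than
    # the negated phrasing -- no retraining involved, only prompt design.
    pos, neg = make_prompts(caption)
    return image_text_score(image, pos) >= image_text_score(image, neg)
```

In practice `image_text_score` would encode the image once and each prompt with a CLIP checkpoint, so the whole evaluation stays zero-shot.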
Thanks a lot for this! I was also thinking about CLIP baselines — so happy to see that it’s already being done so nicely :)
Please do keep us posted.