google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Apache License 2.0
2.25k stars, 147 forks

PaliGemma finetuned model #113

Open TheMattBin opened 3 months ago

TheMattBin commented 3 months ago

Thanks for the great work! I've checked out some of the finetuned models released on HF, like DocVQA, and just wanted to know whether you have any plans to provide finetuning examples for different downstream tasks.

Laz4rz commented 3 months ago

You can check Skalski's work with PaliGemma here: https://github.com/roboflow/notebooks/blob/main/notebooks/how-to-finetune-paligemma-on-detection-dataset.ipynb

Google also published a JAX tutorial on fine-tuning PaliGemma for captioning: https://ai.google.dev/gemma/docs/paligemma/fine-tuning-paligemma

I've played around with the captioning tutorial a bit and would be happy to help if I can.