google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Apache License 2.0

Adds CapPa model (https://arxiv.org/abs/2306.07915). #82

Closed: andsteing closed this 9 months ago

andsteing commented 10 months ago

The code was tested on GCP, following the instructions in the main README, with the following command:

gcloud compute tpus tpu-vm ssh $NAME --zone=$ZONE --worker=all --command "
cd big_vision
TFDS_DATA_DIR=gs://$GS_BUCKET_NAME/datasets \
bash big_vision/run_tpu.sh big_vision.trainers.proj.cappa.generative \
--config big_vision/configs/proj/cappa/pretrain.py:batch_size=1024 \
--workdir gs://$GS_BUCKET_NAME/big_vision/workdir/`date '+%m-%d_%H%M'`
"
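The command above assumes `$NAME`, `$ZONE` and `$GS_BUCKET_NAME` are already set in the local shell. A minimal sketch of that setup, with purely hypothetical placeholder values (none of these names come from the PR), could look like this:

```shell
# Hypothetical placeholder values; substitute your own TPU VM name, zone and bucket.
NAME=my-tpu-vm
ZONE=europe-west4-a
GS_BUCKET_NAME=my-bucket

# The --command string above is double-quoted, so the local shell expands
# $GS_BUCKET_NAME and the `date` substitution before the string is sent to the workers.
# This line reproduces the resulting --workdir value for inspection.
WORKDIR="gs://$GS_BUCKET_NAME/big_vision/workdir/$(date '+%m-%d_%H%M')"
echo "$WORKDIR"
```

Because the expansion happens locally, every worker should receive the same timestamped workdir path.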

andsteing commented 9 months ago

Confirmed that I will follow up with an update of the global README. Will mention this next time in the PR description.

andsteing commented 9 months ago

Updating README in #83

sayakpaul commented 4 months ago

Will there be a checkpoint available? 😢

lucasb-eyer commented 4 months ago

We've looked into this quite a bit, but unfortunately the answer is no.

However, I think PaliGemma should inherit most of Cap(Pa)'s advantages (although we didn't test it on SugarCrepe; I guess we should).