facebookresearch / meru

Code for the paper "Hyperbolic Image-Text Representations", Desai et al., ICML 2023

Models for different embedding dimensions #3

Open ethanlshen opened 1 year ago

ethanlshen commented 1 year ago

Hi, I was wondering if you could upload trained models for different embedding sizes? Thanks!

kdexd commented 1 year ago

Hi Ethan 👋

Unfortunately, we do not have these model weights anymore; I apologize for the inconvenience! However, training models with different embedding sizes (as described in the paper) is much faster and more computationally efficient than training models from scratch.

Quoting Section 4.4:

We initialize the encoders from ViT-L/16 models to reduce compute requirements, keep them frozen, and re-initialize projection layers and learnable scalars. We train for 30K iterations ...

We have released ViT-L/16 weights (links in the README); the encoder weights remain unchanged in the models you want. Reproducing these experiments will require you to re-initialize the image/text projection layers (two 1024 x 512 weight matrices for ViT-L/16) and four learnable scalars: the softmax temperature, the curvature, and the visual/textual alpha scalars applied after the projection layers.

Based on my observation, these models train very quickly; you will get reasonable performance well before 30K iterations since the number of trainable parameters is small. Moreover, you can afford a large batch size with fewer GPUs because the trainable portion of the model is small.

Let me know if you have further questions!

kdexd commented 1 year ago

(two 1024 x 512 weight matrices for ViT-L/16)

Correction: these will be 1024 x W matrices, where W is your desired output embedding dimension!
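
For reference, here is a minimal sketch of that re-initialization. The attribute names (visual_proj, textual_proj, logit_scale, curv, visual_alpha, textual_alpha) match the learnable parameters listed later in this thread; the bias-free Linear layers and the initial values are assumptions based on the released code's defaults, so double-check them against your config.

import torch
from torch import nn

def reinit_projections_and_scalars(model, embed_dim, visual_width=1024, textual_width=1024):
    # Fresh width x embed_dim projection matrices (1024 x W for ViT-L/16).
    model.visual_proj = nn.Linear(visual_width, embed_dim, bias=False)
    model.textual_proj = nn.Linear(textual_width, embed_dim, bias=False)

    # Re-initialize the four learnable scalars. The values below (and storing
    # them in log-space) are assumptions, not the paper's exact settings.
    model.logit_scale = nn.Parameter(torch.tensor(1 / 0.07).log())
    model.curv = nn.Parameter(torch.tensor(1.0).log())
    model.visual_alpha = nn.Parameter(torch.tensor(embed_dim**-0.5).log())
    model.textual_alpha = nn.Parameter(torch.tensor(embed_dim**-0.5).log())
    return model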

ez2rok commented 1 year ago

I am experimenting with models of different dimensions and used the following code. It assumes you have already downloaded one of the pretrained checkpoints (checkpoint_path below) and have a path to a training config file (e.g. train_config = 'configs/train_meru_vit_s.py').

First, let's load the model

import torch
from meru.config import LazyConfig, LazyFactory
from meru.utils.checkpointing import CheckpointManager

# Use the current CUDA device if available, otherwise fall back to CPU.
device = (
    torch.cuda.current_device()
    if torch.cuda.is_available()
    else torch.device("cpu")
)

# Create the model using the training config and load pre-trained weights.
# checkpoint_path is the path to the downloaded checkpoint file.
_C_TRAIN = LazyConfig.load(train_config)
model = LazyFactory.build_model(_C_TRAIN, device).eval()
CheckpointManager(model=model).load(checkpoint_path)
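
To quickly sanity check the load, you can push a dummy batch through the image encoder. This assumes the model exposes encode_image(images, project=...) as in the released MERU code, and a standard 3 x 224 x 224 input resolution.

# Optional sanity check: encode a random batch of two images.
with torch.inference_mode():
    dummy = torch.randn(2, 3, 224, 224, device=device)
    feats = model.encode_image(dummy, project=True)
    print(feats.shape)  # (2, embed_dim)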

Now freeze all parameters except those listed in learnable_params.

# Names of the projection layers and the four learnable scalars.
learnable_params = [
    "logit_scale",
    "curv",
    "visual_alpha",
    "textual_alpha",
    "visual_proj.weight",
    "textual_proj.weight",
]
for name, p in model.named_parameters():
    if name not in learnable_params:
        p.requires_grad = False

After this, start your training! Hope this helps!
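
In case it is useful, here is a minimal training-setup sketch that optimizes only the parameters left trainable above. The optimizer choice and hyperparameters are placeholders, not the paper's recipe.

# Sanity check: should print only the projection layers and the four scalars.
print([name for name, p in model.named_parameters() if p.requires_grad])

# Build an optimizer over trainable parameters only (placeholder hyperparameters).
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=5e-4, weight_decay=0.2)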