Open pretbc opened 7 months ago
Hi @pretbc,
Thank you for using ViPER! I'll try to answer point by point:
Let me know if something remains unclear.
I was just wondering about the scheduler, because isn't it easier to explicitly step the scheduler every 10 epochs?
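For context, here is a sketch (with made-up hyperparameters) of what an explicit every-10-epochs policy amounts to; it mirrors what `torch.optim.lr_scheduler.StepLR(step_size=10)` computes when stepped once per epoch:

```python
# Illustrative StepLR-style schedule -- base_lr and gamma are made-up values.
def step_lr(base_lr: float, epoch: int, step_size: int = 10, gamma: float = 0.1) -> float:
    """LR after `epoch` epochs, mirroring torch.optim.lr_scheduler.StepLR."""
    return base_lr * gamma ** (epoch // step_size)

# LR stays constant within each 10-epoch window, then drops by gamma.
lrs = [step_lr(1e-3, e) for e in (0, 9, 10, 20)]
```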
Hmm, strange. I did a similar task for video and FAUs, and I provide embeddings to the model as input (token size 768 + 32), where my input tensor shape is [batch_size, 32, 800].
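Assuming the 768 + 32 split refers to per-token features concatenated along the last dimension (my reading, not necessarily the exact code), the input tensor would be built like this:

```python
import torch

batch_size, num_tokens = 2, 32
video_emb = torch.randn(batch_size, num_tokens, 768)  # e.g. visual features per token
fau_emb = torch.randn(batch_size, num_tokens, 32)     # e.g. FAU features per token

# Concatenate along the feature dimension: 768 + 32 = 800.
inputs = torch.cat([video_emb, fau_emb], dim=-1)      # shape [2, 32, 800]
```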
Snippet of the Perceiver setup:
from typing import Optional

from transformers import PerceiverConfig, PerceiverModel
from transformers.models.perceiver.modeling_perceiver import PerceiverClassificationDecoder


class PerceiverModelDescriptor:
    """Set configuration for PerceiverModel."""

    def __init__(self, token_size: Optional[int] = None, num_labels: Optional[int] = None) -> None:
        """Init vars."""
        self._token_size = token_size
        self._num_labels = num_labels
        if token_size is not None and num_labels is not None:
            self._model = self._init_model()
        else:
            self._model = None

    def _init_model(self) -> PerceiverModel:
        """Set PerceiverModel params."""
        config = PerceiverConfig(d_model=self._token_size, num_labels=self._num_labels)
        decoder = PerceiverClassificationDecoder(
            config,
            num_channels=config.d_latents,
            trainable_position_encoding_kwargs=dict(num_channels=config.d_latents, index_dims=1),
            use_query_residual=True,
        )
        return PerceiverModel(config, decoder=decoder)

    @property
    def model(self) -> PerceiverModel:
        """Return model."""
        return self._model
Maybe you set gradients somewhere else before fitting the model? Because I do not assume the Hugging Face model has an issue.
I observe strange behavior in PerceiverModel.forward():
sequence_output = encoder_outputs[0]  # <-- tensor with no gradient
logits = None
if self.decoder:
    if subsampled_output_points is not None:
        output_modality_sizes = {
            "audio": subsampled_output_points["audio"].shape[0],
            "image": subsampled_output_points["image"].shape[0],
            "label": 1,
        }
    else:
        output_modality_sizes = modality_sizes
    decoder_query = self.decoder.decoder_query(
        inputs, modality_sizes, inputs_without_pos, subsampled_points=subsampled_output_points
    )
    decoder_outputs = self.decoder(
        decoder_query,
        z=sequence_output,
        query_mask=extended_attention_mask,
        output_attentions=output_attentions,
    )
    logits = decoder_outputs.logits  # <-- logits tensor returned without gradient/grad_fn
.step() is called inside the training loop at each batch.
Maybe it is something that changed in newer releases of the library? I got your point, but I think that if you manually set the gradient computation (requires_grad) for the trainable parameters you may solve the issue, am I right?
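A toy reproduction of the symptom, using a hypothetical stand-in layer rather than the actual ViPER code: an output only loses its `grad_fn` when the forward pass runs under `torch.no_grad()`, or when no tensor in the graph requires gradients (e.g. all parameters were frozen):

```python
import torch

# Hypothetical stand-in for the model head -- NOT the actual ViPER code.
layer = torch.nn.Linear(800, 7)
x = torch.randn(2, 800)

y = layer(x)                  # normal forward: output carries a grad_fn
with torch.no_grad():
    y_nograd = layer(x)       # forward under no_grad: grad_fn is None

for p in layer.parameters():  # freezing every parameter also detaches
    p.requires_grad_(False)   # the output from the autograd graph
y_frozen = layer(x)
```

So checking `logits.grad_fn` right after the forward call tells you which of the two situations you are in.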
Thank You
Hello,
I hope you are well in the new year, @MorenoLaQuatra and @VaianiLorenzo.
I will not create a new topic since this is still a question regarding your Perceiver implementation on the CMU-MOSEI dataset.
I'm wondering now how you used the text modality. In the code:
import clip
import torch

clip_model, preprocess = clip.load("ViT-B/32")
clip_model.cuda()
checkpoint = torch.load(args.clip_checkpoint_path)
clip_model.load_state_dict(checkpoint["model_state_dict"])
clip_model.eval()
Does it mean that you performed some fine-tuning on the CLIP model (using template prompts)?
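For illustration only (these templates are my guess, not the ones used in the repo), prompt-based CLIP fine-tuning typically builds one text prompt per class label, like:

```python
# Hypothetical emotion prompt templates -- NOT taken from the ViPER repo.
EMOTIONS = ["happy", "sad", "angry", "surprised", "disgusted", "fearful"]
TEMPLATE = "a photo of a {} person"
prompts = [TEMPLATE.format(e) for e in EMOTIONS]
# These strings would then go through clip.tokenize(prompts) and
# clip_model.encode_text(...) to produce the text-side embeddings.
```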
Second question: I want to use your already trained model from Hugging Face, but the weights are not available anymore. Could you re-upload them?
Hello,
I'm trying to reproduce your model on the CMU-MOSEI dataset.
I have some questions regarding training:
Why is scheduler.step() implemented in the training loop and not after each epoch, for the same Perceiver model init()?
Can you explain the MuSe dataset emotion labeling? Did they provide each emotion in ranges
0.0-0.1, 0.1-0.2, ...?
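If the annotations are continuous intensities in [0, 1], one way to read the "0.0-0.1, 0.1-0.2, ..." ranges is as equal-width bins. This is purely an illustrative sketch, not the official MuSe protocol:

```python
# Illustrative only -- not the official MuSe annotation protocol.
def bin_intensity(value: float, width: float = 0.1) -> int:
    """Map a continuous emotion intensity in [0.0, 1.0] to a 0.1-wide bin index."""
    n_bins = round(1.0 / width)
    # Clamp so that exactly 1.0 falls into the last bin instead of overflowing.
    return min(int(value / width), n_bins - 1)

# e.g. 0.05 falls in the 0.0-0.1 bin, 0.15 in the 0.1-0.2 bin.
```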