VaianiLorenzo / ViPER

This repository contains the code to set up the experiments for the MuSe 2022 reaction sub-challenge.

Training ViPER against CMU-MOSEI #1

Open · pretbc opened this issue 7 months ago

pretbc commented 7 months ago

Hello,

I'm trying to reproduce your model on the CMU-MOSEI dataset.

I have some questions regarding training:

  1. Is it correct that scheduler.step() is called inside the training loop and not after each epoch?

  2. Did you have any issues with logits during training? The logits output of my model did not have a gradient/grad_fn set, so I had to add

        model.train()
        torch.set_grad_enabled(True)

     for the same Perceiver model init().

  3. Do you have any historical loss/Pearson plots for comparison purposes?

Can you explain the MuSe dataset emotion labeling: did they provide each emotion in ranges 0.0-0.1, 0.1-0.2, ...?


MorenoLaQuatra commented 7 months ago

Hi @pretbc,

Thank you for using ViPER! I'll try to answer point-by-point:

  1. The scheduler is correctly placed inside the training loop since we used a step-based LR update scheduler (see the sketch below).
  2. We did not notice any issues during training. requires_grad should be enabled by default in training mode for trainable params.
  3. I don't think we have any loss curves, but you may refer to @VaianiLorenzo on this matter.
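
For point 1, here is a minimal illustration of step-based scheduling (a generic sketch with a toy model, not our exact training code):

    import torch
    from torch import nn

    model = nn.Linear(800, 7)  # toy stand-in for the actual model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    # A step-based schedule: the learning rate is updated once per batch.
    scheduler = torch.optim.lr_scheduler.LinearLR(
        optimizer, start_factor=1.0, end_factor=0.1, total_iters=1000
    )

    for epoch in range(3):
        for _ in range(10):  # stand-in for iterating over a DataLoader
            x, y = torch.randn(4, 800), torch.randn(4, 7)
            loss = nn.functional.mse_loss(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()  # called every step, not every epoch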

Let me know if something remains unclear.

pretbc commented 7 months ago
  1. I was just wondering about the scheduler because isn't it easier to explicitly step the scheduler every 10 epochs?

  2. Hmm, strange. I did a similar task for video and FAUs, providing embeddings to the model as input (token size 768 + 32), where my input tensor shape is [batch_size, 32, 800].

Snippet of my Perceiver call:

from typing import Optional

from transformers import PerceiverConfig, PerceiverModel
from transformers.models.perceiver.modeling_perceiver import PerceiverClassificationDecoder


class PerceiverModelDescriptor:
    """Set configuration for PerceiverModel."""

    def __init__(self, token_size: Optional[int] = None, num_labels: Optional[int] = None) -> None:
        """Init vars."""
        self._token_size = token_size
        self._num_labels = num_labels
        if token_size is not None and num_labels is not None:
            self._model = self._init_model()
        else:
            self._model = None

    def _init_model(self) -> PerceiverModel:
        """Set PerceiverModel params"""
        config = PerceiverConfig(d_model=self._token_size, num_labels=self._num_labels)
        decoder = PerceiverClassificationDecoder(
            config,
            num_channels=config.d_latents,
            trainable_position_encoding_kwargs=dict(num_channels=config.d_latents, index_dims=1),
            use_query_residual=True,
        )
        return PerceiverModel(config, decoder=decoder)

    @property
    def model(self) -> PerceiverModel:
        """Return model"""
        return self._model
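
For what it's worth, this is how I call it (a minimal sketch; batch size and num_labels are illustrative, the input shape follows the [batch_size, 32, 800] above):

    import torch

    descriptor = PerceiverModelDescriptor(token_size=800, num_labels=7)
    model = descriptor.model
    model.train()

    inputs = torch.randn(4, 32, 800)  # [batch_size, seq_len, token_size]
    outputs = model(inputs=inputs)
    # Here I would expect a grad_fn on the logits, but it is missing:
    print(outputs.logits.requires_grad, outputs.logits.grad_fn)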

Maybe you set gradients somewhere else before fitting the model? I don't assume the Hugging Face model itself has the issue.

I observe strange behavior in PerceiverModel.forward():

        sequence_output = encoder_outputs[0]  # <--- tensor with no gradient

        logits = None
        if self.decoder:
            if subsampled_output_points is not None:
                output_modality_sizes = {
                    "audio": subsampled_output_points["audio"].shape[0],
                    "image": subsampled_output_points["image"].shape[0],
                    "label": 1,
                }
            else:
                output_modality_sizes = modality_sizes
            decoder_query = self.decoder.decoder_query(
                inputs, modality_sizes, inputs_without_pos, subsampled_points=subsampled_output_points
            )
            decoder_outputs = self.decoder(
                decoder_query,
                z=sequence_output,
                query_mask=extended_attention_mask,
                output_attentions=output_attentions,
            )
            logits = decoder_outputs.logits  # <--- logits tensor returned without gradient/grad_fn
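
One way I narrow this down (a debugging sketch; `model` and the shapes are from my snippet above, and I assume the encoder is exposed as model.encoder as in the current transformers source):

    import torch

    # If grad mode is off anywhere up the call stack (torch.no_grad() or
    # torch.inference_mode()), every tensor produced inside forward() lacks a grad_fn.
    print("grad enabled:", torch.is_grad_enabled())

    # A forward hook on the encoder shows whether its output already lost its grad_fn.
    def report_grad_fn(module, args, output):
        print(type(module).__name__, "output grad_fn:", output[0].grad_fn)

    handle = model.encoder.register_forward_hook(report_grad_fn)
    _ = model(inputs=torch.randn(4, 32, 800))
    handle.remove()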

MorenoLaQuatra commented 7 months ago

  1. Sorry, I don't get the question about the scheduler. The scheduler is based on steps, not epochs, so .step() is called inside the training loop at each batch.
  2. From what I remember, we did not need to explicitly set requires_grad. Maybe it is something that changed in newer releases of the library? I get your point, but I think that if you manually set the gradient computation for the trainable parameters you may solve the issue, am I right? (See the sketch below.)
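
Something along these lines (a minimal sketch with a toy stand-in model; adapt it to your PerceiverModel):

    import torch
    from torch import nn

    model = nn.Linear(800, 7)  # toy stand-in for your PerceiverModel
    model.train()
    for param in model.parameters():
        param.requires_grad_(True)  # explicitly enable gradients on trainable params

    # Also make sure no surrounding torch.no_grad()/inference_mode context is active.
    assert torch.is_grad_enabled()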

pretbc commented 7 months ago

> if you manually set the gradient computation for the trainable parameters you may solve the issue, am I right?

Thank you

pretbc commented 6 months ago

Hello,

I hope you are doing well in the new year, @MorenoLaQuatra and @VaianiLorenzo.

I will not create a new topic since this is still a question regarding the implementation of your Perceiver on the CMU-MOSEI dataset.

I'm now wondering how you used the text modality.

In the code:

        import clip
        import torch

        clip_model, preprocess = clip.load("ViT-B/32")
        clip_model.cuda()

        # args.clip_checkpoint_path points to the fine-tuned CLIP checkpoint.
        checkpoint = torch.load(args.clip_checkpoint_path)
        clip_model.load_state_dict(checkpoint["model_state_dict"])
        clip_model.eval()

Does this mean that you performed some fine-tuning of the CLIP model (using template prompts)?
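
For context, this is the kind of template-prompt usage I have in mind (a hypothetical sketch with made-up labels, not necessarily what ViPER does):

    import clip
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Template prompts turn class labels into natural-language queries.
    prompts = [f"a photo of a {label} face" for label in ["happy", "sad", "angry"]]
    tokens = clip.tokenize(prompts).to(device)
    with torch.no_grad():
        text_features = model.encode_text(tokens)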


Second question: I want to use your already-trained model from Hugging Face, but the weights are no longer available. Could you re-upload the weights?