Open pretbc opened 7 months ago
Hi @pretbc,
Thank you for using ViPER! I'll try to answer point by point:
Let me know if something remains unclear.
I was just wondering about the scheduler, because isn't it easier to explicitly step the scheduler every 10 epochs?
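For context, here is a sketch (with made-up hyperparameters) of what an explicit every-10-epochs policy amounts to; it mirrors what `torch.optim.lr_scheduler.StepLR(step_size=10)` computes when stepped once per epoch:

```python
# Illustrative StepLR-style schedule -- base_lr and gamma are made-up values.
def step_lr(base_lr: float, epoch: int, step_size: int = 10, gamma: float = 0.1) -> float:
    """LR after `epoch` epochs, mirroring torch.optim.lr_scheduler.StepLR."""
    return base_lr * gamma ** (epoch // step_size)

# LR stays constant within each 10-epoch window, then drops by gamma.
lrs = [step_lr(1e-3, e) for e in (0, 9, 10, 20)]
```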
Hmm, strange. I did a similar task for video and FAUs, and I provide embeddings to the model as input (token size 768 + 32), where my input tensor shape is [batch_size, 32, 800].
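Assuming the 768 + 32 split refers to per-token features concatenated along the last dimension (my reading, not necessarily the exact code), the input tensor would be built like this:

```python
import torch

batch_size, num_tokens = 2, 32
video_emb = torch.randn(batch_size, num_tokens, 768)  # e.g. visual features per token
fau_emb = torch.randn(batch_size, num_tokens, 32)     # e.g. FAU features per token

# Concatenate along the feature dimension: 768 + 32 = 800.
inputs = torch.cat([video_emb, fau_emb], dim=-1)      # shape [2, 32, 800]
```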
Snippet of the Perceiver setup:
from typing import Optional

from transformers import PerceiverConfig, PerceiverModel
from transformers.models.perceiver.modeling_perceiver import PerceiverClassificationDecoder


class PerceiverModelDescriptor:
    """Set configuration for PerceiverModel."""

    def __init__(self, token_size: Optional[int] = None, num_labels: Optional[int] = None) -> None:
        """Init vars."""
        self._token_size = token_size
        self._num_labels = num_labels
        if token_size is not None and num_labels is not None:
            self._model = self._init_model()
        else:
            self._model = None

    def _init_model(self) -> PerceiverModel:
        """Set PerceiverModel params."""
        config = PerceiverConfig(d_model=self._token_size, num_labels=self._num_labels)
        decoder = PerceiverClassificationDecoder(
            config,
            num_channels=config.d_latents,
            trainable_position_encoding_kwargs=dict(num_channels=config.d_latents, index_dims=1),
            use_query_residual=True,
        )
        return PerceiverModel(config, decoder=decoder)

    @property
    def model(self) -> PerceiverModel:
        """Return model."""
        return self._model
Maybe you set gradients somewhere else before fitting the model? Because I do not assume the Hugging Face model has an issue.
I observe strange behavior in PerceiverModel.forward():
sequence_output = encoder_outputs[0]  # <-- tensor with no gradient
logits = None
if self.decoder:
    if subsampled_output_points is not None:
        output_modality_sizes = {
            "audio": subsampled_output_points["audio"].shape[0],
            "image": subsampled_output_points["image"].shape[0],
            "label": 1,
        }
    else:
        output_modality_sizes = modality_sizes
    decoder_query = self.decoder.decoder_query(
        inputs, modality_sizes, inputs_without_pos, subsampled_points=subsampled_output_points
    )
    decoder_outputs = self.decoder(
        decoder_query,
        z=sequence_output,
        query_mask=extended_attention_mask,
        output_attentions=output_attentions,
    )
    logits = decoder_outputs.logits  # <-- logits tensor returned without gradient/grad_fn
.step() is called inside the training loop at each batch.
Maybe it is something that changed in newer releases of the library? I got your point, but I think that if you manually set the gradient computation (requires_grad) for the trainable parameters you may solve the issue, am I right?
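A toy reproduction of the symptom, using a hypothetical stand-in layer rather than the actual ViPER code: an output only loses its `grad_fn` when the forward pass runs under `torch.no_grad()`, or when no tensor in the graph requires gradients (e.g. all parameters were frozen):

```python
import torch

# Hypothetical stand-in for the model head -- NOT the actual ViPER code.
layer = torch.nn.Linear(800, 7)
x = torch.randn(2, 800)

y = layer(x)                  # normal forward: output carries a grad_fn
with torch.no_grad():
    y_nograd = layer(x)       # forward under no_grad: grad_fn is None

for p in layer.parameters():  # freezing every parameter also detaches
    p.requires_grad_(False)   # the output from the autograd graph
y_frozen = layer(x)
```

So checking `logits.grad_fn` right after the forward call tells you which of the two situations you are in.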
Thank You
Hello,
I hope you are well in the new year, @MorenoLaQuatra and @VaianiLorenzo.
I will not create a new topic since this is still a question regarding your Perceiver implementation on the CMU-MOSEI dataset.
I'm wondering now how you used the text modality. In the code:
import clip
import torch

clip_model, preprocess = clip.load("ViT-B/32")
clip_model.cuda()
checkpoint = torch.load(args.clip_checkpoint_path)
clip_model.load_state_dict(checkpoint["model_state_dict"])
clip_model.eval()
Does it mean that you performed some fine-tuning on the CLIP model (using template prompts)?
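For illustration only (these templates are my guess, not the ones used in the repo), prompt-based CLIP fine-tuning typically builds one text prompt per class label, like:

```python
# Hypothetical emotion prompt templates -- NOT taken from the ViPER repo.
EMOTIONS = ["happy", "sad", "angry", "surprised", "disgusted", "fearful"]
TEMPLATE = "a photo of a {} person"
prompts = [TEMPLATE.format(e) for e in EMOTIONS]
# These strings would then go through clip.tokenize(prompts) and
# clip_model.encode_text(...) to produce the text-side embeddings.
```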
Second question: I want to use your already trained model from Hugging Face, but the weights are not available anymore. Could you re-upload them?
Hello,
I'm trying to reproduce your model on the CMU-MOSEI dataset.
I have some questions regarding training:
Why is scheduler.step() implemented in the training loop and not after each epoch, for the same Perceiver model init()?
Can you explain the MuSe dataset emotion labeling? Did they provide each emotion in ranges
0.0-0.1, 0.1-0.2, ...?
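If the annotations are continuous intensities in [0, 1], one way to read the "0.0-0.1, 0.1-0.2, ..." ranges is as equal-width bins. This is purely an illustrative sketch, not the official MuSe protocol:

```python
# Illustrative only -- not the official MuSe annotation protocol.
def bin_intensity(value: float, width: float = 0.1) -> int:
    """Map a continuous emotion intensity in [0.0, 1.0] to a 0.1-wide bin index."""
    n_bins = round(1.0 / width)
    # Clamp so that exactly 1.0 falls into the last bin instead of overflowing.
    return min(int(value / width), n_bins - 1)

# e.g. 0.05 falls in the 0.0-0.1 bin, 0.15 in the 0.1-0.2 bin.
```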