facebookresearch / jepa

PyTorch code and models for V-JEPA self-supervised learning from video.

Potential bug #32

Closed dnnspark closed 7 months ago

dnnspark commented 7 months ago

In this implementation of attention, the mask input is not used: https://github.com/facebookresearch/jepa/blob/main/src/models/utils/modules.py#L61

Can this lead to incorrect training in V-JEPA?

jez-moxmo commented 7 months ago

There are 3 different image encoder models, and the target_encoder doesn't use masks. (screenshot attached)

dnnspark commented 7 months ago

Yup, but in the implementation of the predictor, the mask does seem to be passed: https://github.com/facebookresearch/jepa/blob/main/src/models/predictor.py#L232

Is this implementation different from what's described in the paper?

MidoAssran commented 7 months ago

Hi @dnnspark, this is just for backwards compatibility with an old implementation. No bug here. If you want to submit a PR removing the mask argument from the block, feel free!
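
For readers following along, the pattern under discussion is an attention `forward()` that accepts a `mask` parameter but never reads it, kept only so older call sites don't break. The sketch below is a minimal, hypothetical illustration of that pattern (it is not the actual jepa code; the class name, dimensions, and layer layout are assumptions for demonstration):

```python
import torch
import torch.nn as nn


class Attention(nn.Module):
    """Minimal multi-head self-attention sketch illustrating the pattern
    discussed above: forward() keeps a `mask` parameter for backward
    compatibility with older call sites, but never uses it."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, mask=None):  # `mask` accepted but intentionally unused
        B, N, C = x.shape
        # Project to queries, keys, values: (3, B, heads, N, head_dim)
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)
        # Scaled dot-product attention; note `mask` is never applied here
        attn = (q @ k.transpose(-2, -1)) * (q.shape[-1] ** -0.5)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


x = torch.randn(2, 5, 16)
blk = Attention(dim=16)
# Passing a mask changes nothing, since forward() ignores it:
assert torch.allclose(blk(x), blk(x, mask=torch.ones(5)))
```

The assertion at the end demonstrates why the argument is dead code: the output is identical with or without a mask, which is exactly why removing it (as suggested above) would be a safe cleanup.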