dnnspark closed this issue 7 months ago
There are 3 different image encoder models, and target_encoder doesn't use masks.
Yup, but in the implementation of the predictor, the mask seems to be passed: https://github.com/facebookresearch/jepa/blob/main/src/models/predictor.py#L232
Is this implementation different from what's described in the paper?
Hi @dnnspark, this is just for backwards compatibility with an old implementation. No bug here. If you want to submit a PR removing the mask argument from the block, feel free!
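For readers following along, here is a minimal sketch of what an argument "kept for backwards compatibility" looks like in practice. It is a toy single-head attention in plain Python, not the code from the repository; `toy_attention`, `softmax`, and the sample inputs are hypothetical names chosen for illustration. The point is only that an accepted-but-ignored `mask` cannot change the output.

```python
import math


def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def toy_attention(q, k, v, mask=None):
    """Single-head scaled dot-product attention over lists of vectors.

    `mask` is accepted so that older call sites keep working,
    but it is deliberately ignored (assumption for this sketch).
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        # Similarity of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one dimension at a time.
        out.append(
            [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(len(v[0]))]
        )
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
without_mask = toy_attention(tokens, tokens, tokens)
with_mask = toy_attention(tokens, tokens, tokens, mask=[0, 2])
same = without_mask == with_mask  # the ignored argument has no effect
```

Since the argument is never read, passing it or omitting it gives identical results, so removing it from the signature would be a pure cleanup as long as no caller still passes it.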
In this implementation of attention, the `mask` input is not used: https://github.com/facebookresearch/jepa/blob/main/src/models/utils/modules.py#L61

Can this lead to incorrect training in V-JEPA?