gaasher / I-JEPA

Implementation of I-JEPA from "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture"
MIT License

Problems with downstream tasks #10

Closed · hendredlorentz closed this issue 4 months ago

hendredlorentz commented 5 months ago

Hello, I am very inspired by your work. Since this is a self-supervised model, it is highly relevant to my work on human activity recognition. I adapted the model to a human activity recognition dataset, but when I trained downstream tasks with a small amount of labeled data, the results were very poor. Can you offer any suggestions? I am also wondering whether the encoder can really receive gradients through backpropagation in this architecture. Since the reconstruction happens at the feature level, it feels as if only the decoder is being trained. What is the relationship between the encoder and the decoder?
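
For anyone hitting the same downstream problem: with very little labeled data, a common evaluation is a linear probe, i.e. freezing the pretrained encoder and training only a classification head. Below is a minimal sketch under the assumption that the encoder maps a batch to `(B, N, D)` token embeddings; `build_linear_probe` and the `nn.Identity` stand-in are illustrative, not this repo's actual API.

```python
import torch
import torch.nn as nn

def build_linear_probe(encoder: nn.Module, embed_dim: int, num_classes: int) -> nn.Module:
    """Freeze a pretrained encoder and train only a linear head on top."""
    for p in encoder.parameters():
        p.requires_grad = False
    encoder.eval()

    class Probe(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = encoder
            self.head = nn.Linear(embed_dim, num_classes)

        def forward(self, x):
            with torch.no_grad():
                tokens = self.encoder(x)   # assumed shape: (B, N, D)
            pooled = tokens.mean(dim=1)    # average-pool tokens -> (B, D)
            return self.head(pooled)

    return Probe()

# Toy usage with an identity "encoder" on random (B, N, D) inputs:
probe = build_linear_probe(nn.Identity(), embed_dim=192, num_classes=10)
logits = probe(torch.randn(4, 64, 192))   # -> (4, 10)
```

Training only the head keeps a handful of labels from overfitting the whole network; if even the probe performs poorly, the pretrained features themselves are likely the bottleneck rather than the downstream setup.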

gaasher commented 4 months ago

I would use more data: this method is very data-intensive since it relies on self-supervised learning. As for your second question, the encoder and decoder are trained together, and gradients backpropagate through both.
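
To make the gradient flow concrete, here is a minimal sketch of one I-JEPA-style training step. The names `context_encoder` and `predictor` and the toy linear layers are placeholders, not this repo's actual classes. The point: the feature-level loss backpropagates through both the predictor and the context encoder, while the target encoder that produces the regression targets receives no gradients and is updated by an exponential moving average (EMA).

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins; in this repo these would be ViT-based modules.
context_encoder = nn.Linear(16, 16)
predictor = nn.Linear(16, 16)

# The target encoder is a momentum copy of the context encoder:
# it never receives gradients.
target_encoder = copy.deepcopy(context_encoder)
for p in target_encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3
)

def train_step(x_context, x_target, momentum=0.996):
    # Targets are computed without gradients, so the loss cannot
    # backpropagate into the target encoder.
    with torch.no_grad():
        target_feats = target_encoder(x_target)

    pred_feats = predictor(context_encoder(x_context))
    loss = F.mse_loss(pred_feats, target_feats)  # feature-level loss

    optimizer.zero_grad()
    loss.backward()   # gradients reach BOTH the predictor and the context encoder
    optimizer.step()

    # EMA update of the target encoder (no gradients involved).
    with torch.no_grad():
        for p_t, p_c in zip(target_encoder.parameters(), context_encoder.parameters()):
            p_t.mul_(momentum).add_(p_c, alpha=1 - momentum)

    return loss.item()

print(train_step(torch.randn(8, 16), torch.randn(8, 16)))
```

So "the decoder is trained all the time" is true, but the context encoder is trained right along with it; only the target branch sits outside the gradient path.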