fuen1590 opened 4 months ago
As long as the encoder has the same shape of input and output (B, T, C), you can use any model architecture you want. You can also switch to another encoder architecture using the code here: https://github.com/blacksnail789521/TimeDRL/blob/master/models/_load_encoder.py#L101
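The shape requirement above can be sketched as follows. This is a hypothetical stand-in encoder (not the repository's actual code), shown in NumPy for self-containment: the only constraint is that a (B, T, C) input maps to a (B, T, C) output.

```python
import numpy as np

def toy_encoder(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in encoder: any architecture works for TimeDRL
    as long as the output preserves the input's (B, T, C) shape."""
    B, T, C = x.shape
    rng = np.random.default_rng(0)
    W = rng.standard_normal((C, C))  # per-timestep linear map, weights random for illustration
    return x @ W                     # shape preserved: (B, T, C)

x = np.zeros((8, 16, 64))            # batch of 8, 16 timesteps/patches, 64 channels
y = toy_encoder(x)
assert y.shape == x.shape            # the drop-in requirement
```

Any model satisfying this check (a Transformer, a TCN, or a suitably reshaped ResNet) can in principle be plugged in at the linked loader.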
Thank you for your reply, but I still have some questions. For example, if there is no distinction between Instance Features and Patch Features in the feature maps generated by ResNet, then TimeDRL cannot carry out targeted self-supervision tasks for both. In this case, does applying TimeDRL to ResNet mean treating one feature map as the Instance Feature and the rest as Patch Features?
What do you mean by the instance features and the patch features?
Sorry, by instance features I mean "instance-level embeddings" and by patch features I mean "timestamp-level embeddings", as in your paper.
> if there is no distinction between Instance Features and Patch Features in the feature maps generated by ResNet, then TimeDRL cannot carry out targeted self-supervision tasks for both.
Regardless of the encoder architecture, the [CLS] token's corresponding embedding is always the instance-level embedding, while the rest are always the timestamp-level embeddings (or patch-level embeddings, since we are currently using patches). Since the [CLS] token is at the beginning, if we have T_p patches, then including the [CLS] token we have 1 + T_p tokens as input. Consequently, the output also contains 1 + T_p embeddings: the first one is the instance-level embedding, and the rest are the timestamp-level embeddings. As you can see, all these concepts are independent of the encoder's architecture.
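The prepend-then-split step described above can be sketched like this. The encoder is replaced by an identity function and the [CLS] token by zeros purely for illustration; the shapes are what matter:

```python
import numpy as np

B, T_p, C = 4, 10, 32                 # batch, number of patches, embedding dim
patches = np.zeros((B, T_p, C))       # patch embeddings (dummy values)

# Prepend the [CLS] token (a learnable vector in practice; zeros here).
cls_token = np.zeros((B, 1, C))
tokens = np.concatenate([cls_token, patches], axis=1)   # (B, 1 + T_p, C)
assert tokens.shape == (B, 1 + T_p, C)

# Run the encoder (identity stand-in), then split the 1 + T_p outputs.
outputs = tokens                      # stand-in for encoder(tokens)
instance_emb = outputs[:, 0]          # instance-level embedding, (B, C)
timestamp_emb = outputs[:, 1:]        # timestamp-level embeddings, (B, T_p, C)
assert instance_emb.shape == (B, C)
assert timestamp_emb.shape == (B, T_p, C)
```

This split depends only on the [CLS] token's position, not on what the encoder is, which is why the two embedding types exist for any architecture with a (B, T, C) interface.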
Okay. I understand. Thanks very much!
I have no idea how to apply TimeDRL to the ResNet structure, as mentioned in your paper, because ResNet has no patches or [CLS] token. Thanks!