blacksnail789521 / TimeDRL

Official repository for TimeDRL: Disentangled Representation Learning for Multivariate Time-Series, accepted at ICDE 2024.

Could the authors tell me how to apply the TimeDRL method to a ResNet structure? #4

Open fuen1590 opened 4 months ago

fuen1590 commented 4 months ago

I have no idea how to apply TimeDRL to a ResNet structure, as mentioned in your paper, because there are no patches or a [CLS] token. Thanks!

blacksnail789521 commented 4 months ago

As long as the encoder has the same input and output shape (B, T, C), you can use any model architecture you want. You can also switch to other encoder architectures using the code here: https://github.com/blacksnail789521/TimeDRL/blob/master/models/_load_encoder.py#L101
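A minimal sketch of what "same input and output shape" means in practice: the encoder below is an illustrative 1D ResNet-style module (the class name `ResNet1DEncoder` and its internals are assumptions for this example, not code from the repo) that maps (B, T, C) to (B, T, C) and could therefore be plugged in as an encoder.

```python
import torch
import torch.nn as nn

class ResNet1DEncoder(nn.Module):
    """Illustrative ResNet-style encoder whose input and output are both (B, T, C)."""

    def __init__(self, channels: int, num_blocks: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm1d(channels),
                nn.ReLU(),
                nn.Conv1d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm1d(channels),
            )
            for _ in range(num_blocks)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C); Conv1d expects (B, C, T)
        x = x.transpose(1, 2)
        for block in self.blocks:
            x = torch.relu(x + block(x))  # residual connection
        return x.transpose(1, 2)  # back to (B, T, C)

# Shape check: input and output shapes match, so the module qualifies as an encoder.
x = torch.randn(8, 64, 7)  # (B, T, C)
assert ResNet1DEncoder(7)(x).shape == x.shape
```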

fuen1590 commented 4 months ago

Thank you for the reply, but I still have some questions. For example, if there is no distinction between instance features and patch features in the feature maps generated by ResNet, then TimeDRL cannot carry out targeted self-supervision tasks for both. In this case, does applying TimeDRL to ResNet mean treating one feature map as an instance feature and the rest as patch features?

blacksnail789521 commented 4 months ago

What do you mean by the instance features and the patch features?

fuen1590 commented 4 months ago

Sorry, by instance features I mean the "instance-level embeddings" and by patch features the "timestamp-level embeddings", as in your paper.

blacksnail789521 commented 4 months ago

> if there is no distinction between instance features and patch features in the feature maps generated by ResNet, then TimeDRL cannot carry out targeted self-supervision tasks for both.

Regardless of the encoder architecture, the [CLS] token's corresponding embedding is always the instance-level embedding, while the rest are always the timestamp-level embeddings (or patch-level embeddings, since we are currently using patches). Since the [CLS] token is at the beginning, if we have T_p patches, then with the [CLS] token we have 1 + T_p tokens as the input. Consequently, the output also has 1 + T_p embeddings: the first one is the instance-level embedding, and the rest are the timestamp-level embeddings. As you can see, all of these concepts are independent of the encoder's architecture.
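To make the token layout concrete, here is a minimal sketch of prepending a [CLS] token to T_p patch embeddings and then splitting the encoder output into the instance-level and timestamp-level parts. Variable names and the `nn.Identity()` stand-in encoder are illustrative assumptions, not the repo's exact code.

```python
import torch
import torch.nn as nn

B, T_p, D = 8, 16, 64                       # batch size, number of patches, embedding dim
patch_embeddings = torch.randn(B, T_p, D)   # embedded patches after the patching/projection step

# Learnable [CLS] token, prepended to the patch sequence -> 1 + T_p tokens per sample.
cls_token = nn.Parameter(torch.zeros(1, 1, D))
tokens = torch.cat([cls_token.expand(B, -1, -1), patch_embeddings], dim=1)  # (B, 1 + T_p, D)

# Any encoder mapping (B, 1 + T_p, D) -> (B, 1 + T_p, D) works here.
encoder = nn.Identity()                     # stand-in for the actual encoder
z = encoder(tokens)

instance_embedding = z[:, 0]                # (B, D): the [CLS] position, instance-level
timestamp_embeddings = z[:, 1:]             # (B, T_p, D): one embedding per patch, timestamp-level
```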

fuen1590 commented 4 months ago

Okay. I understand. Thanks very much!