emadeldeen24 / AdaTime

[TKDD 2023] AdaTime: A Benchmarking Suite for Domain Adaptation on Time Series Data
MIT License

Loss of CDAN #7

Closed RomainMsrd closed 1 year ago

RomainMsrd commented 1 year ago

Hi,

First, thank you for this huge piece of work; it's very useful. I have a small question about the CDAN loss. I saw that you added a conditional entropy loss computed on the target features only, which does not seem to be implemented (or at least not this way) in the original CDAN code. What is the purpose of this loss and where does it come from?

Best regards

emadeldeen24 commented 1 year ago

Kindly check the "Entropy Conditioning" section in the paper.

RomainMsrd commented 1 year ago

Sorry to bother you again. I have in fact already checked this paper, which led me to your implementation. The thing is, the paper computes the conditional entropy on both source and target, while in your code I only see it applied to the target. Do we make a specific assumption about the source data, not mentioned in the paper, that would allow us to apply and minimize it on the target data only? Perhaps this is a simpler application of the entropy minimization principle, rather than entropy conditioning used as a reweighting strategy. I'm probably missing a very simple trick, sorry again.
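For reference, here is a minimal PyTorch sketch of the two readings I'm contrasting, plain entropy minimization on the target predictions versus the entropy-conditioning re-weighting described in the paper; the function names are only illustrative and are not taken from the AdaTime code:

```python
import torch
import torch.nn.functional as F

def prediction_entropy(logits):
    """Per-sample entropy H(p) of the classifier's softmax predictions."""
    p = F.softmax(logits, dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1)

# (a) Plain entropy minimization: the mean target entropy is added to the loss.
def entropy_minimization_loss(target_logits):
    return prediction_entropy(target_logits).mean()

# (b) Entropy conditioning (CDAN+E): the entropy is not minimized directly;
#     instead each sample's domain loss is re-weighted by w = 1 + exp(-H),
#     so confidently classified samples dominate the adversarial game.
def entropy_conditioned_domain_loss(domain_logits, domain_labels, class_logits):
    w = 1.0 + torch.exp(-prediction_entropy(class_logits))
    per_sample = F.binary_cross_entropy_with_logits(
        domain_logits.squeeze(-1), domain_labels.float(), reduction="none")
    return (w * per_sample).sum() / w.sum()
```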

mohamedr002 commented 1 year ago

@RomainMsrd Thanks for bringing this to our attention. You're absolutely right. There are two distinct versions of CDAN in the paper:

1. Basic CDAN
2. CDAN+E

In this implementation, we've focused on the basic CDAN version. This means we should indeed omit the conditional entropy on the target and rely on the multilinear combination between features and predictions.

However, if one were to incorporate CDAN+E, it would necessitate its application in both the source and target domains, as you've correctly pointed out.
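For illustration, a minimal PyTorch sketch of the multilinear conditioning that the basic CDAN variant relies on; this is a sketch under our own naming assumptions, not the AdaTime implementation:

```python
import torch
import torch.nn.functional as F

def multilinear_map(features, class_logits):
    """Flattened outer product f ⊗ g of features and softmax predictions."""
    # features: (B, d_f); class_logits: (B, C)
    g = F.softmax(class_logits, dim=1)
    outer = torch.bmm(g.unsqueeze(2), features.unsqueeze(1))  # (B, C, d_f)
    return outer.view(features.size(0), -1)                   # (B, C * d_f)

# Basic CDAN: a domain discriminator classifies these joint embeddings as
# source vs. target, with no entropy-based re-weighting of the domain loss.
# CDAN+E keeps the same discriminator input but weights each sample's domain
# loss by 1 + exp(-H) on both domains, as in the earlier sketch.
```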

We appreciate your keen observation and feedback. We'll work on refining the implementation to ensure its correctness.


RomainMsrd commented 1 year ago

Thank you very much for your answer! I was worried that I had missed a mathematical trick you were using. Glad I could help improve this great piece of work!

I'd also like to take this opportunity to point out another potential error. I haven't looked into it as carefully as I did for CDAN, but your implementation of DIRT seems to be more in line with VADA. VADA and DIRT are presented in the same paper: DIRT is initialized with VADA, but there are then supposed to be several pseudo-labeling steps that improve the model's margin on the target set, and I don't see this iterative pseudo-labeling in your implementation.

mohamedr002 commented 1 year ago

@RomainMsrd

Thank you for highlighting this. To provide some clarity regarding DIRT: there are actually three distinct versions. Initially, there was VADA, which exclusively used virtual adversarial training. Following that, DIRT was introduced, incorporating conditional entropy losses and exponential moving average updates. Finally, DIRT-T was developed, adding the pseudo-labeling component. For the scope of our current implementation, we've focused solely on DIRT. I hope this clarifies your concerns, and I appreciate your insights.
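For concreteness, here is a rough PyTorch sketch of the two ingredients mentioned above for DIRT, the conditional entropy on target predictions and the exponential-moving-average teacher update; the toy model, optimizer, and batch are illustrative assumptions and this is not AdaTime's actual code:

```python
import copy
import torch
import torch.nn.functional as F

def target_entropy_loss(logits):
    """Mean conditional entropy of the model's target-domain predictions."""
    p = F.softmax(logits, dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()

@torch.no_grad()
def ema_update(teacher, student, decay=0.998):
    """Exponential-moving-average update of the teacher parameters."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)

# Toy usage: one target-only refinement step with a linear "model".
student = torch.nn.Linear(16, 5)
teacher = copy.deepcopy(student)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

trg_x = torch.randn(32, 16)  # unlabeled target batch
with torch.no_grad():
    teacher_probs = F.softmax(teacher(trg_x), dim=1)

loss = target_entropy_loss(student(trg_x)) + F.kl_div(
    F.log_softmax(student(trg_x), dim=1), teacher_probs, reduction="batchmean")
loss.backward()
opt.step()
ema_update(teacher, student)

# DIRT-T would iterate this target-only refinement, effectively using the
# teacher's predictions as pseudo-labels for successive rounds.
```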

RomainMsrd commented 1 year ago

@mohamedr002

You are right, I did confuse DIRT and DIRT-T. Thank you for this explanation!

mohamedr002 commented 1 year ago

@RomainMsrd You are welcome; we're happy you find our framework useful. Please feel free to reach out to us if you have any other issues.