birdx0810 / timegan-pytorch

This repository is a non-official implementation of TimeGAN (Yoon et al., NeurIPS 2019) using PyTorch.
MIT License

Unable to reproduce behaviour of TimeGAN #8

Closed DamianUS closed 1 year ago

DamianUS commented 1 year ago

Hi @birdx0810 !

Thank you for your great work. The code is clean, classy, and neat. I'm not writing to report an issue so much as to ask for some guidance.

We have been working for quite a while with both the original TimeGAN and your code, and we have addressed the issues you mentioned in your open issue #1, making all the changes needed for the two implementations to behave the same (the noise generation; we even reintroduced a "bug" you had noticed and cleaned up: the sigmoid activation as the last layer of the recovery network, which doesn't make sense to us).
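
For reference, this is roughly what the noise change looks like; just a minimal sketch, assuming (as we understand it) that the original TF1 code samples the noise uniformly from [0, 1) rather than from a standard normal (the function name and shapes are illustrative):

```python
import torch

def sample_noise(batch_size, seq_len, z_dim):
    # Uniform noise in [0, 1), as in the original TF1 code (to our understanding),
    # instead of standard normal noise from torch.randn.
    return torch.rand(batch_size, seq_len, z_dim)
```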

Let me give a little context: we're trying to generate the alibaba2018 data-center machine-usage trace. We have been able to obtain good results with TimeGAN using a particular hyperparameterisation (3 GRU layers, hidden dimension 10, batch size 100, sequence length 288, and the rest of the parameters at their defaults). With your implementation, however, we cannot reproduce TimeGAN's behaviour, and the "problem" is quite localised: G_loss_V is an order of magnitude higher, and the model seems unable to capture the patterns and the inherent variance of the data.
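
For concreteness, this is the configuration that worked for us with the TF1 version (the key names below are illustrative; they just need to be mapped to whatever your argument parser expects):

```python
# Hyperparameters that gave us good results with the original TF1 TimeGAN.
params = {
    "module": "gru",     # RNN cell type
    "num_layers": 3,     # stacked GRU layers
    "hidden_dim": 10,    # hidden state size
    "batch_size": 100,
    "max_seq_len": 288,  # sequence length of the trace windows
    # everything else left at its default value
}
```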

Since you have deep knowledge of this topic, I'd kindly ask whether you have any intuition as to why PyTorch behaves differently from TF1: we're stuck, because we don't really know whether the cause lies in the internals of the loss functions, the optimisers (and backpropagation), and/or PyTorch's RNN implementations.

I would be very grateful if you could share any hints you may have on why this discrepancy in the behaviour of G_loss_V happens.

birdx0810 commented 1 year ago

Hi,

I might not be able to give you an in-depth answer to the problem you're facing right now, as I'm no longer in academia. However, I'll address a few things that might be helpful to you.

...sigmoid activation function as the last layer of the recovery network...

I believe this depends on the range of your input. If it is normalized to be within (-1, 1), then sigmoid definitely does not make sense and tanh is the more sensible choice. I can't tell you whether normalizing your data to (0, 1) or (-1, 1) is better, although I seem to remember that some scholars prefer one over the other.
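
As a minimal sketch of what I mean (a hypothetical helper, not something in this repo), the recovery network's output activation can simply be chosen to match the range the data was scaled to:

```python
import torch.nn as nn

def recovery_output_activation(low, high):
    # Match the output activation to the range of the normalized data.
    if (low, high) == (0.0, 1.0):
        return nn.Sigmoid()   # data min-max scaled to [0, 1]
    if (low, high) == (-1.0, 1.0):
        return nn.Tanh()      # data scaled to [-1, 1]
    return nn.Identity()      # unbounded / standardized data
```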

...G_loss_V is way higher...

I would suggest adjusting the weights of the other losses in the model. The model might be focusing too much on them and therefore neglecting the moments loss. This is just an assumption.
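
For example, something along these lines (a sketch only; the weights are illustrative knobs, not values from the paper or this repo) makes the trade-off explicit, so you can push more weight onto the moments term:

```python
import torch

def generator_loss(g_loss_u, g_loss_s, g_loss_v,
                   w_adv=1.0, w_sup=100.0, w_mom=100.0):
    # Weighted combination of the generator's adversarial, supervised, and
    # moments terms. Raising w_mom (or lowering the others) makes the
    # optimiser pay more attention to G_loss_V.
    return w_adv * g_loss_u + w_sup * torch.sqrt(g_loss_s) + w_mom * g_loss_v
```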

...why pytorch behaves differently than TF1...

To be honest with you, I faced the same problem when I was doing my thesis (which I eventually gave up on). If you Google for the discrepancy between PyTorch and TensorFlow, you might come across this discussion and realize that you're not alone.

In this project I tried my best to match most of the configurations (e.g. initialization; PyTorch does not use Xavier/Glorot by default), but it still underperforms compared to the original TF1 version. The reasons could range from the RNN implementations, the optimizer, or cuDNN, down to the random seed (this can happen, as GANs are unstable).
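
For instance, here is a rough sketch of forcing Glorot-style initialization onto the PyTorch modules (assuming TF1's Glorot-uniform kernel default is what you want to mimic; PyTorch's Linear/GRU default to a uniform fan-in scheme instead):

```python
import torch.nn as nn

def init_tf1_style(module):
    # Xavier/Glorot-uniform for weight matrices, zeros for biases.
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.GRU):
        for name, param in module.named_parameters():
            if "weight" in name:
                nn.init.xavier_uniform_(param)
            elif "bias" in name:
                nn.init.zeros_(param)

# usage: model.apply(init_tf1_style)
```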

Although your question might not be answered there (mine wasn't, unfortunately), I would argue that time-series GANs (at least at the time this paper was published) are still immature, and training GANs is itself a very unstable process; a different random seed can give you very different results. Therefore, adopting more mature methods might help you get a more stable training process (e.g. changing the optimizer to AdamW).
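
Swapping the optimizer is a one-line change; a sketch with a placeholder model and untuned hyperparameters:

```python
import torch
import torch.nn as nn

net = nn.GRU(input_size=10, hidden_size=10, num_layers=3, batch_first=True)  # placeholder model
optimizer = torch.optim.AdamW(net.parameters(), lr=1e-3, weight_decay=1e-2)  # instead of torch.optim.Adam
```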

Feel free to discuss with me via email and all the best in your research.

DamianUS commented 1 year ago

Thank you very much for your reply.

It's a pity you left academia; your skills and expertise are great, and I'm sure they're now being put to good use.

We'll stay in the trenches, and if we find out anything relevant I'll let you know (even if only out of curiosity).

Again, let me congratulate you on this great work. If you ever feel the call of research again, we'd be glad to collaborate with you, hahaha.

Best!