Closed: BaeHann closed this issue 1 year ago
I am not sure if I can clear up all your questions, as this was a winter project for a class at university and I now specialize mostly in image models and a bit of NLP haha. If I remember correctly, the authors wanted to allow the generator to be trained with more information than simply the discriminator's binary decision of "this data is real" or "this data is fake", so they use an auxiliary embedding network that computes "embeddings" => a long vector representation of the original data. To address the second point of your question, the code calculates four losses and optimizes each of them separately, to train the generator, embedding network, discriminator, and recovery network jointly. These are all calculated on their own and optimized with distinct optimizers, each initialised with its own learning rate. I would have to read the paper again for more info, but I do agree that the approach in this paper, the original TensorFlow implementation, and our PyTorch implementation are all pretty convoluted 😅
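To make the "four losses, four optimizers" idea concrete, here is a minimal sketch of that training pattern. The module names and shapes are hypothetical stand-ins (plain `nn.Linear` layers instead of the recurrent networks the repo actually uses); only the structure, one optimizer per component, each stepped on its own loss, reflects what I described above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the four TimeGAN components.
embedder = nn.Linear(5, 8)        # data -> latent
recovery = nn.Linear(8, 5)        # latent -> data
generator = nn.Linear(3, 8)       # noise -> latent
discriminator = nn.Linear(8, 1)   # latent -> real/fake logit

# Each network gets its own optimizer with its own learning rate.
opt_e = torch.optim.Adam(embedder.parameters(), lr=1e-3)
opt_r = torch.optim.Adam(recovery.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

x = torch.randn(4, 5)  # a batch of "real" data
z = torch.randn(4, 3)  # random noise for the generator

# 1) Reconstruction loss trains embedder + recovery (autoencoder part).
x_tilde = recovery(embedder(x))
loss_er = F.mse_loss(x_tilde, x)
opt_e.zero_grad(); opt_r.zero_grad()
loss_er.backward()
opt_e.step(); opt_r.step()

# 2) Generator loss: fool the discriminator in latent space.
d_fake = discriminator(generator(z))
loss_g = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()

# 3) Discriminator loss: real latents vs. generated latents
#    (detached so only the discriminator is updated here).
d_real = discriminator(embedder(x).detach())
d_fake = discriminator(generator(z).detach())
loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
          + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()
```

The real code interleaves these updates over several training phases, but the separation of losses and optimizers is the key point.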
The loss calculations can be found here: https://github.com/benearnthof/TimeGAN/blob/0c8ab7133eb41369ce2b2815e07915fd8651e27f/modules_and_training.py#L216 They all boil down to binary cross-entropy and mean squared error loss, because we assume that the latent representations we obtain from the embedding & recovery auxiliary networks are informative enough to make these losses sufficiently smooth for training. It should be noted that during our experiments the training was not stable, so I'd recommend WGAN to be honest!
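In case it helps, here are the two loss primitives on toy tensors (the numbers are just illustrative inputs, not anything from the repo):

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()            # used for the reconstruction-style losses
bce = nn.BCEWithLogitsLoss()  # used for the adversarial losses

# MSE: mean of squared differences. Ones vs. zeros gives exactly 1.0.
pred = torch.ones(2, 3)
target = torch.zeros(2, 3)
print(mse(pred, target).item())  # → 1.0

# BCE on logits: a logit of 0 means sigmoid = 0.5, so the loss against
# a "real" label of 1 is -log(0.5) ≈ 0.6931.
logits = torch.zeros(2, 1)
labels = torch.ones(2, 1)
print(round(bce(logits, labels).item(), 4))  # → 0.6931
```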
This, of course, also depends on the data you want to model. Hope I could help!
Thank you very much for your detailed reply. I think I should clarify what I am confused about.
Yes, upon further inspection I had forgotten about the supervisor network, another auxiliary component that helps train both the embedding network & generator. I'd have to step through the code again to see where exactly it is invoked, but we start using the supervisor when we initialize the embedding network.
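Roughly speaking, the supervisor is trained to predict the next latent step from the current one. A minimal sketch of that supervised loss, with hypothetical dimensions (the exact module configuration in the repo may differ):

```python
import torch
import torch.nn as nn

hidden_dim = 8
# Supervisor: a small recurrent network operating in latent space.
supervisor = nn.GRU(hidden_dim, hidden_dim, batch_first=True)

# Latent sequences as they would come out of the embedder:
# (batch=4, seq_len=10, hidden_dim=8).
h = torch.randn(4, 10, hidden_dim)
h_hat, _ = supervisor(h)

# Supervised loss: the supervisor's output at step t should match the
# true latent at step t+1, pushing the latent space to respect the
# temporal dynamics of the data.
loss_s = nn.functional.mse_loss(h_hat[:, :-1, :], h[:, 1:, :])
```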
The data is modeled with either RNNs, GRUs, or LSTMs, depending on what you specify in the TimeGAN module. You specify the rnn_type of the TimeGAN according to one of these three cases: https://github.com/benearnthof/TimeGAN/blob/0c8ab7133eb41369ce2b2815e07915fd8651e27f/modules_and_training.py#L29 The default is set to use GRUs.
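The switch between the three recurrent cell types amounts to something like this (a hypothetical helper, not the repo's exact code; see the linked line for the real version):

```python
import torch.nn as nn

def make_rnn(rnn_type: str, input_dim: int, hidden_dim: int, num_layers: int = 1):
    """Pick the recurrent backbone based on rnn_type, as TimeGAN does."""
    rnn_cls = {"rnn": nn.RNN, "gru": nn.GRU, "lstm": nn.LSTM}[rnn_type.lower()]
    return rnn_cls(input_dim, hidden_dim, num_layers, batch_first=True)

net = make_rnn("gru", input_dim=5, hidden_dim=8)  # GRU is the repo's default
```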
Thank you very much!! I have one last question about TimeGAN. Please forgive my tediousness. Could you tell me how the RNN, GRU, or LSTM captures the stepwise conditional distributions in the data? Thanks a lot!
I think the architecture that's easiest to understand is the RNN; the other two are conceptually similar but have different implementations. The basic idea is that you use the parameters in the RNN to calculate a projection of the inputs at time step 1 and then use the output you get from this as the input for the next step. This is the stepwise distribution, conditioned on the previous time step, if you unroll the entire process. Check out the cheatsheet from Stanford here:
https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
That should help clarify the correspondence between using outputs from one timestep as the input to the next timestep and modeling the stepwise conditional distribution over time.
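Here is a toy RNN unrolled by hand to make that feedback loop explicit. The weights and sizes are arbitrary; the point is only that the hidden output at step t re-enters the computation at step t+1, which is what conditions each step on everything before it:

```python
import torch

torch.manual_seed(0)
W_x = torch.randn(1, 4)  # input-to-hidden weights (arbitrary toy sizes)
W_h = torch.randn(4, 4)  # hidden-to-hidden weights
h = torch.zeros(1, 4)    # initial hidden state

inputs = torch.randn(5, 1)  # a sequence of 5 scalar inputs
outputs = []
for x_t in inputs:
    # h_t = tanh(x_t @ W_x + h_{t-1} @ W_h): the previous step's output
    # h_{t-1} feeds into the current step, i.e. the stepwise conditioning.
    h = torch.tanh(x_t.unsqueeze(0) @ W_x + h @ W_h)
    outputs.append(h)
```

Unrolling this loop over the whole sequence is exactly the "conditioned on the time step before" picture from the cheatsheet; GRUs and LSTMs add gating on top of the same recurrence.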
Thank you very much! (^▽^)
Hi, Mr. benearnthof: really sorry to bother you again! I have run into two new questions.
Thank you a lot and wish you every success in the future!