fjxmlzn / DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
http://arxiv.org/abs/1909.13403
BSD 3-Clause Clear License

Request for availability of the scripts used to reproduce figures #46

Open rllyryan opened 1 year ago

rllyryan commented 1 year ago

Dear @fjxmlzn and @wangchen615,

I am really interested in your work on synthetic generation of time series data with high-dimensional metadata. The paper is a wonderful read (easily comprehensible for someone new to the field of GANs), and the proposed methods are sound and reasonable ways to alleviate some of the challenges in attaining the best synthetic data fidelity.

Could I check with you guys on the availability of the scripts used to generate the figures in the paper? I would like to reproduce the results, especially the autocorrelation one. If possible, could you show how to run these functions or scripts?

Namely figures (1), (5), (6), (8), and (9)!

I really appreciate your help in this.

Thank you in advance

fjxmlzn commented 1 year ago

For autocorrelation, please see https://github.com/fjxmlzn/DoppelGANger/issues/20#issuecomment-858234890

Is this the code you used for getting the figure you showed in https://github.com/fjxmlzn/DoppelGANger/issues/22#issuecomment-1665730328?

For other figures, please bear with me some time as I need to dig them out from another computer...
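For reference while digging out the original scripts, a minimal empirical-autocorrelation computation (a sketch in plain NumPy, not the exact code from #20) might look like this:

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Empirical autocorrelation of a 1-D series for lags 0..max_lag-1."""
    series = np.asarray(series, dtype=float)
    series = series - series.mean()
    var = np.sum(series ** 2)
    acf = np.empty(max_lag)
    for lag in range(max_lag):
        # Overlap the series with a lagged copy of itself and normalize.
        acf[lag] = np.sum(series[: len(series) - lag] * series[lag:]) / var
    return acf

# Example: a sine wave's ACF peaks at lags matching its period (here 50).
t = np.arange(500)
acf = autocorrelation(np.sin(2 * np.pi * t / 50), max_lag=100)
print(round(acf[0], 3))  # lag 0 is always 1.0
```

To compare real and generated data, one would compute this per time series and plot the average ACF for each dataset.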

rllyryan commented 1 year ago

> For autocorrelation, please see #20 (comment)
>
> Is this the code you used for getting the figure you showed in #22 (comment)?
>
> For other figures, please bear with me some time as I need to dig them out from another computer...

Hi @fjxmlzn thanks for looking into it!

I used the auto-correlation code you provided in the comment.

The code I used to get the figure is the generate.py file that the issue owner provided in #22.

Thank you so much for your help!

Could I request the environment setup that you were experimenting with? (i.e., GPU, IDE, Python version, TensorFlow version, etc.)

Update: I ran the training again without the GPUTaskScheduler and got a graph of the same shape (it looks like an offset).

image

Update (2): I think something might have gone wrong with the training; the ACF figures generated from checkpoints 4 through 399 are totally identical. Could it be that training somehow did not update the model weights?
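As a quick numerical check of the identical-checkpoint symptom, one can compare the sample arrays produced by the generation script at two checkpoints (a sketch; the file names in the comment are hypothetical):

```python
import numpy as np

def outputs_identical(samples_a, samples_b, tol=0.0):
    """Return True if two generated-sample arrays are (near-)identical,
    which would suggest the generator weights did not change between
    the checkpoints that produced them."""
    samples_a = np.asarray(samples_a)
    samples_b = np.asarray(samples_b)
    if samples_a.shape != samples_b.shape:
        return False
    return bool(np.allclose(samples_a, samples_b, atol=tol))

# Hypothetical usage with arrays loaded from the generation output, e.g.:
#   a = np.load("generated_data_ckpt4.npz")["data_feature"]
#   b = np.load("generated_data_ckpt399.npz")["data_feature"]
rng = np.random.default_rng(0)
a = rng.normal(size=(10, 5))
assert outputs_identical(a, a.copy())      # unchanged weights -> identical samples
assert not outputs_identical(a, a + 1e-3)  # training moved the weights
```

If the two checkpoints' outputs pass this identity check despite hundreds of epochs between them, the weights almost certainly were not being updated.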

I am now trying it out with a batch size of 1000 and a lower learning rate of 1e-4, as used by Gretel.ai.

The graphs you see here all use the default parameters (I trained the model twice).

Checkpoint 4 image

Checkpoint 399 image

fjxmlzn commented 1 year ago

The hyperparameters in that code seem to be slightly different from the default ones. This repo contains example code for data generation. Would you mind running the following code in order, without changing any hyperparameters, and seeing if that gives reasonable results: (1) Training: https://github.com/fjxmlzn/DoppelGANger/blob/master/example_training(without_GPUTaskScheduler)/main.py (2) Generation: https://github.com/fjxmlzn/DoppelGANger/blob/master/example_generating_data(without_GPUTaskScheduler)/main.py

Your Python and TensorFlow versions should be fine.

rllyryan commented 1 year ago

> The hyperparameters in that code seem to be slightly different from the default ones. This repo contains example code for data generation. Would you mind running the following code in order, without changing any hyperparameters, and seeing if that gives reasonable results: (1) Training: https://github.com/fjxmlzn/DoppelGANger/blob/master/example_training(without_GPUTaskScheduler)/main.py (2) Generation: https://github.com/fjxmlzn/DoppelGANger/blob/master/example_generating_data(without_GPUTaskScheduler)/main.py
>
> Your Python and TensorFlow versions should be fine.

Oh, I apologise; I meant that I am currently running another training cycle with those parameters. The graphs you see all use the default training parameters. I tried training twice with the same default parameters.

I will run one last time tonight (with default parameters), and let you know if anything changes.

fjxmlzn commented 1 year ago

I see. Did you make any changes to the code in this repo? The autocorrelation plot, and especially the phenomenon that checkpoints 4 and 399 give the same data, mean that something is wrong.

You can also upload the code and checkpoints to some place (e.g., Google Drive) so that I can take a look.

rllyryan commented 1 year ago

> I see. Did you make any changes to the code in this repo? The autocorrelation plot, and especially the phenomenon that checkpoints 4 and 399 give the same data, mean that something is wrong.
>
> You can also upload the code and checkpoints to some place (e.g., Google Drive) so that I can take a look.

I actually cloned the repository and ran it as is, but maybe something changed without me knowing. I will re-clone the repository ASAP and retrain it overnight today (hopefully that solves it).

I will upload the checkpoints to Google Drive once I finish the run with the original parameters, which should be by tomorrow.

rllyryan commented 1 year ago

@fjxmlzn

Could I trouble you to share the code you used to produce the distribution of (max + min)/2 for real data against DoppelGANger?

I apologise for any inconvenience caused.
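In the meantime, a rough version of that comparison can be sketched with NumPy (this is my guess at the metric based on the figure's description, not the authors' script; each row of the arrays is assumed to be one time series):

```python
import numpy as np

def midrange_per_sample(data):
    """(max + min) / 2 of each time series (one row per sample)."""
    data = np.asarray(data, dtype=float)
    return (data.max(axis=1) + data.min(axis=1)) / 2.0

def empirical_cdf(values):
    """Sorted values and their empirical CDF, ready for plotting."""
    values = np.sort(np.asarray(values, dtype=float))
    cdf = np.arange(1, len(values) + 1) / len(values)
    return values, cdf

# Hypothetical usage: real vs. DoppelGANger-generated features.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(1000, 50))
synthetic = rng.normal(0.1, 1.0, size=(1000, 50))
x_real, y_real = empirical_cdf(midrange_per_sample(real))
x_syn, y_syn = empirical_cdf(midrange_per_sample(synthetic))
# The two (x, y) curves can then be plotted together, e.g. with
# matplotlib's plt.plot, to compare the distributions visually.
```

Overlaying the two CDF (or histogram) curves gives a figure in the spirit of the paper's distribution comparisons, though the exact binning and styling would come from the authors' script.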

rllyryan commented 1 year ago

@fjxmlzn Sorry for the delay; here is the Google Drive link for the new training run after re-cloning the repository. It contains the checkpoint for epoch 399 and the code for plotting the ACF.

It seems that it got worse.

image

https://drive.google.com/drive/folders/1lrtq8PvjnFXj0P_18IlEwGAAy_n-1Z9q?usp=sharing