fjxmlzn / DoppelGANger

[IMC 2020 (Best Paper Finalist)] Using GANs for Sharing Networked Time Series Data: Challenges, Initial Promise, and Open Questions
http://arxiv.org/abs/1909.13403
BSD 3-Clause Clear License

Output feature has many 0s #7

Closed: Tulip4attoo closed this issue 3 years ago

Tulip4attoo commented 3 years ago

Hi, when I try to generate output for the "web" dataset, the output is a series where only the first few values are non-zero and all the rest are 0.

I don't know why this happens. I used TF 1.13 to run it.

For example:

array([[-0.07411094],
       [ 0.50166506],
       [ 0.30637154],
       [ 0.36993235],
       [ 0.08496358],
       [-0.11077818],
       [-0.81356037],
       [-0.16060808],
       [ 0.13039538],
       [ 0.13650048],
       [ 0.        ],
       [ 0.        ],
       [ 0.        ],
       [ 0.        ],
       ...
       [ 0.        ]])

fjxmlzn commented 3 years ago

Would you mind providing more details: which training/generation code did you run, and what parameters did you use?

Thanks!

Tulip4attoo commented 3 years ago

Thank you. I tried different settings; here is one of them. I use only 10,000 data points from the "web" dataset instead of 50,000 to reduce training time (currently I am also training on the full 50,000 data points with 400 epochs).

epoch = 400
batch_size = 20
vis_freq = 200
vis_num_sample = 5
d_rounds = 3
g_rounds = 1
d_gp_coe = 10.0
attr_d_gp_coe = 10.0
g_attr_d_coe = 1.0
extra_checkpoint_freq = 50
num_packing = 1
g_lr = 0.0001
d_lr = 0.0001

Here is my code. I trained it using JupyterLab on an Azure compute instance.
Training and data generation: https://colab.research.google.com/drive/15uDCfcBY7s-MxMTgJPwD8kc725Zzcs9S?usp=sharing
Loading the model and generating data: https://colab.research.google.com/drive/1CQQVoexeyJJhXp0vku9R6T78FvZoxPVg?usp=sharing

Thank you.

fjxmlzn commented 3 years ago

Thanks for the information!

Firstly, about why the features have many zeros at the end: it's probably because those samples are shorter than the maximum length of 550, and the remaining time steps are padded with zeros. You can read the length of each generated sample from the data_gen_flag field in the generated data.
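
For example, here is a minimal sketch of checking the generated lengths. The file name and array keys are hypothetical; use whatever names your generation script saved:

import numpy as np

# Hypothetical file/key names -- adjust to match how your generation script saved its output.
data = np.load("generated_data.npz")
features = data["features"]    # shape: (num_samples, max_length, num_features)
gen_flags = data["gen_flags"]  # shape: (num_samples, max_length); 1 while the sample is active

# A sample's length is the number of time steps with gen_flag == 1;
# every step after that is zero padding in the features.
lengths = gen_flags.sum(axis=1).astype(int)
print(lengths[:10])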

Secondly, about why there are lengths shorter than 550: we let DoppelGANger learn the correct length for each sample instead of pre-specifying it, so that it can handle more general cases (although DG can also support specifying the lengths with minor code changes, as discussed in Appendix B). From our experience, to ensure good fidelity (including the lengths), you need to make sure that the total number of training iterations is large enough. So if you decrease the size of the training set by 5 times, you need to increase the number of epochs by about 5 times as well (i.e., it is hard to reduce the training time if you want good fidelity). We have in fact tested how the fidelity changes as we vary the number of training samples (with approximately the same total training iterations) in Figure 11 of our paper. In general, you get better fidelity by training on more samples (as expected). So I would recommend using the default parameters in the code.

If you really want to train on 10,000 samples instead, the settings we used in Figure 11 kept all parameters the same as the defaults in the code except the number of epochs. I see that you tuned other parameters as well, which could give better results (but we haven't tried that). Empirically, we find that the default parameters are good enough across different datasets and settings.
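
As a rough back-of-the-envelope sketch of the epoch scaling described above, using the numbers from this thread (400 epochs on the full 50,000-sample web dataset, 10,000 samples in the reduced run); these are illustrative numbers, not a tested configuration:

# Keep the total number of training iterations roughly constant when shrinking the dataset.
default_samples = 50000
default_epochs = 400

my_samples = 10000
scale = default_samples / my_samples       # 5.0
my_epochs = int(default_epochs * scale)    # ~2000 epochs

# With a fixed batch size, iterations per epoch = samples / batch_size,
# so using 1/5 of the data needs roughly 5x the epochs for the same total iterations.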

Hope this helps, and let me know if you have further questions : )

Tulip4attoo commented 3 years ago

Thank you for your response.

1. Yes, I should check the features again.
2. I will train with more epochs. My deadline is quite tight, but I should definitely try more epochs.

About the parameters: at first I used your parameters, but the results were quite poor. I tried changing them following a blog post, but the results did not improve at all. I will use the default parameters in my future training.

I will try your suggestion and update the result here.

fjxmlzn commented 3 years ago

Thank you! Yeah I think training for more iterations should give much better results : )

Tulip4attoo commented 3 years ago

Could you provide a pretrained model for one of your datasets (web or fcc_mba would be best)? I just want to check my generation code. Thank you.

fjxmlzn commented 3 years ago

Sure! Here is the pretrained model for web (sample_len=10): https://drive.google.com/drive/folders/1nZly-2G9h212bwzrSDcIeMqfqO9x1miv?usp=sharing

Tulip4attoo commented 3 years ago

I think that the problem is in the generation part because:

So I am not sure about the training part, but the generation part definitely has some issues. I am trying to figure it out. Do you have any ideas?

fjxmlzn commented 3 years ago

I see. This is weird. I have tried using this checkpoint to generate samples, and everything looks good. The example script I used for checking is at https://gist.github.com/fjxmlzn/fc61538ae69bf3633334a00401d5b3a6 (put it under DoppelGANger/example_generating_data/ and change mid_checkpoint_dir to the path of the checkpoint I uploaded to Google Drive).

Tulip4attoo commented 3 years ago

I found that the problem is that I had added this line to doppelganger.py:

self.sess.run(tf.global_variables_initializer())

I added it to the sample_from() and train() functions while I was trying to fix some bugs during training. Running the initializer after the checkpoint has been restored resets all variables to their initial values, so the generator was effectively sampling from untrained weights.
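
For anyone hitting the same issue, here is a minimal, self-contained TF 1.x toy example (not the repo's code) showing why running the initializer after a restore breaks things: the restored values are overwritten by the variables' initial values.

import os
import tensorflow as tf

# Toy graph: one variable standing in for the trained generator weights.
w = tf.get_variable("w", initializer=tf.constant(42.0))
saver = tf.train.Saver()
os.makedirs("toy_ckpt", exist_ok=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.assign(w, 3.14))        # "training" updates the weight
    saver.save(sess, "toy_ckpt/model")

with tf.Session() as sess:
    saver.restore(sess, "toy_ckpt/model")
    print(sess.run(w))                  # 3.14 -- trained value restored
    sess.run(tf.global_variables_initializer())
    print(sess.run(w))                  # 42.0 -- the restore is wiped out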

I am able to generate normal output (from your pretrained model) now.

Thank you very much for your support.

Tulip4attoo commented 3 years ago

A small note: I cloned your repo 19 days ago, and I just cloned it again today (and it generates normal output).

Why did I add these lines? Because I wanted a separate session to view/debug some things. I often use these lines:

# Create a TF 1.x session that only grows GPU memory as needed
run_config = tf.ConfigProto()
run_config.gpu_options.allow_growth = True
sess = tf.Session(config=run_config)
# Initialize all variables (this is the call that overwrote the restored weights)
sess.run(tf.global_variables_initializer())

Since 2019 I have switched to Keras for writing TF code, so this was quite hard for me to debug. Thank you again for your support.

fjxmlzn commented 3 years ago

Glad to hear that you found the problem! Thanks!