Team-TUD / CTAB-GAN

Official git for "CTAB-GAN: Effective Table Data Synthesizing"
Apache License 2.0

Getting NAN on the side >=64 #19

Closed aahsan045 closed 1 month ago

aahsan045 commented 1 year ago

Hi, I am getting NaNs when generating data with a dimension greater than or equal to 64. The dataset consists of real-valued vectors. I have changed the code lines as mentioned to enlarge the side size, but it's still not working.

Any help in this regard would be appreciated.

Also, where can we change the learning_rate or batch size parameters to see the learning behaviour of CTAB-GAN+?

zhao-zilong commented 1 year ago

Hi @aahsan045, if you get NaNs, there is a high possibility that your original table contains empty cells, since CTAB-GAN takes these into account and reproduces empty cells in the synthetic data.
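One way to check the input table for empty cells before training is sketched below. It is a minimal example using pandas and NumPy, not part of CTAB-GAN; the function name `report_bad_cells` and the demo columns are made up for illustration. Besides true missing values it also counts infinities and blank strings, on the assumption that these can cause similar trouble downstream.

```python
import numpy as np
import pandas as pd


def report_bad_cells(df: pd.DataFrame) -> pd.DataFrame:
    """Count missing, infinite, and blank-string cells per column.

    Hypothetical helper for illustration; not a CTAB-GAN function.
    """
    rows = {}
    for name in df.columns:
        col = df[name]
        n_missing = int(col.isna().sum())
        n_inf = 0
        n_blank = 0
        if pd.api.types.is_numeric_dtype(col):
            # Convert to float so np.isinf works for integer/bool columns too
            values = col.to_numpy(dtype="float64", na_value=np.nan)
            n_inf = int(np.isinf(values).sum())
        else:
            # Whitespace-only strings often stand in for empty cells in CSVs
            text = col.dropna().astype(str).str.strip()
            n_blank = int(text.eq("").sum())
        rows[name] = {"missing": n_missing, "infinite": n_inf, "blank": n_blank}
    return pd.DataFrame.from_dict(rows, orient="index")


# Small made-up table with one problem of each kind
demo = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [0.5, np.inf, -1.0],
        "c": ["x", " ", "z"],
    }
)
report = report_bad_cells(demo)
print(report)
```

A table is clean by this check when every count in the report is zero, i.e. `report.to_numpy().sum() == 0`.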

For the learning_rate or batch size, check here: https://github.com/Team-TUD/CTAB-GAN-Plus/blob/6d72fda3a9f382339e55cb4b35befced4c1f3508/model/synthesizer/ctabgan_synthesizer.py#L398

and here: https://github.com/Team-TUD/CTAB-GAN-Plus/blob/6d72fda3a9f382339e55cb4b35befced4c1f3508/model/synthesizer/ctabgan_synthesizer.py#L348C18-L348C28

Zilong

aahsan045 commented 1 year ago

Hi @zhao-zilong, thanks for the quick reply. I have cross-checked: there are no empty cells in the data. Also, if I run the generator for, say, 10 rounds, augmenting the data with the output of the previous round, the NaNs appear in the last round, and the whole synthetic table gives NaN for all cells in a row. I even tried lower table dimensions but got the same error.

What else can be checked?

zhao-zilong commented 1 year ago

@aahsan045 Is it possible to make a demo in Google Colab and send the link to my email? Or maybe share a link to your data so that I can have a look. From the description alone, I don't have further thoughts on it.

aahsan045 commented 1 year ago

@zhao-zilong, thanks, the NaN issue is resolved by changing the batch size and learning rate and by hyperparameter tuning.

Now I am facing the issue of the synthesizer getting stuck and not being able to produce the synthetic data. My original data has shape (45, 16), i.e. 45 samples with 16 features. I have tried different combinations of lr, batch size, and params for the generator and discriminator, but nothing is working. Can you give me a starting point for diagnosing this?

In particular, syn.generate_samples() is not producing any samples.
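With only 45 rows, one thing worth ruling out is a batch size larger than the dataset. The sketch below is plain arithmetic, not CTAB-GAN code: it assumes (unverified against the repository) that a training or sampling loop counts its steps by integer division of the row count by the batch size. The helper names `full_batches` and `clamp_batch_size` are hypothetical.

```python
def full_batches(n_rows: int, batch_size: int) -> int:
    """Number of complete batches that fit into n_rows (integer division)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return n_rows // batch_size


def clamp_batch_size(n_rows: int, preferred: int) -> int:
    """Largest batch size not exceeding the dataset size (at least 1)."""
    return max(1, min(n_rows, preferred))


n_rows = 45  # dataset size reported in this thread
for batch_size in (500, 64, 32, 16):
    print(batch_size, full_batches(n_rows, batch_size))

print("clamped:", clamp_batch_size(n_rows, 500))
```

Under that assumption, any batch size above 45 leaves zero complete batches, so a loop driven by that count would have nothing to iterate over. Whether this matches the actual cause here depends on how the synthesizer computes its steps.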

zhao-zilong commented 1 year ago

Hi @aahsan045, could you check this solution first: https://github.com/Team-TUD/CTAB-GAN-Plus/issues/7#issuecomment-1576690333 and tell me whether it works or not?