Closed anderdnavarro closed 12 months ago
Hi @anderdnavarro Could you show me how you set up the parameters? Did you only specify the integer column? Your problem is indeed abnormal. How many epochs did you train? You can try training just one epoch, because one epoch is enough to let CTABGAN generate data.
Best,
Zilong
Hi @zhao-zilong,
Thank you very much for your quick response!
The parameters are:
synthesizer = CTABGAN(raw_csv_path = csv,
                      test_ratio = 0.3,
                      categorical_columns = [],
                      log_columns = [],
                      mixed_columns = {},
                      general_columns = [],
                      integer_columns = ['value'],
                      problem_type = {None: None},
                      epochs = epochs,
                      batch_size = batch_size)
I read in #3 something related to the mixed type, so I ran a couple of tests with mixed_columns = {'value':[0]}, just to check what happens, although that's not my case. With this configuration I don't get any result, but now the problem is related to #7. I'm running an attempt with more epochs, and if that fails I'll change the code to skip checking the quality of the simulation.
Let me know if you need more information.
Thanks! Ander
Hi @anderdnavarro
I don't think you have a mixed type column in your data. Just to make sure it's not a problem with your data, can you use just the first 100 rows to train your model and tell me the result? You don't need to train for many epochs; one is enough to test that. I don't think the bug is epoch-related.
Best,
Zilong
I ran several tests (repeating some of them to check whether I always obtain the same result):
When it fails, it is always with the same original error: ValueError: Cannot convert non-finite values (NA or inf) to integer
I can share with you the training file if you want.
Thanks!! Ander
@anderdnavarro
yeah, please. My email is imzhaozilong@gmail.com This is really strange.
Zilong
Hi @anderdnavarro This is indeed an interesting bug, but unfortunately I can only reproduce it about 1 in 20 tries. I don't know why, and it is difficult to reproduce on my side. But actually, I can give you a nasty workaround for this. Just remove 'value' from the setting
integer_columns = ['value'],
You can just let it generate float data, and then do the conversion explicitly:
syn['value'].astype(int)
With that, I never encountered the problem again. I originally wanted to use this method to debug, but after setting it up like that, I never hit this bug again... Give it a try; looking forward to your feedback.
Zilong
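Zilong's workaround can be sketched as below. `syn` here is just a stand-in DataFrame for the synthesizer's output, and the non-finite guard is an extra assumption added to avoid the exact ValueError discussed in this thread:

```python
import numpy as np
import pandas as pd

# stand-in for the synthesizer's output, with 'value' generated as float
syn = pd.DataFrame({"value": [1.2, 3.7, np.nan, 5.1]})

# drop non-finite rows first; otherwise .astype(int) raises
# "ValueError: Cannot convert non-finite values (NA or inf) to integer"
clean = syn[np.isfinite(syn["value"])].copy()
clean["value"] = clean["value"].round().astype(int)

print(clean["value"].tolist())  # [1, 4, 5]
```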
Hi @zhao-zilong,
I'm still having the issue. With different epochs and batch_sizes, the generated value column looks like this:
value
""
""
""
""
""
And after syn['value'].astype(int) I get the same ValueError: Cannot convert non-finite values (NA or inf) to integer, as expected.
I double-checked that my conda environment has the same versions of the packages, and it does:
name: ctabgan
dependencies:
- python=3.7
- pip=20.2.4
- pandas=1.2.4
- scipy=1.4.1
- biopython=1.78
- jupyter=1.0.0
- pip:
- numpy==1.21.0
- scikit-learn==0.24.1
- torch==1.9.1
- dython==0.6.4.post1
- tqdm==4.65.0
- pyfaidx==0.7.2.1
- click==8.1.2
If you are using another Python version, or if you see something odd here, I can create a Docker image with these requirements, just to rule out that the problem comes from the environment.
BTW, I have all the models saved (.pkl files). Would they be useful for you?
Thanks! Ander
Hi @anderdnavarro my environment:
python 3.10.12
numpy 1.25.2
pandas 2.1.0
torch 2.0.1
dython 0.5.1
scipy 1.11.2
I just tested on a random computer, not the original one from which I published this code, but the above environment lets me generate data without problems
Zilong
Hi @anderdnavarro
Some updates here. I took some times to investigate this problem. It is interesting that when this bug happens, actually all the generator paramters becomes "nan". In the end, I locate the bug. It is from the calculation of gradient penalty. In this line: https://github.com/Team-TUD/CTAB-GAN-Plus/blob/6d72fda3a9f382339e55cb4b35befced4c1f3508/model/synthesizer/ctabgan_synthesizer.py#L323 In some situations, it will become NaN. So then the problem becomes "the gradient penalty of wasserstein GAN becomes NaN". I searched online and see this: https://github.com/Team-TUD/CTAB-GAN-Plus/blob/6d72fda3a9f382339e55cb4b35befced4c1f3508/model/synthesizer/ctabgan_synthesizer.py#L323 Seems you can add number to the calculated gradient to solve that. But to be honest, I didn't find the exact solution for this problem. So I will leave it like that.
Best,
Zilong
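For intuition on why that gradient-penalty line can produce NaN: the penalty's norm involves a square root, and the backward pass of sqrt blows up at zero. This small numpy sketch (my own illustration, not the CTAB-GAN code) shows the analytic derivative with and without the tiny epsilon discussed later in this thread:

```python
import numpy as np

def sqrt_backward(x, eps=0.0):
    # backward pass of y = sqrt(x + eps) is 1 / (2 * sqrt(x + eps));
    # at x == 0 with eps == 0 this is infinite, and inf (or inf * 0)
    # later in the chain rule is how NaNs reach the generator weights
    return 1.0 / (2.0 * np.sqrt(x + eps))

naive = sqrt_backward(0.0)              # inf
stabilized = sqrt_backward(0.0, 1e-16)  # large but finite (~5e7)
```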
Hi @zhao-zilong,
I updated my environment but I still have the same issue, although it's true that I can now train for a slightly higher number of epochs before it appears.
I tried both suggestions made in "Gradient of gradient explodes (nan) when training WGAN-GP on MNIST #2534":
gradient_penalty = torch.mean((1. - torch.sqrt(1e-16+torch.sum(gradients.view(gradients.size(0), -1)**2, dim=1)))**2) * lambda_
gradients = gradients + 1e-16
But it didn't solve the problem. I don't really know what is happening, because when I trained the model with the whole database (10 variables, including this one) it worked.
Now that you have found that this is due to the calculation of the gradient penalty, I think I will just go back to your previous model, "CTAB-GAN", since that step is not implemented there.
Thank you very much for your help!! Ander
Hi,
I'm trying to train your model with a dataset that contains only one integer variable (1 column x 22928 rows), but after the training (the model is saved) I obtain the following error:
I could confirm that the error occurs because
sample = self.synthesizer.sample(n)
returns NA values in the ctabgan.py file, but I don't know what is happening or how to fix it. I could narrow the problem down to one specific line of the ctabgan_synthesizer.py script: fake = self.generator(noisez). My data doesn't have NA values and follows this density distribution:
I have the same problem when training the model with a dataset with two variables (one integer and one categorical).
Thank you very much! Ander
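The symptom reported above (every sampled value coming back as NA once fake = self.generator(noisez) misbehaves) is consistent with poisoned weights: a single NaN parameter contaminates every output. A toy numpy sketch (not the actual CTAB-GAN generator) illustrates this:

```python
import numpy as np

# toy "generator": one linear layer with a single NaN weight
weights = np.ones((4, 1))
weights[2, 0] = np.nan

noise = np.random.randn(5, 4)   # stand-in for noisez
fake = noise @ weights          # stand-in for self.generator(noisez)

# every output mixes in the NaN weight (even 0 * nan == nan),
# so the whole sampled column comes back non-finite
print(np.isnan(fake).all())     # True
```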