Team-TUD / CTAB-GAN

Official git for "CTAB-GAN: Effective Table Data Synthesizing"
Apache License 2.0

Convergence on Adult data #20

Closed margauxto closed 3 weeks ago

margauxto commented 11 months ago

I have a convergence issue with CTABGAN trained on the Adult dataset. I kept the default parameters and ran the training for 150 epochs.

Although the fake data seems similar to the real data in terms of statistical distributions, I am quite surprised by the behavior of the losses during training. Have you managed to show that CTABGAN converges properly (in terms of losses) when trained on the Adult dataset?

Below is the plot of my losses (generator loss in orange, discriminator loss in blue, number of epochs on the x axis). You can see that the discriminator loss oscillates around 0 (which is quite normal), but the generator loss increases and then decreases, which suggests instability in training. Note that I also have convergence problems with CTABGAN on custom datasets.

[Image: training_adult (generator and discriminator losses vs. epochs)]

zhao-zilong commented 11 months ago

Hi @margauxto. Actually, the generator and discriminator losses each combine several loss terms. If you plot the terms separately, the trends are easier to see.
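
A minimal sketch of how the individual terms could be recorded per epoch so that each one can be plotted on its own. This is not the CTAB-GAN training code: the `LossHistory` class and the term names (`g_adv`, `g_info`, `d_loss`) are hypothetical, and the numbers are placeholders standing in for real loss values.

```python
from collections import defaultdict


class LossHistory:
    """Collects named loss terms and averages them per epoch (hypothetical helper)."""

    def __init__(self):
        self._batches = defaultdict(list)  # values logged during the current epoch
        self.epochs = defaultdict(list)    # one mean value per finished epoch

    def log(self, **terms):
        # Each value should be a plain number, e.g. loss.item() for a PyTorch tensor.
        for name, value in terms.items():
            self._batches[name].append(float(value))

    def end_epoch(self):
        # Reduce the batch values of every term to a single epoch mean.
        for name, values in self._batches.items():
            self.epochs[name].append(sum(values) / len(values))
        self._batches.clear()


history = LossHistory()
for epoch in range(3):
    for step in range(2):
        # Placeholder numbers; in a real loop these come from the training step.
        history.log(g_adv=1.0 + epoch, g_info=0.5 * epoch, d_loss=0.1 * step)
    history.end_epoch()
```

Each list in `history.epochs` can then be drawn as its own curve, for example with matplotlib's `plt.plot(values, label=name)`.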

margauxto commented 11 months ago

Hi @zhao-zilong,

As suggested, I have plotted the losses separately to better see their trends (see the plot below) during the training of CTABGAN on the Adult dataset with default parameters.

The y axis on the right side of the plot corresponds to the values of the total generator loss (orange curve) and the information loss (red curve); the y axis on the left side corresponds to the values of the other losses. The x axis corresponds to the number of epochs (300 in total).

[Image: training_adult (individual loss terms vs. epochs)]

Although the total generator loss seems to converge, I am quite surprised by the oscillations. Have you also observed this trend?

Moreover, I have noticed that CTABGAN does not seem to handle continuous variables: they are truncated to integers. Is this right, or am I missing something?

zhao-zilong commented 11 months ago

Hi @margauxto

Yeah, in our tests we also see strong oscillations in the losses, even though a convergence trend is visible. One thing that I think could improve the stability of CTAB-GAN and CTAB-GAN+, but that we didn't do, is to give each loss term its own weighting parameter. Currently the terms are simply summed up.
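
To illustrate the idea, here is a minimal sketch of a weighted sum of loss terms. It is not part of CTAB-GAN: the function `weighted_total`, the term names, and the weight values are all hypothetical, and good weights would have to be tuned per dataset. Plain floats are used here; the same arithmetic applies to PyTorch tensors.

```python
def weighted_total(terms, weights):
    """Return the sum of weights[name] * terms[name].

    Terms without an explicit weight use 1.0, which reproduces the plain sum.
    """
    return sum(weights.get(name, 1.0) * value for name, value in terms.items())


# Hypothetical loss values for one training step.
terms = {"adversarial": 0.8, "information": 2.0, "classification": 0.4}

plain = weighted_total(terms, {})                       # unweighted sum of all terms
weighted = weighted_total(terms, {"information": 0.5})  # information term down-weighted
```

With an empty weight dictionary the result is the unweighted sum, so the weights can be introduced one term at a time.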

As for continuous variables: no, they shouldn't be truncated to integers unless you specify that, as in our demo code:

integer_columns = ['age', 'fnlwgt','capital-gain', 'capital-loss','hours-per-week'],
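
The effect of such a column list can be illustrated with a small standalone sketch. This is an assumption about the general behavior, not CTAB-GAN's actual post-processing: the helper `round_integer_columns` and the `score` column are hypothetical, and rounding (rather than truncation) is assumed.

```python
def round_integer_columns(rows, integer_columns):
    """Round only the listed columns; leave every other column untouched."""
    return [
        {
            name: (int(round(value)) if name in integer_columns else value)
            for name, value in row.items()
        }
        for row in rows
    ]


# One hypothetical generated row; 'score' stands in for a continuous column.
rows = [{"age": 38.6, "hours-per-week": 40.2, "score": 0.731}]

as_integers = round_integer_columns(rows, ["age", "hours-per-week"])
untouched = round_integer_columns(rows, [])
```

Under this assumption, a column keeps its decimal values as long as it is left out of the integer list.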