Reproducing DCR Score - Githubissues

amazon-science / tabsyn

Official Implementations of "Mixed-Type Tabular Data Synthesis with Score-based Diffusion in Latent Space"

Apache License 2.0

Reproducing DCR Score #18

Closed jacobyhsi closed 2 months ago

jacobyhsi commented 4 months ago

Hi @hengruizhang98 ,

Hope all is well! I tried reproducing the DCR score results reported in your paper on the default dataset.

The following commands were run:

python main.py --dataname default --method vae --mode train --gpu 0
python main.py --dataname default --method tabsyn --mode train --gpu 0
python main.py --dataname default --method tabsyn --mode sample --gpu 0
python eval/eval_dcr.py --dataname default --model tabsyn --path synthetic/default/tabsyn.csv

However, I am unable to reproduce your results:

Would you please elaborate on this?

Thank you!

jacobyhsi commented 4 months ago

Following up on my previous comment, I am unable to reproduce TabDDPM's results either.

hengruizhang98 commented 3 months ago

Hi, thanks for your question!

To run the DCR experiments, you have to resplit the dataset into a training set and a holdout (testing) set of the same size, then train the generative models on the training set only. An imbalance between the sizes of the training and holdout sets also changes the optimal DCR score. For example, if the ratio between the training and testing sets is $a/b$, the optimal DCR score tends to be $a/(a+b)$. Therefore, if your training/testing split is 90:10, the optimal DCR score should be around 0.90. Please retry using the new splits.
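The effect of the split ratio can be checked with a small simulation. The sketch below is not the repository's `eval/eval_dcr.py`; it assumes the DCR score is the fraction of synthetic rows whose nearest real neighbour lies in the training set rather than the holdout set, uses plain Euclidean distance on toy 2-D Gaussian data, and introduces hypothetical helper names (`resplit_equal`, `dcr_score`, `sample`) purely for illustration. The "synthetic" rows are drawn independently from the same distribution as the real rows, standing in for a generator that does not memorise its training data.

```python
import random


def resplit_equal(rows, seed=0):
    """Shuffle rows and split them into equal-sized training and holdout halves."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:2 * half]


def nearest_distance(point, rows):
    """Squared Euclidean distance from point to its nearest neighbour in rows."""
    return min(sum((p - r) ** 2 for p, r in zip(point, row)) for row in rows)


def dcr_score(synthetic, train, holdout):
    """Fraction of synthetic rows that are closer to the training set than to the holdout set."""
    closer = sum(
        nearest_distance(s, train) < nearest_distance(s, holdout)
        for s in synthetic
    )
    return closer / len(synthetic)


def sample(n, rng):
    """Toy data: n rows with two independent standard-normal features (an assumption)."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


rng = random.Random(0)
real = sample(800, rng)
synthetic = sample(400, rng)  # independent of `real`, i.e. no memorisation

# Equal-sized split, as recommended for the DCR experiment.
train, holdout = resplit_equal(real)
balanced = dcr_score(synthetic, train, holdout)

# Imbalanced 90:10 split of the same real data.
cut = int(0.9 * len(real))
skewed = dcr_score(synthetic, real[:cut], real[cut:])

print(f"50:50 split: {balanced:.2f}   90:10 split: {skewed:.2f}")
```

Under these assumptions, the balanced score should land near 0.5 and the 90:10 score near 0.9, because each synthetic row's nearest real neighbour is equally likely to be any real row, so it falls in the training set with probability $a/(a+b)$.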

We apologize for not clarifying it, and we will fix it later.

hengruizhang98 commented 3 months ago

The resplit requirement is stated in the paper (page 26, last line), but not in this repository.

hengruizhang98 commented 3 months ago

@jacobyokehongsi, Hi, has your issue been solved? Don't hesitate to ask if you have any additional questions.

jacobyhsi commented 3 months ago

Hi @hengruizhang98 ,

Yes!!! Thank you so much for your prompt replies and help! It is greatly appreciated!!!