Closed jacobyhsi closed 2 months ago
Following up on my previous comment, I am unable to reproduce TabDDPM's results either.
Hi, thanks for your question!
To run the DCR experiments, you need to resplit the dataset into training and holdout (testing) sets of equal size, then train the generative models on the training set only. Any imbalance between the sizes of the training and holdout sets changes the optimal DCR score. For example, if the ratio between the training and testing sets is $a/b$, the optimal DCR score tends to be $a/(a+b)$. So with a 90:10 training/testing split, the optimal DCR score would be approximately 0.90 rather than 0.50. Please retry using the new splits.
We apologize for not clarifying it, and we will fix it later.
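To illustrate why the split ratio matters, here is a minimal sketch, not the repository's `eval/eval_dcr.py`. It assumes the DCR score is the fraction of synthetic rows whose nearest real row lies in the training set rather than the holdout set, and it uses Euclidean distance on purely numeric toy data. The names `min_dist` and `dcr_score` are hypothetical. The "synthetic" rows are fresh draws from the same distribution as the real data, standing in for an ideal generator that has memorized nothing.

```python
import numpy as np


def min_dist(queries, reference):
    """Distance from each query row to its nearest row in `reference`."""
    # Pairwise differences via broadcasting: (n_queries, n_reference, n_features)
    diffs = queries[:, None, :] - reference[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


def dcr_score(train, holdout, synthetic):
    """Fraction of synthetic rows that are closer to train than to holdout."""
    closer_to_train = min_dist(synthetic, train) < min_dist(synthetic, holdout)
    return float(closer_to_train.mean())


rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 4))      # toy "real" dataset (assumption)
synthetic = rng.normal(size=(500, 4))  # ideal generator: same distribution

# Equal-size split (50:50) versus an imbalanced split (90:10)
balanced = dcr_score(real[:1000], real[1000:], synthetic)
skewed = dcr_score(real[:1800], real[1800:], synthetic)

print(f"50:50 split -> DCR score {balanced:.3f}")
print(f"90:10 split -> DCR score {skewed:.3f}")
```

With the equal split the score should land near 0.5, and with the 90:10 split near 0.9, matching the $a/(a+b)$ rule of thumb. A score well above these baselines would suggest that the synthetic rows sit unusually close to the training data.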
This is stated on page 26 (last line) of the paper, but not in this repository.
@jacobyokehongsi, Hi, has your issue been solved? Don't hesitate to ask additional questions if you have any.
Hi @hengruizhang98 ,
Yes!!! Thank you so much for your prompt replies and help! It is greatly appreciated!!!
Hi @hengruizhang98 ,
Hope all is well! I tried reproducing your DCR score results on the default dataset, following your paper.
The following commands were run:
python main.py --dataname default --method vae --mode train --gpu 0
python main.py --dataname default --method tabsyn --mode train --gpu 0
python main.py --dataname default --method tabsyn --mode sample --gpu 0
python eval/eval_dcr.py --dataname default --model tabsyn --path synthetic/default/tabsyn.csv
However, I am unable to reproduce your results:
Would you please elaborate on this?
Thank you!