Team-TUD / CTAB-GAN-Plus

Official GitHub for CTAB-GAN+

QUESTION FOR REGRESSION #15

Closed chaotianshinaida closed 9 months ago

chaotianshinaida commented 11 months ago

I want to know whether CTAB-GAN+ can generate data for my use case. Question 1: The first column of my data is determined by the last thirteen columns, and all of my data are continuous. When I input it into CTAB-GAN+ following the King file, the program reports an error: ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

Question 2: Where in the code can I see that CTAB-GAN+ handles regression problems as well as classification problems? I hope to get your reply; it is really important to me.

zhao-zilong commented 11 months ago

Hi @chaotianshinaida
I'm not sure if that is still the case. I think if you put your target column in the last column, it should work. We take the last column by default for the classifier/regressor.

As for where we define the regressor: we still call it the "classifier" in the paper. Check this line of code: https://github.com/Team-TUD/CTAB-GAN-Plus/blob/ab07d3f1a3bd41ccdf72f9fef73ce3c704a9368d/model/synthesizer/ctabgan_synthesizer.py#L30 That is where, for a regression problem, we modify the last layer of the "classifier" so that it outputs only a single number. We no longer apply softmax to the logits.

Would you have a try and tell me if that works for you?

Best,

Zilong
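
The change described above (a single-number output with no softmax for regression) can be illustrated with a minimal pure-Python sketch. This is not the repository's code: the function names `linear`, `softmax`, and `classifier_output`, the random weights, and the `"Regression"` / `"Classification"` strings are assumptions made for illustration only.

```python
import math
import random
from typing import List


def linear(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Apply one dense layer: one output value per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Turn logits into probabilities that sum to 1 (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def classifier_output(features: List[float], problem_type: str,
                      n_classes: int, rng: random.Random) -> List[float]:
    """Hypothetical last layer of the "classifier".

    Regression: one output unit, raw value returned (no softmax).
    Classification: one unit per class, softmax applied to the logits.
    """
    out_dim = 1 if problem_type == "Regression" else n_classes
    # Random weights stand in for a trained layer (illustration only).
    weights = [[rng.uniform(-0.1, 0.1) for _ in features] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    logits = linear(features, weights, bias)
    if problem_type == "Regression":
        return logits
    return softmax(logits)
```

With the same input features, the regression branch returns a list holding one unbounded number, while the classification branch returns `n_classes` probabilities.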

chaotianshinaida commented 11 months ago

My target column is in the last column now, but it still reports an error like this. How should I deal with this problem?

chaotianshinaida commented 11 months ago

[Image: QQ screenshot 20231210110358]

chaotianshinaida commented 11 months ago

You see, this is the part of the input that I modified:

    categorical_columns = [],
    log_columns = [],
    mixed_columns = {},
    non_categorical_columns = [],
    general_columns = ['A','B','C','D','E'],
    integer_columns = ['R','F'],
    problem_type = {"Regression": "N"}
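
Since the maintainer suggested earlier in the thread that the target should sit in the last column, a small helper along the following lines could reorder a table before training. This is only a sketch: `target_from_problem_type` and `move_target_last` are hypothetical names, not part of CTAB-GAN+, and the column names are taken from the post above.

```python
from typing import Dict, List, Sequence, Tuple


def target_from_problem_type(problem_type: Dict[str, str]) -> str:
    """Return the target column from a one-entry dict such as {"Regression": "N"}."""
    if len(problem_type) != 1:
        raise ValueError("problem_type is expected to hold exactly one entry")
    return next(iter(problem_type.values()))


def move_target_last(header: Sequence[str], rows: Sequence[Sequence],
                     target: str) -> Tuple[List[str], List[List]]:
    """Reorder columns so that `target` becomes the last column."""
    idx = list(header).index(target)  # raises ValueError if the column is missing
    order = [i for i in range(len(header)) if i != idx] + [idx]
    new_header = [header[i] for i in order]
    new_rows = [[row[i] for i in order] for row in rows]
    return new_header, new_rows
```

For example, a header `['N', 'A', 'B']` becomes `['A', 'B', 'N']`, and every row is permuted the same way.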

chaotianshinaida commented 11 months ago

This is the example from GitHub; it doesn't work either.

chaotianshinaida commented 11 months ago

[Image: QQ screenshot 20231210112659 (upload did not complete)]

zhao-zilong commented 11 months ago

Hi @chaotianshinaida

I have pushed a fix for this: https://github.com/Team-TUD/CTAB-GAN-Plus/commit/f7451c77a5c72fe59c99d816847ced28df2037a8 You can just modify one line to make it work.

chaotianshinaida commented 11 months ago

Thank you very much for your answer. The code works now.

chaotianshinaida commented 11 months ago

Hello author, I am very happy that I was able to use CTAB-GAN+ to generate a set of data. However, there may still be some unresolved problems in the code that evaluates the data afterwards. If you have time, I hope you can help me take a look.

[Image: pingguchucuo (evaluation error)]

chaotianshinaida commented 11 months ago

    Traceback (most recent call last):
      File "E:\keyan\生成对抗神经网络\CTAB_GANP\CTAB-GAN-Plus-main\suanfaregression.py", line 32, in <module>
        result_mat = get_utility_metrics(real_path,fake_paths,"MinMax",model_dict, test_ratio = 0.20)
      File "E:\keyan\生成对抗神经网络\CTAB_GANP\CTAB-GAN-Plus-main\model\eval\evaluation.py", line 57, in get_utility_metrics
        X_train_real, X_test_real, y_train_real, y_test_real = model_selection.train_test_split(data_real_X ,data_real_y, test_size=test_ratio, stratify=data_real_y,random_state=42)
      File "D:\biancheng\Anaconda\anaconda\envs\ctabganplus\lib\site-packages\sklearn\utils\_param_validation.py", line 214, in wrapper
        return func(*args, **kwargs)
      File "D:\biancheng\Anaconda\anaconda\envs\ctabganplus\lib\site-packages\sklearn\model_selection\_split.py", line 2670, in train_test_split
        train, test = next(cv.split(X=arrays[0], y=stratify))
      File "D:\biancheng\Anaconda\anaconda\envs\ctabganplus\lib\site-packages\sklearn\model_selection\_split.py", line 1746, in split
        for train, test in self._iter_indices(X, y, groups):
      File "D:\biancheng\Anaconda\anaconda\envs\ctabganplus\lib\site-packages\sklearn\model_selection\_split.py", line 2147, in _iter_indices
        raise ValueError(
    ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.
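
The traceback shows `train_test_split` being called with `stratify=data_real_y`. Stratification treats every distinct value of `y` as a class, so a continuous target will usually contain "classes" with a single member, which is what the error reports. The sketch below illustrates that with the standard library only; `smallest_class_size` and `stratify_argument` are hypothetical helpers, not functions from the repository or from scikit-learn.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def smallest_class_size(y: Sequence[Hashable]) -> int:
    """Size of the least populated "class" when every distinct value is a class."""
    return min(Counter(y).values())


def stratify_argument(y: Sequence[Hashable], problem_type: str) -> Optional[List[Hashable]]:
    """Return the labels to stratify on for classification, or None for regression.

    The problem_type strings are an assumption made for this illustration.
    """
    if problem_type == "Classification":
        return list(y)
    return None


continuous_target = [0.13, 2.7, 5.01, 9.9]   # every value is unique
class_labels = ["a", "a", "b", "b"]          # each class has two members
print(smallest_class_size(continuous_target))  # 1
print(smallest_class_size(class_labels))       # 2
```

In scikit-learn, passing `stratify=None` to `train_test_split` gives a plain random split, which is one way an evaluation routine could handle a regression target.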

chaotianshinaida commented 11 months ago

I want to export the training set and the test set. Which part of the code does this? Also, can the split ratio between the training set and the test set be changed in your code?
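
The traceback earlier in the thread shows `get_utility_metrics` being called with `test_ratio = 0.20`, so the ratio appears to be an argument of that call. Independently of the repository, a split with an adjustable ratio can be exported to CSV with a sketch like the one below; `split_rows` and `export_split` are hypothetical helpers, and the output paths are placeholders chosen by the caller.

```python
import csv
import random
from typing import List, Sequence, Tuple


def split_rows(rows: Sequence[Sequence], test_ratio: float = 0.2,
               seed: int = 42) -> Tuple[List[Sequence], List[Sequence]]:
    """Randomly split rows into (train, test) without stratification."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    n_test = int(round(len(rows) * test_ratio))
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


def export_split(header: Sequence[str], rows: Sequence[Sequence],
                 train_path: str, test_path: str,
                 test_ratio: float = 0.2, seed: int = 42) -> Tuple[int, int]:
    """Write the train and test parts to two CSV files and return their sizes."""
    train, test = split_rows(rows, test_ratio, seed)
    for path, part in ((train_path, train), (test_path, test)):
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(part)
    return len(train), len(test)
```

Changing `test_ratio` changes the proportion of rows written to the test file, and a fixed `seed` keeps the split reproducible between runs.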

zhao-zilong commented 11 months ago

Hi @chaotianshinaida

Sorry it took some time to get back to you; I was quite busy the last few days. Please check the new updates; they should fix all of these bugs. It was my mistake: I had not updated this repo's evaluation code to handle regression datasets. Thanks for pointing that out.

Best,

Zilong

chaotianshinaida commented 10 months ago

Thank you, I will try again.