bioinf-jku / SNNs

Tutorials and implementations for "Self-normalizing networks"
GNU General Public License v3.0

Preprocessing of categorical and continuous variables #4

Closed AlexiaJM closed 6 years ago

AlexiaJM commented 6 years ago

With the UCI data, how did you preprocess the categorical and continuous variables?

Did you enforce a min/max range, or did you just standardize the continuous variables? And for the categorical variables, did you use one-hot/dummy coding or did you standardize them?

Edit: Also, what batch size did you use? Did it depend on the sample size?

Thanks!

untom commented 6 years ago

Hi Alexia, thanks for your interest in our paper. I think @gklambauer knows the answers to your questions, but he's on holiday until next week, so he'll be able to provide you with some answers then :)

gklambauer commented 6 years ago

Hello Alexia,

Great that you made the Cat-GANs work with SELUs! We really appreciate your insights!

We used the preprocessed data sets as provided by the authors of "Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?":
http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/
http://persoal.citius.usc.es/manuel.fernandez.delgado/papers/jmlr/data.tar.gz

Yes, they standardized the continuous variables. We used a batch size equal to the minimum of 128 and a fifth of the training set size, so that we have at least 5 minibatches per epoch. So for large data sets the batch size did not depend on the data set size, while for small data sets it did. The authors of "Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?" provide multiple train-test splits, of which we used only the first.
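For reference, a minimal sketch of those two rules (this is not from our code; the function names and the zero-variance guard are my own):

```python
import numpy as np

def standardize(X_train, X_test):
    """Zero-mean, unit-variance scaling using training-set statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns (assumption, not from the paper)
    return (X_train - mean) / std, (X_test - mean) / std

def batch_size(n_train, cap=128):
    """min(128, n_train // 5), so that every epoch has at least 5 minibatches."""
    return min(cap, max(1, n_train // 5))
```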

Regards, Günter

AlexiaJM commented 6 years ago

Thanks for the info! Btw, yes, SELUs work great in GANs, and I'm sure they will in many yet-undiscovered applications! Great work on the paper.