SvenGroen closed this issue 1 year ago.
Hi Sven,
Very good question; we forgot to mention that. To include a column in "non_categorical_columns", you actually need to add it to "categorical_columns" as well. I know it sounds weird; we should change that later. "non_categorical_columns" means that the column is categorical but can be very high-dimensional, so we treat it as continuous. For columns in "non_categorical_columns", we first encode the column's values as numbers and then treat it as a continuous column (using a variational Gaussian mixture). If you also add the column to "general_transform", it is first encoded as numbers and then handled by "general_transform" instead of the default continuous-column encoding.
I hope this makes it clearer.
Best,
Zilong
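To make the rules above concrete, here is a minimal Python sketch. It is not the library's actual code: `label_encode` and `resolve_encoding` are hypothetical helpers, the "encode as numbers" step is assumed to be a simple label encoding, and "categorical encoding" is a placeholder for however ordinary categorical columns are handled.

```python
def label_encode(values):
    """Map each distinct category to an integer code, in order of first appearance.
    (Assumption: a plain label encoding stands in for the real numeric encoding.)"""
    codes = {}
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    return encoded, codes


def resolve_encoding(column, categorical_columns, non_categorical_columns, general_transform):
    """Hypothetical helper: describe how `column` would be handled under the rules
    explained in this thread. Return strings are illustrative labels only."""
    if column in non_categorical_columns:
        # A non-categorical column must ALSO be listed in categorical_columns.
        if column not in categorical_columns:
            raise ValueError(
                f"{column!r} is in non_categorical_columns but not in categorical_columns"
            )
        # High-dimensional categorical: encode categories as numbers first,
        # then treat those numbers like a continuous column.
        if column in general_transform:
            return "numeric encoding -> general transform"
        return "numeric encoding -> variational Gaussian mixture"
    if column in categorical_columns:
        return "categorical encoding"
    # Continuous columns: general_transform replaces the default encoding.
    if column in general_transform:
        return "general transform"
    return "variational Gaussian mixture"


if __name__ == "__main__":
    categorical = ["gender", "zip_code"]
    non_categorical = ["zip_code"]

    encoded, mapping = label_encode(["10115", "80331", "10115"])
    print(encoded)  # [0, 1, 0]
    print(resolve_encoding("zip_code", categorical, non_categorical, []))
    print(resolve_encoding("zip_code", categorical, non_categorical, ["zip_code"]))
```

Raising an error when a column appears only in "non_categorical_columns" mirrors the requirement that it must also be listed in "categorical_columns"; whether the real code raises, warns, or silently ignores such a column is not stated in the thread.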
Hi Zilong,
Yes, the name is indeed a bit misleading, but your explanation makes total sense. Thanks for the clarification!
Best, Sven
Hey,
I wanted to ask what kind of columns should be included in the "non_categorical_columns" list, as I could not find an explanation. From looking at the code, I would guess that "non_categorical_columns" are "categorical columns that are already numeric (e.g. label encoded)".
Can you confirm that I understood this correctly? If not, could you clarify its purpose?
Cheers, Sven