Bug in `train_test_split`

izagorac commented 7 months ago

The function that splits a `CounterfactualData` object into a train and a test set assumes that every class contains the same number of data points. This can be seen in the following line of code: `n_per_class = round(N / length(classes_))`

This line divides the total number of data points `N` by the number of classes. However, the classes do not always contain the same number of data points. Instead, the function should count the data points in each class and split the dataset based on those counts.
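To make the problem concrete, here is a small sketch with made-up, imbalanced labels. The names `y` and `counts` are illustrative and not taken from the package. The sketch evaluates the quoted line and compares its result with the actual class sizes:

```julia
# Hypothetical, imbalanced labels: 80 / 15 / 5 observations across three classes.
y = vcat(fill(1, 80), fill(2, 15), fill(3, 5))
classes_ = unique(y)
N = length(y)

# The quoted line, with `round(Int, ...)` added here to get an integer count.
# It assumes every class holds roughly N / 3, i.e. about 33 observations.
n_per_class = round(Int, N / length(classes_))

# What the issue suggests instead: count the observations in each class.
counts = Dict(c => count(==(c), y) for c in classes_)

for c in classes_
    println("class $c: assumed $n_per_class, actual $(counts[c])")
end
```

If the split draws `n_per_class` points from each class, class 3 (5 observations) cannot supply its share, while most of class 1 is never considered.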

ceferisbarov commented 7 months ago

I think creating a balanced test split makes sense, but it would be nice to have both options. I created a pull request for this (#403). The new `keep_class_ratio` option can now be used to make the test split preserve the relative class sizes of the entire dataset.
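The two behaviours discussed here, a balanced test split versus one that keeps the class ratio, can be sketched as follows. This is a minimal illustration, not the code merged in #403: `split_indices`, `test_size` and the capping rule in the balanced branch are hypothetical, and only the name of the `keep_class_ratio` flag comes from the thread.

```julia
using Random

"""
    split_indices(y; test_size=0.2, keep_class_ratio=false, rng=Random.default_rng())

Hypothetical helper (not the package API): return `(train_idx, test_idx)` for labels `y`.
"""
function split_indices(y::AbstractVector; test_size::Real=0.2,
                       keep_class_ratio::Bool=false,
                       rng::AbstractRNG=Random.default_rng())
    classes = unique(y)
    n_test_total = round(Int, test_size * length(y))
    test_idx = Int[]
    for c in classes
        idx = findall(==(c), y)
        n_c = if keep_class_ratio
            # Stratified: each class contributes in proportion to its own size.
            round(Int, test_size * length(idx))
        else
            # Balanced: equal share per class, capped at the class size.
            min(length(idx), round(Int, n_test_total / length(classes)))
        end
        append!(test_idx, shuffle(rng, idx)[1:n_c])
    end
    train_idx = setdiff(eachindex(y), test_idx)
    return train_idx, sort(test_idx)
end

# Usage with the same imbalanced toy labels (80 / 15 / 5).
y = vcat(fill(1, 80), fill(2, 15), fill(3, 5))
rng = MersenneTwister(1)
_, test_bal = split_indices(y; keep_class_ratio=false, rng=rng)
_, test_str = split_indices(y; keep_class_ratio=true, rng=rng)
println([count(==(c), y[test_bal]) for c in 1:3])  # [7, 7, 5]
println([count(==(c), y[test_str]) for c in 1:3])  # [16, 3, 1]
```

In this sketch the balanced branch caps each class at its size, so with a small class it returns fewer test points than requested (19 instead of 20). The stratified branch mirrors the class proportions of the full dataset instead.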

pat-alt commented 6 months ago

Closing, as #403 has been merged.

JuliaTrustworthyAI / CounterfactualExplanations.jl

Bug in `train_test_split` #402