Closed izagorac closed 6 months ago
I think creating a balanced test split makes sense, but it would be nice to have both options. I created a pull request for this (#403). We can now use `keep_class_ratio` to enforce the relative class sizes of the entire dataset on the test split.
Closing, as #403 has been merged.
The function to split the `CounterfactualData` into a train and test set assumes that the number of data points per class in the data is equal. This can be seen in the following line of code: `n_per_class = round(N / length(classes_))`
This line divides the total number of data points `N` by the number of classes. However, the number of data points is not always equal for each class. Instead, the function should get the number of data points for each class and split the dataset based on those values.
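To illustrate the suggestion, here is a minimal sketch of a split that counts the data points per class instead of assuming `N / length(classes_)`. It is not the package's implementation: the helper `stratified_split_indices`, its `test_size` and `rng` arguments, and the toy labels are all assumptions made up for this example, and it works on a plain label vector rather than on `CounterfactualData`.

```julia
using Random

# Hypothetical helper (not part of the package): returns train and test indices
# such that every class contributes `test_size` of its own points to the test set.
function stratified_split_indices(labels::AbstractVector; test_size::Real=0.2,
                                  rng::AbstractRNG=Random.default_rng())
    train_idx, test_idx = Int[], Int[]
    for c in unique(labels)
        idx = shuffle(rng, findall(==(c), labels))    # indices of class c, shuffled
        n_test = round(Int, test_size * length(idx))  # per-class count, not N / n_classes
        append!(test_idx, idx[1:n_test])
        append!(train_idx, idx[(n_test + 1):end])
    end
    return train_idx, test_idx
end

# Imbalanced toy labels (assumed data): 90 points of class 0, 10 of class 1.
labels = vcat(fill(0, 90), fill(1, 10))

# The uniform assumption from the issue yields 50 points per class,
# although class 1 only has 10 points in total.
n_per_class = round(length(labels) / length(unique(labels)))

train_idx, test_idx = stratified_split_indices(labels; test_size=0.2,
                                               rng=MersenneTwister(42))

# Number of test points per class.
test_counts = Dict(c => count(==(c), labels[test_idx]) for c in unique(labels))
```

With these toy labels the test set receives 18 points of class 0 and 2 points of class 1, so the 9:1 ratio of the full dataset is preserved, whereas the uniform `n_per_class` would ask for more points of class 1 than exist.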