[DOC] ClassifierDrift

cmougan commented 1 year ago

On which data is the ClassifierDrift detector trained? The documentation does not state this very clearly:

Classifier-based drift detector. The classifier is trained on a fraction of the combined reference and test data and drift is detected on the remaining data. To use all the data to detect drift, a stratified cross-validation scheme can be chosen.

arnaudvl commented 1 year ago

The ClassifierDrift detector is trained on a portion of the combined reference set x_ref and test set x_test. If the train_size argument is a float between 0 and 1, then a random sample of size int(train_size * (len(x_ref) + len(x_test))) from the combined data [x_ref, x_test] is used for training. The held-out fraction 1 - train_size is then used to test for drift. If we instead specify n_folds as an int, we apply cross-validation so that all the data is used for both training and out-of-sample testing. The n_folds argument takes priority over train_size. This is clarified in the docs under the detector's usage section: https://docs.seldon.io/projects/alibi-detect/en/stable/cd/methods/classifierdrift.html#Usage
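
To make the two splitting modes concrete, here is a minimal sketch in plain Python. It is not the alibi-detect implementation: the helper names holdout_split and stratified_folds are hypothetical, and labelling reference instances 0 and test instances 1 is an assumption made for illustration. It only shows how many instances end up in training versus held out in each mode.

```python
import random


def holdout_split(x_ref, x_test, train_size=0.75, seed=0):
    """Hypothetical helper: random train / held-out split of the pooled data."""
    # Assumption: reference instances get label 0, test instances label 1.
    combined = [(x, 0) for x in x_ref] + [(x, 1) for x in x_test]
    # Training sample size as described in the thread.
    n_train = int(train_size * len(combined))
    idx = list(range(len(combined)))
    random.Random(seed).shuffle(idx)
    train = [combined[i] for i in idx[:n_train]]
    held_out = [combined[i] for i in idx[n_train:]]
    return train, held_out


def stratified_folds(x_ref, x_test, n_folds=5, seed=0):
    """Hypothetical helper: each instance is held out in exactly one fold."""
    rng = random.Random(seed)
    ref = [(x, 0) for x in x_ref]
    test = [(x, 1) for x in x_test]
    rng.shuffle(ref)
    rng.shuffle(test)
    for k in range(n_folds):
        # Taking every n_folds-th item from each class keeps the
        # reference/test ratio roughly equal across folds.
        held_out = ref[k::n_folds] + test[k::n_folds]
        train = [
            pair
            for j in range(n_folds)
            if j != k
            for pair in ref[j::n_folds] + test[j::n_folds]
        ]
        yield train, held_out


x_ref = list(range(100))        # 100 reference instances
x_test = list(range(100, 160))  # 60 test instances

train, held_out = holdout_split(x_ref, x_test, train_size=0.75)
print(len(train), len(held_out))

folds = list(stratified_folds(x_ref, x_test, n_folds=5))
print(sum(len(h) for _, h in folds))
```

With train_size only the held-out fraction contributes to the drift test, whereas with n_folds every instance receives an out-of-sample prediction, which is why cross-validation uses all the data at the cost of training the classifier n_folds times.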

cmougan commented 1 year ago

Thanks for the clarification and link!

I was thinking that perhaps we could improve the documentation, either by extending it or by adding a link. What do you think?

SeldonIO / alibi-detect

[DOC] ClassifierDrift #692