eddiebergman opened 2 years ago
@eddiebergman I recently started using auto-sklearn and would like to contribute to it.
Could you please assign this issue to me and guide me on it?
Hi @Mathanraj-Sharma,
For sure, happy to have a contribution! Essentially, while doing some refactoring I noticed these assert statements and found them very "test-like", i.e. this should really be a test and not done every time someone calls predict_proba. Normally assertions are not too costly, but seeing as this one does two checks over the entire probability array, I imagine the cost is not minor.
It is good, however, to have these checks, but I can imagine a user getting these messages and not knowing what to do with them. To be honest, if I got this message, I wouldn't know what to do with it either. Seeing as it's never been raised as an issue, I assume we are safe and this isn't a problem in practice.
So, seeing as this has never been raised and the recovery mechanism isn't clear, I would want to essentially drop these statements if they take a meaningful amount of time. That means two things:

1. A quick benchmark of just how long the assertions actually take with various sizes of probability arrays, to confirm they take a while, and if so, drop them.
2. Making sure we have some test somewhere that captures this assertion, both on individual components and on the system as a whole. I did a quick check and we actually don't do any such check on classifier components, and we have no such check for AutoML instances either.
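A minimal sketch of such a benchmark, assuming the assertions amount to a bounds check and a row-sum check over the probability array (the exact checks live in estimators.py; `assertion_checks` here is a hypothetical stand-in):

```python
import timeit

import numpy as np


def assertion_checks(proba: np.ndarray) -> None:
    # Hypothetical stand-in for the runtime assertions in predict_proba:
    # every entry is in [0, 1] and each row sums to ~1.
    assert (proba >= 0).all() and (proba <= 1).all()
    assert np.allclose(proba.sum(axis=1), 1.0)


for n_samples in (1_000, 100_000, 1_000_000):
    # Random but valid multiclass probabilities with 10 classes
    raw = np.random.rand(n_samples, 10)
    proba = raw / raw.sum(axis=1, keepdims=True)
    t = timeit.timeit(lambda: assertion_checks(proba), number=10) / 10
    print(f"{n_samples:>9} rows: {t * 1000:.3f} ms per check")
```

Comparing these times against the time the models themselves need to produce the probabilities would show whether the assertions are a meaningful fraction of inference.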
Testing the second part is not the most straightforward. The first thing to note is that we have Estimators, which are supposed to be sklearn-compliant wrappers of AutoML, which does the grunt work of everything. An estimator is mostly just calling its underlying automl methods. Hence, we can test the probabilities on an AutoML directly.
We use cases, which are trained AutoML instances that are cached and have their trained state copied, so we can run different tests without retraining each time. These cases are passed as a parameter to the tests, but extracting information about how a case was trained is a bit more difficult. For testing the multi-label part, it would require something like:
```python
from autosklearn.automl import AutoMLClassifier
from autosklearn.constants import MULTILABEL_CLASSIFICATION
from pytest_cases import parametrize_with_cases

import test.test_automl.cases as cases


@parametrize_with_cases("automl", cases=cases, has_tag=["classifier", "fitted"])
def test_predict_proba_consistent_output(automl: AutoMLClassifier) -> None:
    """
    Expects
    -------
    * Whatever we expect of the output
    """
    task_type = automl._task
    if task_type == MULTILABEL_CLASSIFICATION:
        assert ...
        assert ...
```
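As a sketch of what the multi-label assertions might check (the exact expected properties are up for discussion): in multi-label output each entry is an independent probability, so every value should be bounded in [0, 1], but rows need not sum to 1. The array below is made-up example data.

```python
import numpy as np

# Hypothetical multi-label probability output: one independent
# probability per (sample, label) pair.
proba = np.array(
    [
        [0.9, 0.1, 0.7],
        [0.2, 0.8, 0.3],
    ]
)
n_samples, n_labels = 2, 3

# Shape matches (n_samples, n_labels)
assert proba.shape == (n_samples, n_labels)

# Every entry is a valid probability; rows need NOT sum to 1
assert ((proba >= 0) & (proba <= 1)).all()
```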
Recap:
* Benchmark how long the assertions in predict_proba take for various probability array sizes, relative to the predictions themselves.
* If they take a meaningful amount of time, drop them.
* Add tests that capture these assertions, both on individual classifier components and on AutoML as a whole.
Please let me know if there's any more info or help you need!
@eddiebergman thanks for assigning it to me and for the explanation, let me go through it and get back to you.
@eddiebergman
By benchmarking, do you want me to compare the run time of the predict_proba method with and without the assert statements for a set of datasets? In that case, do you have any particular datasets in mind?
Hi @Mathanraj-Sharma,
Datasets are too much; this could just be with toy data that's of comparable size to perhaps a "small" dataset and a "large" one. The check that would take the longest is probably validating multi-label probabilities, but it would be good to see the other two as well, multiclass and binary.
Edit: I guess we do need some real example to get an idea of the actual time required to generate the probabilities from the models, so we can compare how much the assertions take relative to the actual predictions.
The reason we need the benchmark is that there is some disagreement about how much it might impact an end user.
Best, Eddie
Hello @eddiebergman,
I was wondering if this issue is still open, or whether it has been solved already?
Thanks in advance
https://github.com/automl/auto-sklearn/blob/5e21e9cbd405eaef47b5e5d68cf092254ccffb51/autosklearn/estimators.py#L1453-L1465
There's a lot of assertion checking here, which can really eat into inference time. While the checks are helpful, they seem like something that should really be enforced in testing.
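One way to keep the checks without paying for them at inference time would be a reusable helper that the test suite calls instead. A minimal sketch, assuming the runtime assertions are a bounds check plus a row-sum check (all names here are hypothetical, not auto-sklearn API):

```python
import numpy as np


def check_proba_valid(proba: np.ndarray, multilabel: bool = False) -> None:
    """Hypothetical test helper mirroring the runtime assertions."""
    # Every entry must be a valid probability
    assert ((proba >= 0) & (proba <= 1)).all(), "probabilities outside [0, 1]"
    if not multilabel:
        # In binary/multiclass output each row is a distribution over classes
        assert np.allclose(proba.sum(axis=1), 1.0), "rows do not sum to 1"


# Usage in a test rather than inside predict_proba itself:
raw = np.random.rand(50, 4)
check_proba_valid(raw / raw.sum(axis=1, keepdims=True))
check_proba_valid(np.random.rand(50, 4), multilabel=True)
```

Calling this once per component and once per fitted AutoML case in the test suite would cover the same invariants the runtime assertions currently enforce.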