ClimbsRocks / auto_ml

[UNMAINTAINED] Automated machine learning for analytics & production
http://auto-ml.readthedocs.io
MIT License

Another Visualization idea #234

Open calz1 opened 7 years ago

calz1 commented 7 years ago

We had discussed scikit-plot in #227. I stumbled on this (http://www.scikit-yb.org/en/latest/), which can do some neat visualizations for both feature selection and the models. Parts look easy to integrate because you just need to pass training and testing data. Any opposition if I work on a PR referencing it?

ClimbsRocks commented 7 years ago

Looks great; can't wait!

I just finished the code change I'd mentioned earlier, so I don't anticipate any changes to overlapping parts of the code base for a bit.

Great find! Might just inspire me to do some more complicated feature selection.

On Wed, Jun 7, 2017 at 3:50 AM calz1 notifications@github.com wrote:

We had discussed scikit-plot in #227 https://github.com/ClimbsRocks/auto_ml/pull/227. I stumbled on this http://www.scikit-yb.org/en/latest/ which can do some neat visualizations both for feature selection and the models. Parts look easy to integrate because you just need to pass training and testing data. Any opposition if I work on a PR referencing it?

calz1 commented 7 years ago

@ClimbsRocks

OK, I read their documentation a little too quickly. I thought things like their confusion matrix visualizer only required the actuals and predictions (which is how I did the text-based confusion matrix), but you also need to feed it the model.
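For context, here is a minimal sketch of the kind of text-based confusion matrix that needs only the actuals and predictions. The function name `text_confusion_matrix` and the sample data are made up for illustration; this is not the code that is in auto_ml.

```python
from collections import Counter


def text_confusion_matrix(actuals, predictions):
    """Return a plain-text confusion matrix (rows: actual, columns: predicted)."""
    labels = sorted(set(actuals) | set(predictions))
    # Count how often each (actual, predicted) pair occurs.
    counts = Counter(zip(actuals, predictions))
    width = max(6, max(len(str(label)) for label in labels) + 2)

    # Header row: an empty corner cell followed by the predicted labels.
    lines = ["".ljust(width) + "".join(str(label).rjust(width) for label in labels)]
    for actual in labels:
        row = "".join(
            str(counts[(actual, predicted)]).rjust(width) for predicted in labels
        )
        lines.append(str(actual).ljust(width) + row)
    return "\n".join(lines)


# Hypothetical labels, purely for illustration.
matrix = text_confusion_matrix([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(matrix)
```

Nothing here touches the trained model, which is why it fits in a scoring helper that only sees labels.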

I see in auto_ml/utils_model_training.py you have all sorts of references to the model, but none in utils_scoring.py, where I was looking to put the new code. I know you wrote that you were very focused on keeping memory usage reasonable, so I don't want to start making global copies of the model. What would be your preference for making the model available during scoring?

ClimbsRocks commented 7 years ago

Great question! From a memory perspective, I'm much more worried about stuff like copying the whole dataset; the model itself is pretty light.

I think the best approach is to modify .score in predictor.py to pass in the model itself, right next to where we pass in advanced_scoring=True. Then we can modify the code in utils_scoring.py to make use of the model. I would like to make the model optional, so we can nest these particular bits of scoring behind something like if model is not None:, and make model=None our default param.

Can't wait to see how this goes!
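A rough sketch of that pattern, assuming a hypothetical helper `advanced_scoring_report` and a placeholder `DummyModel`; the real signatures in predictor.py and utils_scoring.py may well differ:

```python
def advanced_scoring_report(y_true, y_pred, model=None):
    """Hypothetical stand-in for a scoring helper in utils_scoring.py."""
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    report = {"accuracy": correct / len(y_true)}

    # Model-dependent extras are nested behind the optional argument,
    # so existing callers that pass no model behave exactly as before.
    if model is not None:
        report["model_name"] = type(model).__name__

    return report


class DummyModel:
    """Placeholder for a trained estimator; not part of auto_ml."""


# Hypothetical labels, purely for illustration.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]

without_model = advanced_scoring_report(y_true, y_pred)
with_model = advanced_scoring_report(y_true, y_pred, model=DummyModel())
```

Passing a reference to the model this way does not copy it, so the memory cost stays small, and a visualizer that needs the fitted model could be called inside the same guarded branch.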