ds5110 / faces

Repetitive code #17

Closed sophiacofone closed 1 year ago

sophiacofone commented 1 year ago

How much refactoring do we want to do when it comes to repetitive code?

For example, we are all doing some form of getting the data frame and extracting certain columns. Do we want to have just one file/script for that?

I am also noticing that many of us are plotting confusion matrices, doing train/test splits, printing classification_report output, etc. Currently I have separate helper functions for all of that in sc_helpers.py. My suggestion would be to build off that file (or possibly add to Jesse's model.py if we think that would be a good option) so we have one "source of truth" for those basic functions that we use over and over again. I am also happy to integrate my helpers into another script if that's easier. What do you guys think?

jhautala commented 1 year ago

I think it would be cool to have a consistent way to evaluate models. Any GridSearchCV stuff is probably too model-specific to easily generalize, but we could make a new util (e.g. util/evaluate_models.py) that declares a function that takes a classifier, applies a consistent train/test split, and outputs a classification report, a confusion matrix, and a scatter plot.

For the latter step, we could probably reuse the existing scatter function in util/plot.py:

from util.plot import scatter

It's got a pretty gnarly function signature (lots of optional params), but I'll try to add some documentation...
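
For reference, a rough sketch of what the shared evaluate function described above might look like. This is only an illustration under assumptions, not code from the repo: the name evaluate_model, its parameters, and the default split settings are hypothetical, and the scatter step is left out because the signature of scatter in util/plot.py isn't shown in this thread. It uses only scikit-learn's train_test_split, classification_report, and confusion_matrix.

```python
from sklearn.base import clone
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def evaluate_model(clf, X, y, test_size=0.25, random_state=0):
    """Hypothetical helper: fit a copy of clf on a fixed split and score it.

    Returns (report, matrix), where report is the text classification
    report and matrix is the confusion matrix on the held-out test set.
    """
    # A fixed random_state gives every model the same train/test split,
    # so results are comparable across classifiers.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    # clone() leaves the caller's estimator unfitted and untouched.
    model = clone(clf).fit(X_train, y_train)
    y_pred = model.predict(X_test)

    report = classification_report(y_test, y_pred, zero_division=0)
    matrix = confusion_matrix(y_test, y_pred)
    return report, matrix
```

Something like `report, matrix = evaluate_model(LogisticRegression(max_iter=1000), X, y)` would then work for any scikit-learn classifier. Returning the results rather than printing or plotting them keeps the function easy to call from make targets; plotting could be layered on top once the scatter docs exist.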

jhautala commented 1 year ago

I want to focus on adding make targets for all my output first, but I'll see if I can get around to adding that evaluate function. In the meantime, if anyone wants to take the initiative and add it, that's cool too.

sophiacofone commented 1 year ago

I think that sounds like a good idea