code-312 / rescue-chicago

Repository for work related to an interactive data dashboard that can be used to analyze how different dog characteristics may correlate with average length of stay in a shelter prior to adoption.
https://code312-rescue-trends-2659be78e6b4.herokuapp.com/

Preprocess Features for a Model #1

Closed by kaylarobinson077 1 year ago

kaylarobinson077 commented 1 year ago

Most machine learning models expect exclusively numeric input features. Some (most?) of our features are categorical, for example age group (puppy, young, adult...) or breed name, so they need to be encoded as numbers before modeling.

Let's use pandas.DataFrame as the data structure for preparing our dataset for modeling. Scikit-learn, one of the most commonly used Python ML packages, accepts DataFrames as input when fitting models.
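
As a rough illustration of turning categories into numeric columns in a DataFrame, here is a minimal sketch using one-hot encoding with `pandas.get_dummies`. The column names and values are made up for the example and are not the project's actual schema.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
dogs = pd.DataFrame({
    "age": ["puppy", "young", "adult"],
    "breed": ["Labrador", "Beagle", "Labrador"],
    "length_of_stay": [5, 12, 30],
})

# One-hot encode the categorical columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(dogs, columns=["age", "breed"], dtype=int)
print(encoded.columns.tolist())
```

Each category becomes its own 0/1 column, so the result contains only numeric values. One-hot encoding is just one option here; other encodings may suit some features better.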

Preprocessing ideas:

I think it would make the most sense to organize this as a new step that runs on the output from data_cleaner. We could call it data_preprocessor?
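
A possible shape for that step, as a sketch only: a function that takes the cleaned DataFrame and returns numeric features. The function name `preprocess`, the `AGE_ORDER` mapping, and the `age` and `breed` column names are all hypothetical, and the actual output format of data_cleaner is assumed rather than known.

```python
import pandas as pd

# Hypothetical ordering of age categories; adjust to the values in the real data.
AGE_ORDER = {"puppy": 0, "young": 1, "adult": 2, "senior": 3}

def preprocess(cleaned: pd.DataFrame) -> pd.DataFrame:
    """Turn a cleaned DataFrame into all-numeric model features."""
    features = cleaned.copy()
    # Age has a natural order, so map it to integers instead of one-hot columns.
    # Values missing from AGE_ORDER become NaN and would need separate handling.
    features["age"] = features["age"].map(AGE_ORDER)
    # Breed has no order, so expand it into one indicator column per breed.
    return pd.get_dummies(features, columns=["breed"], dtype=int)

# Stand-in for the output of data_cleaner (assumed shape).
cleaned = pd.DataFrame({"age": ["puppy", "adult"], "breed": ["Beagle", "Pug"]})
features = preprocess(cleaned)
```

Keeping this as its own function means the cleaning step stays unchanged and the encoding choices live in one place.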

JJD129 commented 1 year ago

On the preprocessing branch, there's a data_preprocessing.py file in the petfinder folder.