SAFESPACE22 / ParkingSWE

Backend Machine Learning #22

Closed melissa-ng closed 2 months ago

melissa-ng commented 3 months ago

Description:

The goal of this feature branch is to develop a machine learning model that predicts the number of vacant parking spots in a parking garage from the features available in the data. The model will be trained on a dataset of parking garage floor data and evaluated on a held-out test set. The branch will also add functions to preprocess the dataset, train and fine-tune the models, make predictions with the trained model, save the model to disk, and visualize model performance.

Requirements:

  1. Access to the parking garage dataset CSV file.
  2. Python environment with the required libraries, such as pandas, scikit-learn, and matplotlib, plus the standard-library pickle module.

Success Criteria:

  1. Successfully load the dataset into memory.
  2. Implement functions to shuffle and normalize the dataset.
  3. Split the dataset into training and testing sets.
  4. Train machine learning models using the training dataset.
  5. Fine-tune the models using hyperparameter tuning techniques.
  6. Successfully predict vacant parking spots using the trained model.
  7. Save the trained model to a pickle file.
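  8. Implement functions to visualize model performance metrics. Accuracy, precision, recall, and F1-score apply if the task is framed as classification; for a numeric vacancy count, regression metrics such as MAE, RMSE, and R² are the usual fit.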
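
Criteria 1 through 3 could be sketched roughly as follows. This is a minimal sketch, not the branch's actual code: the column names in `FEATURES` and `TARGET` are hypothetical, since the dataset schema is not given here, and the function names are placeholders.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical column names; the real dataset schema is not stated in this issue.
FEATURES = ["hour", "day_of_week", "floor"]
TARGET = "vacant_spots"


def load_dataset(source):
    """Load the CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


def shuffle_dataset(df, seed=42):
    """Return the rows in random order with a fresh 0..n-1 index."""
    return df.sample(frac=1.0, random_state=seed).reset_index(drop=True)


def split_and_normalize(df, test_size=0.2, seed=42):
    """Split 80/20, then scale features to [0, 1] using training rows only."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[TARGET], test_size=test_size, random_state=seed
    )
    # Fit the scaler on the training split so test data does not leak into it.
    scaler = MinMaxScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test, scaler
```

The sketch splits before normalizing so that the scaler only ever sees training rows; the returned `scaler` can then be reused on new data at prediction time.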

Considerations:

  1. Ensure data integrity and handle missing or inconsistent values appropriately.
  2. Choose appropriate machine learning algorithms for the prediction task.
  3. Evaluate model performance using appropriate evaluation metrics.
  4. Optimize hyperparameters to improve model performance.
  5. Handle potential overfitting or underfitting issues.
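
One way to approach considerations 1 and 5 together is to impute missing values inside a scikit-learn pipeline and compare the training score with a cross-validated score. The sketch below runs on synthetic data generated only for illustration; the median imputation strategy, the random forest, and the helper name `overfitting_gap` are assumptions, not choices made in this issue.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the garage data (illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 24, size=(200, 3))
y = 100 - 3 * X[:, 0] + rng.normal(0, 2, size=200)
X[rng.random(X.shape) < 0.05] = np.nan  # blank out roughly 5% of the values

# Imputation lives inside the pipeline, so each CV fold imputes from its own training rows.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    RandomForestRegressor(n_estimators=50, random_state=0),
)


def overfitting_gap(model, X, y, folds=5):
    """Return (training R^2, mean cross-validated R^2) for the given model."""
    cv_r2 = cross_val_score(model, X, y, cv=folds, scoring="r2").mean()
    train_r2 = model.fit(X, y).score(X, y)
    return train_r2, cv_r2
```

A training score far above the cross-validated score suggests overfitting; two low scores suggest underfitting.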

Tasks:

  1. Task 1: Create function to load dataset

    • [x] Write a Python function to load the parking garage dataset from the CSV file into a pandas DataFrame.
  2. Task 2: Create function to shuffle dataset

    • [x] Implement a function to shuffle the rows of the dataset so that the training and testing sets are not biased by the original row order.
  3. Task 3: Normalize dataset

    • [x] Develop a function to normalize the numerical features of the dataset to ensure uniform scaling.
  4. Task 4: Train test split dataset

    • [x] Write a function to split the dataset into training and testing sets using a predefined ratio (e.g., 80% training, 20% testing).
  5. Task 5: Create models and train

    • [x] Implement machine learning models (e.g., linear regression, decision tree, random forest) and train them using the training dataset.
  6. Task 6: Create function to fine-tune models

    • [ ] Develop a function to fine-tune the hyperparameters of the machine learning models using techniques such as grid search or random search.
  7. Task 7: Create function to predict using model

    • [ ] Write a function to make predictions on the testing dataset using the trained model.
  8. Task 8: Save model to pickle

    • [ ] Implement a function to save the trained model to a pickle file for future use.
  9. Task 9: Create functions to visualize model performance

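    • [ ] Develop functions to visualize model performance metrics using plots and charts. Accuracy, precision, recall, and F1-score apply if the task is framed as classification; for a numeric vacancy count, regression metrics such as MAE, RMSE, and R² are the usual fit.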
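
The open tasks 6 through 9 might look something like the sketch below. It assumes the problem is treated as regression with a random forest; the hyperparameter grid, function names, and plot style are illustrative placeholders rather than decisions made in this issue.

```python
import pickle

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV


def fine_tune(X_train, y_train):
    """Task 6: grid-search a small, illustrative grid with 3-fold CV."""
    grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
    search = GridSearchCV(
        RandomForestRegressor(random_state=0),
        grid,
        cv=3,
        scoring="neg_mean_absolute_error",
    )
    search.fit(X_train, y_train)
    return search.best_estimator_  # refit on the full training set by default


def predict(model, X_test):
    """Task 7: predict vacant spots for unseen rows."""
    return model.predict(X_test)


def save_model(model, path):
    """Task 8: serialize the trained model to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model saved by save_model (only unpickle files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def plot_predictions(y_true, y_pred, path):
    """Task 9: predicted-vs-actual scatter plot with MAE and R^2 in the title."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fig, ax = plt.subplots()
    ax.scatter(y_true, y_pred, alpha=0.6)
    lims = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
    ax.plot(lims, lims, linestyle="--")  # perfect-prediction reference line
    ax.set_xlabel("Actual vacant spots")
    ax.set_ylabel("Predicted vacant spots")
    mae = mean_absolute_error(y_true, y_pred)
    ax.set_title(f"MAE={mae:.2f}, R2={r2_score(y_true, y_pred):.2f}")
    fig.savefig(path)
    plt.close(fig)
```

Pickle files can execute arbitrary code when loaded, so a saved model should only be loaded from a trusted location.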

sherrycodes1 commented 3 months ago

Hi! I have a particular interest in machine learning and data visualisation, and this issue caught my eye. I'm wondering how I can contribute to this project of yours.