Title and Abstract (0.75/0.75) - Good - add which classification methods/algorithms you plan to explore.
Background (1/1) -
Good!
I won't take off points this time, but make sure to add citations in the proper format for the prior studies mentioned.
Problem Statement (1/1) -
Good - make sure to specify which classifiers you plan to explore.
Data (1.25/1.25) - OK.
Proposed Solution (1/1.25)-
Just keep the Algorithms Section here - clearly state which algorithms you plan to use for your project, then add details regarding train/test split, cross-validation, and grid-search hyperparameters for each model. Are you going to compare and contrast 4 or 5 models? What is your first model? Some parts of this section are still unclear to me. (-0.25)
We do not need the Data Cleaning/transformation section here - this should be under preliminary results.
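To illustrate the requested train/test split, cross-validation, and grid-search details, here is a minimal sketch, assuming scikit-learn and a stand-in dataset (`load_breast_cancer`); swap in your own data, candidate models, and hyperparameter grids:

```python
# Hypothetical sketch: held-out test split + 5-fold grid-searched model.
# The dataset and model here are placeholders, not your project's.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # stand-in for your data

# Hold out a test set BEFORE any tuning so the final score is unbiased.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 5-fold cross-validated grid search over a small hyperparameter grid.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
grid = GridSearchCV(
    pipe,
    param_grid={"logisticregression__C": [0.01, 0.1, 1, 10]},
    cv=5,
    scoring="f1",
)
grid.fit(X_train, y_train)

print("best params:", grid.best_params_)
print("held-out test F1:", grid.score(X_test, y_test))
```

Repeating this pattern once per candidate model gives you a like-for-like comparison table for the 4-5 models you plan to contrast.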
Metrics (1.25/1.25) -
OK.
Preliminary Results (1.5/1.5) -
Good! Just add markdown cells between each significant cleaning step and write a summary paragraph after EDA summarizing what you learn about the distribution of your data. It might also be useful to add a correlation matrix to analyze the correlation between your variables.
~~Missing. (-1.5)~~
~~I can see that you've described some preliminary data cleaning in your proposed solution section, but I need to see the code with which you cleaned these features/determined your dataset was ready to go. Why didn't you include it in your checkpoint? What about EDA? What is the distribution of your data?~~
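To summarize your data's distribution and the pairwise correlations between variables, a minimal EDA sketch, assuming pandas and a stand-in dataset (your project's DataFrame would replace it):

```python
# Hypothetical EDA sketch: per-feature distributions + correlation matrix.
# The dataset is a placeholder; substitute your cleaned DataFrame.
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
df = data.frame.iloc[:, :5]  # first few features, for readability

# Distribution summary: count, mean, std, quartiles per feature.
print(df.describe().round(2))

# Pairwise Pearson correlations between the variables.
corr = df.corr()
print(corr.round(2))
```

The one-paragraph EDA summary requested above would then read these numbers back: which features are skewed, which pairs are strongly correlated, and whether any redundancy suggests dropping features.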
These were our expectations at this stage of the project -
- Analyzing the suitability of a dataset or algorithm for prediction/solving your problem
- Performing feature selection or hand-designing features from the raw data. Describe the features available/created and/or show the code for selection/creation.
- Performing data cleaning and explaining the steps taken, OR including an explanation as to why data cleaning was unnecessary (how did you determine your dataset was ready to go?)
  - Dataset is actually clean and usable after feature selection is carried out
- Showing the performance of a base model/hyperparameter setting. Solve the task with one "default" algorithm and characterize the performance level of that base model.
  - Learning curves or validation curves for a particular model
  - Tables/graphs showing the performance of different models/hyperparameters
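The base-model and validation-curve expectations above can be sketched as follows, assuming scikit-learn and a stand-in dataset; the model and hyperparameter are illustrative, not prescribed:

```python
# Hypothetical sketch: a "default" baseline plus a validation curve
# over one hyperparameter. Dataset and model are placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for your data

# Baseline: default hyperparameters, 5-fold CV accuracy.
base = DecisionTreeClassifier(random_state=0)
base_scores = cross_val_score(base, X, y, cv=5)
print(f"baseline accuracy: {base_scores.mean():.3f} +/- {base_scores.std():.3f}")

# Validation curve over max_depth, to spot under-/over-fitting.
depths = [1, 2, 4, 8, 16]
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5,
)
for d, t, v in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={d:2d}  train={t:.3f}  val={v:.3f}")
```

The baseline number is the reference point every tuned model must beat; the train-vs-validation gap at each depth is what the expected learning/validation-curve plots would visualize.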
Since you plan to compare/contrast 4 models - I suggest you get started on data cleaning/transformation/EDA as soon as possible + include the code to ensure you have sufficient time to finish the project.
Ethics and Privacy (0.5/0.5) -
OK. Keep adding to this section as you discover any more potential biases in your data/any confounds. How could your project be perceived? Address this.
Team Expectations (0.25/0.25).
Timeline (0.25/0.25).
Other Comments - Good start - but I need to be able to see your data cleaning code to make sure you're on the right track. Please re-commit your code either in this notebook or somewhere else in your repo.
You can reply to this feedback below. Contact me anytime if you want help improving your project or have any questions at all!
Project Checkpoint Grade: 7.25+1.5 = 8.75/9