DSCI-310 / data-analysis-review-2021


Submission: 10: Investment Outcome Predictor #10

Open ttimbers opened 2 years ago

ttimbers commented 2 years ago

Submitting authors: @nkoda @mahdiheydar @izk20 @harrysyz99

Repository: https://github.com/DSCI-310/DSCI-310-Group-10

Abstract/executive summary: A KNN classification model was applied to 2017 Canadian census data to predict whether an individual made money on their investments (true class) or broke even or lost money (false class), using their family size and whether they are the major income earner in their family as features.

All investments carry risk, so the rationale for this analysis was to gain insight into whether the pressures of being the main income earner in a family and of having a larger family size influence the prediction of someone's investment outcomes. This could inform further analysis of the risks associated with specific investments.

The KNN model was tuned for the number-of-nearest-neighbours hyperparameter. A value of 26 was used, yielding approximately 57% accuracy, so the model did not perform much better than a dummy classifier. Unlike the pattern found in the actual data, the KNN classification model was not able to distinguish between individuals in the same family size group.

To obtain more solid results, it is important to build other models, such as a support vector machine (SVM), to carry out feature engineering, or to add other features that may serve as better predictors. This will enhance the investigation of the original research question of how family size, and whether an individual is the major income earner in their family, can be used to predict investment outcomes.
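
For readers who want a concrete picture of the procedure the abstract describes (tuning the number of neighbours and comparing against a dummy classifier), here is a minimal sketch in plain Python. It is not the authors' implementation, and it does not use the census data: the data set is synthetic and seeded, and every name in it (`knn_predict`, `cv_accuracy`, `majority_baseline_accuracy`, the candidate values of k, the two toy features) is a hypothetical choice made for illustration only.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Rank training points by squared Euclidean distance to x.
    by_distance = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, folds=5):
    """Fraction of held-out points classified correctly, pooled over all folds."""
    n = len(X)
    correct = 0
    for fold in range(folds):
        train = [i for i in range(n) if i % folds != fold]
        test = [i for i in range(n) if i % folds == fold]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test)
    return correct / n


def majority_baseline_accuracy(y):
    """Accuracy of always predicting the most frequent class (a dummy classifier)."""
    return Counter(y).most_common(1)[0][1] / len(y)


# Synthetic stand-in data (an assumption, NOT the 2017 census data):
# features are (family_size, is_major_earner); the label is only weakly
# related to the features, so a weak model is the expected outcome.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    family_size = rng.randint(1, 6)
    is_major_earner = rng.randint(0, 1)
    X.append((family_size, is_major_earner))
    y.append(rng.random() < 0.5 + 0.05 * is_major_earner)

# Tune k over a small, arbitrary grid and compare with the baseline.
candidates = list(range(1, 41, 5))
scores = {k: cv_accuracy(X, y, k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])
baseline = majority_baseline_accuracy(y)

print(f"best k = {best_k}, cross-validated accuracy = {scores[best_k]:.2f}")
print(f"majority-class baseline accuracy = {baseline:.2f}")
```

With only two discrete features, many individuals share identical feature values, so a nearest-neighbour vote has no way to tell them apart. Note also that the baseline here is computed on the full data set rather than cross-validated, which is a simplification.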

Editor: @ttimbers

Reviewer: @YellowPrawn @ClaudioETC @isabelalucas @Jaskaran1116

YellowPrawn commented 2 years ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing:

1 Hour

Review Comments:

Overall, this project is very well written and covers all the essential bases. As per the comments above, most of the issues I spotted in your project are very minor and can be fixed relatively quickly. It is interesting that you decided to use R Markdown to render your report; maybe it would have been better to do it in Jupyter Book, maybe not. Well done!

Attribution

This was derived from the JOSE review checklist and the ROpenSci review checklist.

isabelalucas commented 2 years ago

Reviewer: Isabela Lucas Bruxellas

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 3 Hours

Review Comments:

Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.

Things that were done particularly well:

Things to improve


Jaskaran1116 commented 2 years ago

Data analysis review checklist

Reviewer: Jaskaran1116

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 1 hour 15 minutes

Review Comments:

Components that are constructed well

Components to improve on

Overall, great job! You guys have adhered to the guidelines and have created a very well-structured project. I feel the suggestions are just minor changes to the repository and can be fixed quickly. I also liked that you used R Markdown to render your report, since it allows the results of R code to be inserted directly into formatted documents.


ClaudioETC commented 2 years ago

Data analysis review checklist

Reviewer:

Conflict of interest

Code of Conduct

General checks

Documentation

Code quality

Reproducibility

Analysis report

Estimated hours spent reviewing: 2 hrs.

Review Comments:


Well made points

I think it is a good project that lands well on the computational side. I found the functions and testing to be well developed and carefully thought out. I think the project is well structured and the idea of the project was solid from the beginning. It seems like a project where everyone worked together smoothly, which led to a result that does not read as separate parts glued together. The project is easy to deploy thanks to the well-written README file, and the Makefile was well developed and ran with no errors. The conclusions are solid. Overall, I think addressing the observations mentioned will help turn this into a really good research paper (if that were the goal).

Points to improve
