UBC-MDS / Credit_Card_Default_Prediction_Group13

Group 13 of DSCI 522 in the year 2022-23
https://ubc-mds.github.io/Credit_Card_Default_Prediction_Group13/doc/report.html

Milestone 2 - Feedback #56

Closed · danielramandi closed this issue 1 year ago

danielramandi commented 1 year ago

Congratulations on finishing milestone 2! We can see you put a lot of work into this project, nice work! Below we list some specific feedback you can use to improve your project. We provide tick boxes for you to use in the future as you address these concerns to improve the final grade of your project. If anything is unclear, please feel free to ask questions in this issue thread.

1. Analysis code and scripts

Comment: The usage section does not provide the proper path to the scripts. It should be “/src/script.py”. Also, is there a reason you are specifically suggesting running the scripts with python3? That can cause issues across platforms. Either clarify why, or just use python. (I do not have python3 as a cmdlet.)

Comment: Overall, please try to add more documentation to your code. The author and date are necessary.

Comment: Refer to the Milestone 2 submission guidelines for details on the proposal.md file.

Comment: The reports in the repository do not have figure/table captions or titles. Also, the Rmd file cannot be rendered. In the Rmd file, the figures have captions, but the tables do not.

Comment: Use inline variables to report the scores in the text of the report.

Comment: Add appropriate exceptions and checks for input values. Some parts of the functions are unclear; add documentation at the beginning stating exactly what each function does. For example, the function that trains an ML algorithm (and the script containing it) needs some documentation at the beginning of the code mentioning which models are being trained.

Comment: Some functions do not have proper tests or are not documented. You can use unit testing to solve this. Here’s a nice article on unit testing. But simply put, you can check for input type discrepancies and raise proper errors manually.

2. Analysis can be run reproducibly using documentation provided in the README

Comment: vl-convert was not part of the reported dependencies, but was required. Also, the names of the packages should be exactly the same as what would be needed to install them, e.g., docopt-ng, pandas-profiling.

Comment: There is no mention of how to render the final report. I tried the Rscript command to render it, but it raised an error.

3. Submission expectations

hcwang24 commented 1 year ago

@flor14 @danielramandi

On behalf of all authors, thank you for your detailed evaluation and valuable feedback on our project “Credit_Card_Default_Prediction_Group13”, release 0.1.0 (https://github.com/UBC-MDS/Credit_Card_Default_Prediction_Group13/tree/v0.1.0). It was an amazing experience building our first team project together on GitHub, and we hope to make it something we can put on our CVs or describe to an interviewer. We have carefully reviewed your comments, but we felt discouraged by the marking. In particular, since our milestone 1 feedback arrived 5 days past the MS2 deadline, some issues from MS1 carried over to this release and had marks deducted again. Below, we provide a point-by-point response to explain some of the choices we made and the circumstances:

“Congratulations on finishing milestone 2! We can see you put a lot of work into this project, nice work! Below we list some specific feedback you can use to improve your project. We provide tick boxes for you to use in the future as you address these concerns to improve the final grade of your project. If anything is unclear, please feel free to ask questions in this issue thread.”

Thank you very much for the words of encouragement. As stated above, we really were interested in carrying out this project.

1. Analysis code and scripts

Comment: The usage section does not provide the proper path to the scripts. It should be “/src/script.py”. Also, is there a reason you are specifically suggesting running the scripts with python3? That can cause issues across platforms. Either clarify why, or just use python. (I do not have python3 as a cmdlet.)

We have changed the script paths to proper relative paths for our newest release. At the time of the first release, the documentation said to run the code from inside the src folder (though we admit this was not very clear). Also, this issue was already deducted in the milestone 1 feedback, which was given to us 5 days past the deadline for the milestone 2 submission. We feel this mark deduction is unfair since we were not able to make the corresponding changes before the submission deadline (https://github.com/UBC-MDS/Credit_Card_Default_Prediction_Group13/issues/44).

To answer the specific use of python3: Python 2 has been end-of-life since 2020, and because Python 3 and Python 2 do not share exactly the same syntax, many popular packages are now developed for Python 3 only. Hence, we wanted to be specific about the Python version to avoid additional problems during installation, since the environment was a required part of this release. We have included a sentence in the README about the use of Python 3. We are wondering if the marks lost on this part can be restored?
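To make this requirement explicit in the code as well, here is a minimal sketch of a version guard that could sit at the top of each script (the error message wording is just an example, not the exact text in our repository):

```python
import sys

# Fail fast with a clear message if the script is run under Python 2.
if sys.version_info < (3, 0):
    raise RuntimeError(
        "This script requires Python 3. "
        "Please invoke it with an interpreter where `python` points to Python 3."
    )
```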

Comment: Overall, please try to add more documentation to your code. The author and date are necessary.

We have added the author, date, and a general description to all our scripts. Thank you.
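For concreteness, the headers follow this general shape (the author, date, and description below are placeholders, not the exact text in the repository):

```python
# Author: Group 13 (placeholder)
# Date: 2022-11-26 (placeholder)
"""Cleans and splits the raw credit card default data.

A short module-level docstring like this states what the script does
before any code appears.
"""
```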

Comment: Refer to the Milestone 2 submission guidelines for details on the proposal.md file.

We feel that this mark deduction is unfair since the proposal.md file is optional according to the milestone 2 expectations. To quote the rubric: “Yes, we are asking you to overwrite your proposal here, and so if you want to keep that in a more findable place than in the Git history, do create a new file in the doc directory to archive it there (e.g., proposal.md).”

Comment: The reports in the repository do not have figure/table captions or titles. Also, the Rmd file cannot be rendered. In the Rmd file, the figures have captions, but the tables do not.

The report rendered on three of the four computers in our group. We have since identified and fixed the bug causing the rendering problem (importing a correlation matrix and formatting it with kable()). All figures and tables include captions; this can be observed at line 125 of the Rmd file (https://github.com/UBC-MDS/Credit_Card_Default_Prediction_Group13/blob/v0.1.0/doc/report.Rmd).

Comment: Use inline variables to report the scores in the text of the report.

Thank you for catching this problem. We have included inline variables in our most recent release.

We missed this requirement. Could you kindly point us to where it is stated that tests must be included? Nevertheless, we have added tests using assert in our current release.
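To illustrate the pattern we used (the module, function, and file names here are hypothetical, not the exact ones in our repository):

```python
from src.preprocessing import clean_data  # hypothetical module and function

def test_clean_data_drops_missing_values():
    # The cleaned table should be returned and contain no missing values.
    result = clean_data("data/raw/sample.csv")
    assert result is not None, "clean_data should return a value"
    assert result.isna().sum().sum() == 0, "cleaned data should have no missing values"
```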

Comment: Add appropriate exceptions and checks for input values. Some parts of the functions are unclear; add documentation at the beginning stating exactly what each function does. For example, the function that trains an ML algorithm (and the script containing it) needs some documentation at the beginning of the code mentioning which models are being trained.

Thank you for pointing this out. We had already been deducted marks for this in one of your earlier comments. Also, regarding function annotations, we followed the style in the example repo by Prof. Timbers (https://github.com/ttimbers/breast_cancer_predictor/tree/v2.0/src) and annotated each major step of the script. We are wondering if this mark deduction can be reversed?
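As a sketch of the kind of input checking and up-front documentation we added to the model-training code (the function name, example models, and messages are illustrative, not the exact code in the script):

```python
def fit_models(X_train, y_train, models):
    """Fit a dictionary of candidate classifiers on the training data.

    Each entry in `models` maps a name (e.g., "dummy", "logreg") to an
    unfitted scikit-learn estimator; the fitted estimators are returned
    under the same names.
    """
    if not isinstance(models, dict):
        raise TypeError("models must be a dict mapping names to estimators")
    if len(X_train) != len(y_train):
        raise ValueError("X_train and y_train must have the same number of rows")
    return {name: model.fit(X_train, y_train) for name, model in models.items()}
```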

Comment: Some functions do not have proper tests or are not documented. You can use unit testing to solve this. Here’s a nice article on unit testing. But simply put, you can check for input type discrepancies and raise proper errors manually.

As noted in our response to one of the earlier comments, we have added tests using assert in our current release. We are wondering if this mark deduction can be reversed, since marks were already deducted for this in your earlier comment?

2. Analysis can be run reproducibly using documentation provided in the README

Comment: vl-convert was not part of the reported dependencies, but was required. Also, the names of the packages should be exactly the same as what would be needed to install them, e.g., docopt-ng, pandas-profiling.

Thank you for mentioning this. We have added an environment.yaml file for easy installation in this release. However, we feel this mark deduction is unfair since the same issue was already deducted in milestone 1, and that feedback arrived 5 days past the deadline for milestone 2 (https://github.com/UBC-MDS/Credit_Card_Default_Prediction_Group13/issues/44).

Comment: There is no mention of how to render the final report. I tried the Rscript command to render it, but it raised an error.

We have fixed the bug that was causing kable() to render one of the tables improperly. Thank you for pointing this out. However, we are wondering how many points in total are assigned to this requirement, and whether a 5-point deduction is too much?

3. Submission expectations

Thank you for pointing this out. We have fixed this in our recent submission. We were confused by the requirement and thought it was asking us to submit the release notes page. Release 0.1.0 can be accessed here: https://github.com/UBC-MDS/Credit_Card_Default_Prediction_Group13/tree/v0.1.0