Open ttimbers opened 2 years ago
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
Functionality documentation: Each script file is very straightforward, which lets the reader easily interpret what each script does. This is also the case for each function as they are all well documented. The only thing that would make the documentation stronger, in my opinion, is to add a reference to the polyfit function for NumPy.
Automation: There are some issues with reproducing the analysis. When using the docker image, there is an error with running the makefile. I can't properly run the analysis using the procedure listed readme.md. I get the following error so I have to manually run each script in the makefile.
(base) jovyan@5ffe8a43bc2b:~$ cd work
(base) jovyan@5ffe8a43bc2b:~/work$ make clean
bash: make: command not found
(base) jovyan@5ffe8a43bc2b:~/work$
Also, the lack of a data folder in the repository means we have to create one manually (granted that isn't too hard to do) to run the script.
However, there is also a missing dependency for the docker image. Jbook is not in the dockerfile, but can be easily fixed with an extra import in the dockerfile
Figure 5, shows a positive correlation between hours worked and level of education.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
In general, the work is impressive with visible effort spent on this project. The function documentations are remarkable, and the script arrangement is well organized. However, there are still some aspects that need improvement:
Please consider provide a method to contact you. Currently, there is no clear way of contacting you for collaboration and contribution. I know this was not clearly required in the project description of the previous milestones, but I think it is nice to have. Very unfortunately, GitHub does not have a chat feature which trivializes the communication between developers.
You did not specify the development tools you used for this project. As your classmate, I personally know that you used Docker, JupyterLab, and maybe VS Code. However, they are not obvious to the people outside of this course. More importantly, you did not specify the python version for this project in obvious place, and that is fatal. I understand that your docker has the environment, but the detailed dependency list is still highly recommended.
In your report, you did not discuss your approach to the problem you raised, and also did not explain why you choose the machine learning model used in the project. You may want to briefly justify your choice of model. Also, I want to comment on the readability of your report. Your report still contains code, and some images are not loaded when I view the report on GitHub. You may want to export a pdf file in addition to the ipynb file you currently have so that others can conveniently see your work without the need to go through the entire configuration process to run your proejct.
Overall, your work is great!
This was derived from the JOSE review checklist and the ROpenSci review checklist.
The README.md
provides clear and thorough instructions for running the project. However, it could be better if it is formatted more nicely. For example, using tables to present the dependencies, using in-line code block for the path of the final report, adding headers to differentiate the steps to run the project and the test, adding line breaks to make the license information stand out.
The report is nicely written and rendered. I suggest the following minor changes:
Summary
, Introduction
, etc. should not be subheaders under Authours
.The citation in the report is done in a clean manner. It is a good idea to reference the programming language and dependencies used for this analysis in the report.
Team-Work-Contract.pdf
should not be included.
The /data
directory contains raw data, processed data and plots generated by the scripts. It is better to make sub directories like /data/raw
and /data/processed
to organize the data, and a /results
directory to store the plots.
The following error occurred when I run the make file on the docker container:
OSerror:"Cannot save file into a non-existent dicectory: data"
I solved this by manually add a \data
directory. I would suggest debugging a little bit in the makefile.
There is no analysis part of the results, a non-expert would be confused if there is no explanation of the result. I'd also recommend including further explanations in the final report to make it more understandable. If you could explain a little more in the result section, I think it would be easier for the audience that isn't an expert in Data Science and Statistics to grasp what you've done.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Submitting authors: @rlaze @YellowPrawn @alexkhadr
Repository: https://github.com/DSCI-310/DSCI-310-Group-3
Abstract/executive summary: This project looks into US 1994 census data to investigate the effects of higher education, and hours worked. We come up with a predictive model that estimates the amount of hours worked per week based on the amount of years of education.
Editor: @ttimbers
Reviewer: @danielhou13 @overcast-day @ZiyueChloeZhang @lizhe918