UBC-MDS / beijing_air_quality_analysis

https://ubc-mds.github.io/beijing_air_quality_analysis/

Milestone 1 feedback #77

Open andytai7 opened 2 years ago

andytai7 commented 2 years ago

2. Project set-up: Mechanics
Comments: Who's on your team?

You don't need both an MD and an RMD ReadMe, only one. Usually just keep the MD.

3. Project proposal: reasoning
Comments: "Sub Exploratory Questions": as a suggestion, maybe list objectives or aims instead and rephrase them as statements? As written, it is unclear what the sub-questions are.

What are the proposed methods?

How will you clean the data if it has missing values?

What if there are class imbalances? Will you create synthetic data?

What about the other measurements, is there a reason you only picked one?

For your hypothesis test, are you only using values from 2013 and 2017? If so, then the visualization of the years in between is sort of useless. Would it be more interesting to do it year by year, over a range, to show progress?

It looks like you do this ("March 1 2013 to Feb 28 2015, and March 1 2015 to Feb 28 2017"), but you don't say so in the project description.

Lastly, for this hypothesis test, is there no control? What about comparing it with other countries? You won't know what the global rate of increase is, and without knowing that, how would you know whether the whole world was increasing and not just Beijing?

4. A script that downloads the data: Accuracy
Comments: I need to grab the CSV file myself.

4. A script that downloads the data: Quality
Comments: We are missing the CSV with everything joined in the table.

5. Exploratory data analysis in a literate code document: VIZ
Comments: The plots need correct labels. I don't know what Time A and Time B are.
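One way to make such labels self-explanatory is to put the date range in the label itself. The following is a minimal sketch, not the team's code. It assumes that Time A and Time B correspond to the two date ranges quoted earlier in this comment, which the thread does not confirm. `PERIODS` and `period_label` are hypothetical names introduced for illustration.

```python
from datetime import date
from typing import Optional

# Assumed mapping: the thread quotes these two ranges but never states
# explicitly which one is "Time A" and which is "Time B".
PERIODS = {
    "Time A (Mar 1 2013 to Feb 28 2015)": (date(2013, 3, 1), date(2015, 2, 28)),
    "Time B (Mar 1 2015 to Feb 28 2017)": (date(2015, 3, 1), date(2017, 2, 28)),
}


def period_label(day: date) -> Optional[str]:
    """Return a descriptive period label for `day`, or None if it is out of range."""
    for label, (start, end) in PERIODS.items():
        if start <= day <= end:
            return label
    return None
```

Using the returned string as the legend or facet title means a reader does not have to look up what each period means elsewhere.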

5. Exploratory data analysis in a literate code document: REASONING
Comments: Could have utilized more plots.

Jacq4nn commented 2 years ago

Hi Andy. Thank you for your feedback. We have discussed your comments within our group, and we have several queries.

As we understand it, we are working on the whole project, not just the EDA. We therefore assumed that the data would be downloaded by the script, and we put different details in the ReadMe and the EDA. We lost marks in:

2. Project set-up: Mechanics
Who's on your team?
Don't need both MD and RMD read me, only one. Usually just keep the MD.

5. Exploratory data analysis in a literate code document: VIZ
Need the correct labels, I don't know what Time A and Time B are.

There isn't a clear instruction, or an industry convention, on where to put our names. Our names are in the ReadMe, Time A and Time B are explained in both the ReadMe and the EDA, and we used the breast cancer project as a reference, which has the same structure (names in the ReadMe but not in the Rmd, and both MD and RMD files in the repo): https://github.com/ttimbers/breast_cancer_predictor/blob/v2.0/src/breast_cancer_eda.md

A script that downloads the data: Accuracy
I need to grab the CSV file myself.

A script that downloads the data: Quality
We are missing the csv with everything joined in the table.

We have a Python script that downloads and unzips the files and puts them in the dedicated folder. The instructions are in the ReadMe, and we tested the script on our computers and it works. On top of that, we also have the CSV in our repo. It looks like the TA only ran the Rmd script?
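For readers following the thread, a download, unzip and join step of the kind described above could look roughly like the sketch below. This is not the team's actual script. The function names, folder layout and the assumption that every CSV in the archive shares one header are all hypothetical, and only the Python standard library is used.

```python
import csv
import urllib.request
import zipfile
from pathlib import Path


def download_zip(url: str, zip_path: Path) -> Path:
    """Download the archive at `url` to `zip_path`, creating parent folders."""
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(zip_path))
    return zip_path


def extract_and_join(zip_path: Path, raw_dir: Path, joined_csv: Path) -> int:
    """Unzip into `raw_dir`, then write all extracted CSVs into one file.

    Assumes (hypothetically) that every CSV has the same header row.
    Returns the number of data rows written.
    """
    raw_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(raw_dir)

    # Collect inputs first, skipping the output file if it lives under raw_dir.
    joined_csv.parent.mkdir(parents=True, exist_ok=True)
    csv_files = sorted(
        p for p in raw_dir.rglob("*.csv") if p.resolve() != joined_csv.resolve()
    )

    n_rows = 0
    header_written = False
    with joined_csv.open("w", newline="") as out:
        writer = csv.writer(out)
        for csv_file in csv_files:
            with csv_file.open(newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:  # empty file, nothing to copy
                    continue
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```

Writing the joined file as part of the same script means a grader who runs only the documented command ends up with both the raw folder and the combined CSV, without fetching anything by hand.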

3. Project proposal: reasoning
What are the proposed methods?

How will you clean the data if it has missing values?

What if there are class imbalances? Will you create synthetic data?

What about the other measurements, is there a reason you only picked one?

For your hypothesis test, are you only using values from 2013 and 2017? If so, then the visualization of the years in between is sort of useless. Would it be more interesting to do it year by year, over a range, to show progress?

It looks like you do this ("**March 1 2013 to Feb 28 2015**, and **March 1 2015 to Feb 28 2017**"), but you don't say so in the project description.

Lastly, for this hypothesis test, is there no control? What about comparing it with other countries? You won't know what the global rate of increase is, and without knowing that, how would you know whether the whole world was increasing and not just Beijing?

5. Exploratory data analysis in a literate code document: REASONING
Could have utilized more plots

5. Exploratory data analysis in a literate code document: ACCURACY
(No comment?)

We have a detailed explanation of the methods in the ReadMe, so the EDA has only a brief explanation and focuses on the data. We also followed the milestone instructions, e.g. on visualisation: https://pages.github.ubc.ca/mds-2021-22/DSCI_522_dsci-workflows_students/materials/assignments/milestone1.html#project-proposal

@flor14 @andytai7

flor14 commented 2 years ago

Hello! I had a look at your project. Congratulations on the work so far. I noticed that you got 84.55% for this milestone, which means that you have done quite good work.

Reproducibility is hard: something that runs on one computer might not run on another. I think that @andytai7's grades are appropriate for this stage of the project. My recommendation is to use the feedback to improve your work as much as you can.

zjr-mds commented 2 years ago

@flor14 Hello Florencia, I actually tested our script for downloading the data using the first release files, and it ran perfectly fine; the script created the raw data folder successfully. I recorded my screen during the whole process, from downloading the first release files from github.com to opening the script and running the download command (given in the Usage section). @andytai7 I could forward the screen recording to both of you on Slack if you would like to take a look. Thank you.

vtaskaev1 commented 2 years ago

Thank you Andy for your feedback: