Lucas-Czarnecki / COVID-19-CLEANED-JHUCSSE

Cleaned daily reports and time series data from the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE).

Code for cleaning? #10

Closed. rchurt closed this issue 9 months ago.

rchurt commented 4 years ago

This is great, thanks very much for putting this together. If you wouldn't mind, could you share the code you use for the cleaning, perhaps as an .ipynb file?

Lucas-Czarnecki commented 4 years ago

My pleasure! I can do that. The documentation will be in R rather than an .ipynb notebook. I'll try to have something up soon (it's been on my to-do list for a while).

rchurt commented 4 years ago

Thanks very much, that would be great. Happy to help throw it in a Jupyter Notebook so power users will love you.

Lucas-Czarnecki commented 4 years ago

In case you missed it, I uploaded the scripts I use to clean JHU's data. You can find the code here. I'm still planning on uploading one additional script to document all data cleaning operations (dating back to January 22, 2020). If you are still interested in providing a .ipynb, I'll gladly welcome your contribution.

rchurt commented 4 years ago

This is great, thanks very much for adding this. I just created a notebook and opened a pull request. The only issue is that the notebook doesn't reference the R scripts you just uploaded, so if you want to change the R code, you'll have to change it in two places: in the notebook and in the .R files you uploaded to GitHub. It's up to you how you'd like to handle this. If it were me, I'd delete the folder with the .R scripts and make all changes in the notebook. The other options are to update the code in both places, or to keep updating just the .R files as you have been and add a note in the notebook telling users who want the most up-to-date code to update the notebook based on the scripts in the folder.

Hope this helps, and let me know if you run into any problems with it.

Lucas-Czarnecki commented 4 years ago

Thank you, Rob, this looks great! For the time being the .ipynb is in the notebooks folder. I don't think I will delete the .R files, as the project initially began with R users in mind, and I myself prefer to work in RStudio. I'm open to making changes in the future. If this project is of interest to you, let me know; I'm happy to include you as a contributor.

rchurt commented 4 years ago

Great, glad you find it helpful. And certainly, do what you're comfortable with. Just wanted you to know about the redundancy.

I work almost exclusively in Python, so I'm not sure how helpful I'll be if you're mostly working in R, but I'm happy to help out with specific issues if they come up.

Lucas-Czarnecki commented 9 months ago

All necessary scripts for processing data can be found in the repo's script folder HERE.