Petrichor12 / Coronavirus

Analysis on the novel coronavirus data from worldometers.

Develop a structure to manage all elements of the repository #8

gokeplerride closed this issue 4 years ago

gokeplerride commented 4 years ago

scraping, data engineering, analytics, visuals, automation scripts etc.

Petrichor12 commented 4 years ago

What do you mean by this exactly?

gokeplerride commented 4 years ago

It's my impression that it would make sense to break things down into manageable pieces that work together. Correct me if I'm wrong.

Petrichor12 commented 4 years ago

You mean like having different files/folders or so? Yeah we need some more structure for sure. How about having all the visuals in one file so it's easier to clear output when we commit changes?

Dan-OKeeffe commented 4 years ago

Yeah, I agree it would be good to define. I'm not sure of the best way of doing it.

Petrichor12 commented 4 years ago

I think we could definitely clean up the 'create dfs' file and perhaps optimize it with a few loops - I was quite lazy with having a new block for every country. I'd say we try to do all that in a loop or two and then add it to the get_data file, so that every time we grab new data it is placed into the appropriate dfs within a single script. What do you guys think?
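As a rough illustration of the loop idea (a sketch, not the repository's actual code): assuming the scraped data arrives as one list of rows with a `country` field, a single loop can replace the per-country blocks. The names `split_by_country` and `rows` and the field names are hypothetical, and plain dicts stand in for the DataFrames.

```python
from collections import defaultdict


def split_by_country(rows):
    """Group scraped rows by country in one pass instead of one block per country."""
    by_country = defaultdict(list)
    for row in rows:
        by_country[row["country"]].append(row)
    return dict(by_country)


# Made-up sample rows; the real columns depend on what get_data scrapes.
rows = [
    {"country": "Italy", "date": "2020-03-01", "cases": 100},
    {"country": "Spain", "date": "2020-03-01", "cases": 40},
    {"country": "Italy", "date": "2020-03-02", "cases": 150},
]

country_data = split_by_country(rows)
print(sorted(country_data))        # country names found in the data
print(len(country_data["Italy"]))  # number of rows collected for Italy
```

If the data is already in a pandas DataFrame, iterating over `df.groupby("country")` yields (name, sub-frame) pairs, so the same kind of dict could probably be built with a one-line comprehension.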

Second idea would be adding functions to all the analysis work so far, and then creating a second script "Visualisations" where we import 'Analysis', call all the functions and produce the graphs. Then we only have to worry about one file for visualisations and it might be cleaner. Not sure if that is good practice or not?
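A minimal sketch of that split, assuming two modules named `analysis.py` and `visualisations.py` (hypothetical names, shown here in one block for brevity). The analysis function returns plain data and the visualisation side only turns it into output. The text bar chart is a stand-in for whatever plotting library ends up being used.

```python
# --- analysis.py (hypothetical module) ---
def daily_new_cases(cumulative):
    """Convert a cumulative case series into daily new cases."""
    if not cumulative:
        return []
    return [cumulative[0]] + [
        today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])
    ]


# --- visualisations.py (hypothetical module) ---
# In a real two-file layout this file would start with:
#     from analysis import daily_new_cases
def render_bars(values, width=20):
    """Return a text bar chart, one line per value; a stand-in for a real plot."""
    peak = max(values, default=0)
    lines = []
    for value in values:
        bar = "#" * (round(width * value / peak) if peak else 0)
        lines.append(f"{value:>6} {bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    cumulative = [10, 25, 45, 50]  # made-up numbers for illustration
    print(render_bars(daily_new_cases(cumulative)))
```

Keeping the analysis functions free of plotting code means they can be reused from notebooks or tests without producing any output, which also helps with the "clear output before committing" concern.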

gokeplerride commented 4 years ago

Makes total sense!!!!

Petrichor12 commented 4 years ago

OK first structural changes/tidy ups I think need to be done:

  1. Move all data prep to the get_data file if possible or at most have a second file with data prep. Pulling the data doesn't take too long so I don't think it's too bad to have it in one file. Then we can call it in other files and it is way easier to follow code.
  2. We have a lot of duplicate graphs and all sorts of graphs in different files and with different libraries. Would be good to update them all into a few files and delete the duplicates. What do you guys think about: scatter plots, maps, line plots, other plots, simulation?

gokeplerride commented 4 years ago

Yeah, agreed. We should also distinguish between animations, actuals and forecasts though.

Petrichor12 commented 4 years ago

That's a good idea. So you mean having, say, one file for scatter plots with separate sections for static, animation and forecasting? Or rather one file for animations with mixed plot types in it?
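One way to keep both options open, sketched under assumptions: tag each plotting function with its plot type and its mode (static, animation, forecast) in a small registry, so the functions can live in either file layout and still be looked up along both axes. The `register` decorator, the `PLOTS` dict and the function names are all hypothetical, and the function bodies are placeholders for real plotting code.

```python
PLOTS = {}  # maps (plot_type, mode) -> plotting function


def register(plot_type, mode):
    """Decorator that records a plotting function under its type and mode."""
    def decorator(func):
        PLOTS[(plot_type, mode)] = func
        return func
    return decorator


@register("scatter", "static")
def scatter_static(data):
    return f"static scatter of {len(data)} points"  # placeholder for real plotting


@register("scatter", "animation")
def scatter_animation(data):
    return f"animated scatter of {len(data)} points"  # placeholder


@register("line", "forecast")
def line_forecast(data):
    return f"forecast line from {len(data)} points"  # placeholder


def plots_for(plot_type=None, mode=None):
    """List registered keys, optionally filtered by plot type and/or mode."""
    return sorted(
        key for key in PLOTS
        if (plot_type is None or key[0] == plot_type)
        and (mode is None or key[1] == mode)
    )


print(plots_for(plot_type="scatter"))  # every registered scatter plot
print(plots_for(mode="forecast"))      # every registered forecast plot
```

With something like this, the choice between "one file per plot type" and "one file per mode" mostly becomes a question of what is easiest to navigate, since either grouping can be listed from the registry.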