AaronWard / covidify

Covidify - corona virus report and dataset generator for python 📈 [no longer being updated]
MIT License

Data get mixed #44

Closed rdorsch closed 4 years ago

rdorsch commented 4 years ago

Hi Aaron,

I just noticed that if I do two runs one after the other, e.g. one for Germany and then one for Austria, I get plots from Germany in the Austria Excel file.

I copied the covidify-test output directory to

http://bokomoko.de/~rd/covidify-test/

Here are my runs:

(covidify) rd@h370:~/virtualenv$ covidify run --source JHU --output ~/tmp.nobackup/covidify-test --country Germany
MESSAGE: No top countries given, defaulting to top 10

Job arguments:

... ENV: /home/rd/virtualenv/covidify/lib/python3.7/site-packages/covidify
... OUTPUT FOLDER: /home/rd/tmp.nobackup/covidify-test
... DATA SOURCE: JHU
... COUNTRIES: Germany
... TOP INFECTED COUNTRIES: 10
... FORECAST PERIOD: 10

Data Extraction

Creating folder...
... /tmp/corona/
Cloning Data Repo...
Getting sheets...
... loading data: 100%|██████████| 107/107 [00:06<00:00, 16.57it/s]
Country specified!
... filtering data for Germany
... Calculating dataframe for new cases
Calculating data for logarithmic plotting...
... Germany: 169430
Creating subdirectory for data...
... /home/rd/tmp.nobackup/covidify-test/data/2020-05-08
Saving...
... agg_data_2020-05-08.parquet.gzip
... agg_data_2020-05-08.csv
... trend_2020-05-08.csv
... log_2020-05-08.csv
Done!

Training Forecasting Model

Training forecasting model...
... train/test split: 0.95
... RMSE: 482.349274870671
... forecasting 10 days in the future
... saving file: forecast_2020-05-08.csv
... saving graph

Data Visualization

Importing Data...
Creating graphs...
... Time Series Trend Line
... Daily Figures
... Daily New Infections Differences
... Logarithmic plots
Creating excel spreadsheet report...
... reading images for: log
... reading images for: forecasts
... reading images for: bar
... reading images for: trendline
Done!

Complete!

Desktop (please complete the following information):

(covidify) rd@h370:~/virtualenv$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
(covidify) rd@h370:~/virtualenv$ python -V
Python 3.7.3
(covidify) rd@h370:~/virtualenv$

Many thanks for maintaining covidify :-)

Rainer

AaronWard commented 4 years ago

Okay, I see what is happening: when you specify a folder, no deletion of old files occurs. Old files are only deleted when you use the default folder that covidify creates in the desktop directory (by omitting the --output flag).

The reason is that I want to be very careful about deleting files that may be lying around in someone's work folders, in case it deletes something important and someone loses all their work.

You can separate the results for each country by omitting --output, or by choosing a different folder for each country.

Examples:

- covidify run --country="Germany"
- covidify run --country="Austria"

or

- covidify run --country="Germany" --output ~/tmp.nobackup/covidify-test-germany
- covidify run --country="Austria" --output ~/tmp.nobackup/covidify-test-austria

> Many thanks for maintaining covidify :-)

You're very welcome, thanks for using Covidify

rdorsch commented 4 years ago

Hi Aaron,

thanks for your quick reply.

On Friday, 8 May 2020, 23:10:46 CEST, Aaron wrote:

> Okay i see what is happening, when you specify a folder no deletion of old files occurs. This only happens when you use the default folders that covidify creates in the desktop directory (by ommiting the --output flag).
>
> The reason why this is is because i want to be very careful about deleting files that may be laying around in someones work folders just incase it deletes something important, and someone loses all their work

I do not fully follow the scenario. Which files are you afraid of overwriting? It seems you are at least overwriting the _* files (which is good for me).

A warning would have been useful for me in this case. And maybe a force overwrite option.
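The policy Aaron describes, together with the warning and force option suggested here, could be sketched roughly as follows. This is only an illustration: `prepare_output_dir`, `DEFAULT_OUTPUT` and the `force` parameter are hypothetical names, not part of covidify, and the default path is an assumption.

```python
import shutil
import warnings
from pathlib import Path

# Assumed default location; covidify computes its own default.
DEFAULT_OUTPUT = Path.home() / "Desktop" / "covidify-output"


def prepare_output_dir(output, default=DEFAULT_OUTPUT, force=False):
    """Return the output folder, wiping stale results only when that is safe.

    `force` is a hypothetical opt-in; covidify has no such option.
    """
    out = Path(output).expanduser()
    is_default = out.resolve() == Path(default).expanduser().resolve()
    if out.is_dir() and any(out.iterdir()):
        if is_default or force:
            # Either covidify's own folder or the user explicitly opted in.
            shutil.rmtree(out)
        else:
            # Keep the user's files, but say that old results remain.
            warnings.warn(
                f"{out} is not empty; files from earlier runs are kept "
                "and may be mixed into the new report."
            )
    out.mkdir(parents=True, exist_ok=True)
    return out
```

With this shape, a custom folder is never emptied silently: the caller either sees the warning or asks for deletion explicitly.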

> You can separate the results for each country by omitting the --output, or just choosing a different folder for each
>
> Examples:
>
> - covidify run --country="Germany"
> - covidify run --country="Austria"

Hmm.... the help says

(covidify) rd@h370:~/virtualenv$ covidify run --help
Usage: covidify run [OPTIONS]

Generate reports for global cases or refine by country.

Options:
  --output TEXT  Folder to output data and reports [Default: /Users/rd/Desktop/covidify-output/]

[...]

but this directory does not exist, nor am I allowed to create it as a regular user. I did not try it, because it seemed safer to choose a well-defined directory.
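The help text shows a macOS-style path (/Users/rd/...) on a Debian machine, which suggests the default may not be derived from the user's actual home directory. A minimal sketch of a portable default, assuming the intent is "a covidify-output folder on the Desktop"; `default_output_dir` is a hypothetical helper, not covidify's code:

```python
from pathlib import Path


def default_output_dir():
    """Pick a per-user default folder (hypothetical helper)."""
    home = Path.home()  # /home/rd on Linux, /Users/rd on macOS
    desktop = home / "Desktop"
    # Fall back to the home directory when there is no Desktop folder.
    base = desktop if desktop.is_dir() else home
    return base / "covidify-output"
```

Because the path is built from `Path.home()`, it always points somewhere the current user can normally write to.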

> or
>
> - covidify run --country="Germany" --output ~/tmp.nobackup/covidify-test-germany
> - covidify run --country="Austria" --output ~/tmp.nobackup/covidify-test-austria

Yes, that would be a workaround.

But I think your explanation is not complete. If I look at the image directory I see

rd@h370:~/tmp.nobackup/covidify-test/reports/images$ ls -l
insgesamt 1136
-rw-r--r-- 1 rd rd 70892 Mai 8 22:51 Austria_confirmed_cases_stacked_bar.png
-rw-r--r-- 1 rd rd 82163 Mai 8 22:50 Austria_confirmed_trendline.png
-rw-r--r-- 1 rd rd 54976 Mai 8 22:51 Austria_currently_infected_bar.png
-rw-r--r-- 1 rd rd 57728 Mai 8 22:50 Austria_new_confirmed_cases_bar.png
-rw-r--r-- 1 rd rd 117222 Mai 8 22:51 Austria_new_confirmed_cases_trendline.png
-rw-r--r-- 1 rd rd 53354 Mai 8 22:50 Austria_new_deaths_bar.png
-rw-r--r-- 1 rd rd 54344 Mai 8 22:50 Austria_new_recoveries_bar.png
-rw-r--r-- 1 rd rd 52687 Mai 8 22:51 confirmed_log.png
-rw-r--r-- 1 rd rd 55569 Mai 8 22:50 cumulative_forecasts.png
-rw-r--r-- 1 rd rd 76026 Mai 8 22:50 Germany_confirmed_cases_stacked_bar.png
-rw-r--r-- 1 rd rd 78510 Mai 8 22:50 Germany_confirmed_trendline.png
-rw-r--r-- 1 rd rd 64255 Mai 8 22:50 Germany_currently_infected_bar.png
-rw-r--r-- 1 rd rd 64044 Mai 8 22:50 Germany_new_confirmed_cases_bar.png
-rw-r--r-- 1 rd rd 123770 Mai 8 22:50 Germany_new_confirmed_cases_trendline.png
-rw-r--r-- 1 rd rd 58140 Mai 8 22:50 Germany_new_deaths_bar.png
-rw-r--r-- 1 rd rd 59917 Mai 8 22:50 Germany_new_recoveries_bar.png
rd@h370:~/tmp.nobackup/covidify-test/reports/images$

I do not see a reason why Germany_new_confirmed_cases_trendline.png goes into Austria_report....xlsx, since there is also an Austria_new_confirmed_cases_bar.png file, which would be the correct one.

I do see conflicts in confirmed_log.png and cumulative_forecasts.png, but why not give them the country prefix as well? Then there would be no confusion anymore.
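The suggestion to give every image the country prefix could be sketched with a small naming helper. `image_path` is a hypothetical function for illustration only; how covidify actually builds its file names is not shown in this thread.

```python
from pathlib import Path


def image_path(image_dir, country, name):
    """Build an image path that always carries the country prefix.

    Hypothetical helper: `name` is the chart name without a prefix,
    e.g. "confirmed_log" or "cumulative_forecasts".
    """
    prefix = country.replace(" ", "_")  # keep file names free of spaces
    return Path(image_dir) / f"{prefix}_{name}.png"
```

With such a helper, the Germany and Austria runs would write Germany_confirmed_log.png and Austria_confirmed_log.png rather than both writing confirmed_log.png.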

> > Many thanks for maintaining covidify :-)
>
> You're very welcome, thanks for using Covidify

Thanks again,
Rainer


AaronWard commented 4 years ago

The reason is this function.

The code looks at the file names and adds the images to the respective tabs in the Excel file. For example, xxx_xxx_bar.png gets added to the bar chart tab. It doesn't check for country names in the file names, but I will add that to the code when I get the chance.
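A rough sketch of the matching described above, with the missing country check added. This is not the function from covidify; `images_for_tab` and its parameters are hypothetical, and the tab keywords are taken from the "reading images for: ..." lines in the log output earlier in this thread.

```python
from pathlib import Path

# Tab keywords as they appear in the run log.
TABS = ("log", "forecasts", "bar", "trendline")


def images_for_tab(image_dir, tab, country=None):
    """Return the PNG files whose name ends with `_<tab>.png`.

    With `country` given, only files starting with "<country>_" are kept.
    """
    files = sorted(Path(image_dir).glob(f"*_{tab}.png"))
    if country is not None:
        prefix = f"{country}_"
        files = [f for f in files if f.name.startswith(prefix)]
    return files
```

Called without `country`, this reproduces the reported behaviour: both Germany_*_bar.png and Austria_*_bar.png land in the bar tab. With `country="Austria"` only the Austrian images are selected. Files without any prefix, such as confirmed_log.png, are then skipped, so they would need a country prefix to be picked up.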

Thanks for the submission, and thanks for using covidify!

Closing this issue