dsfsi / covid19za

Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
https://dsfsi.github.io/covid19za-dash/
MIT License

[Feature] Process Whatsapp Messages, compare to previous day and create CSV output #80

Closed vukosim closed 4 years ago

vukosim commented 4 years ago

Is your feature request related to a problem? Please describe.

Currently, the only numbers we get from the NICD/DoH are final numbers. One place we can get this data is their WhatsApp information service, which gives the daily numbers after each update.

We need a solution that checks 2 WhatsApp updates, calculates the difference and creates the CSV.

The WhatsApp messages are now stored in data/doh_whatsapp/ as .txt files.

Describe the solution you'd like

Process 2 consecutive WhatsApp .txt files and then output a CSV that follows the confirmed.csv template, similar to the scraper.
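As a rough illustration of the requested step, the sketch below takes per-province totals from two consecutive updates, computes the increase, and writes it as CSV. The function names, the column names (`date`, `province`, `new_cases`) and the numbers are assumptions made for this example; the actual layout of confirmed.csv is not shown in this issue, so the columns would need to be adapted to it.

```python
import csv
import io
from typing import Dict, TextIO


def daily_increase(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Return current minus previous for every province seen in either update."""
    provinces = sorted(set(previous) | set(current))
    return {p: current.get(p, 0) - previous.get(p, 0) for p in provinces}


def write_increase_csv(increase: Dict[str, int], date: str, out: TextIO) -> None:
    """Write one row per province; the header is a hypothetical layout."""
    writer = csv.writer(out)
    writer.writerow(["date", "province", "new_cases"])
    for province, new_cases in increase.items():
        writer.writerow([date, province, new_cases])


# Made-up totals, for illustration only (not real NDOH figures).
previous_update = {"Gauteng": 10, "Western Cape": 5}
current_update = {"Gauteng": 14, "Western Cape": 5, "Limpopo": 1}

increase = daily_increase(previous_update, current_update)
buffer = io.StringIO()
write_increase_csv(increase, "2020-03-24", buffer)
print(buffer.getvalue())
```

A province that is missing from one of the two updates is treated as zero here; whether that is the right choice depends on how the messages report provinces with no cases.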

Example from the WhatsApp line:

[image]

Extracted into a .txt file, example below:

Current Status of Cases of COVID-19 in South Africa 24 MARCH 2020 - 11:28am

Total cases: 554
153 New cases
2 Full recovery (Confirmed Negative and cleared for returning home)
0 Deaths

The breakdown per province of total infections is as follows:
302 Gauteng
130 Western Cape
80 KwaZulu Natal
18 Free State
5 North West
9 Mpumalnaga
4 Limpopo
2 Northern Cape
2 Eastern Cape

Current projections estimate that the virus could effect 60% of South Africa's citizens at some point, but not at the same time. Most South Africans will only experience mild symptoms and humans are capable of developing immunity to the virus.

The National Department of Health will now be releasing results as they are submitted by both private and public laboratories. In instances where NDOH confirmatory tests yield different results, the public will be duly informed.

TEST RESULTS OF CITIZENS REPATRIATED FROM WUHAN: All the citizens from Wuhan were tested and their results came back negative for COVID-19. They will continue to be kept in quarantine for the prescribed period and will thereafter be reunified with the community.
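The province breakdown in the sample message above could be extracted along these lines. This is a minimal sketch, not the code from the eventual PR: `parse_province_counts` and the regular expression are assumptions for this example, and the function relies on the phrase "as follows:" introducing the breakdown and on a blank line ending it, as in the sample.

```python
import re
from typing import Dict

# A count followed by a name, where the name ends at the next count or at the
# end of the section. Works whether items are on one line or one per line.
PAIR_RE = re.compile(r"(\d+)\s+(\D+?)(?=\s+\d|\s*$)")


def parse_province_counts(message: str) -> Dict[str, int]:
    """Return {province name as written: count} from the breakdown section."""
    marker = "as follows:"
    start = message.find(marker)
    if start == -1:
        return {}
    section = message[start + len(marker):].strip()
    # The breakdown ends at the first blank line.
    section = re.split(r"\n\s*\n", section, maxsplit=1)[0]
    return {name.strip(): int(count) for count, name in PAIR_RE.findall(section)}


sample = """Total cases: 554 153 New cases 2 Full recovery
0 Deaths

The breakdown per province of total infections is as follows: 302 Gauteng 130 Western Cape 80 KwaZulu Natal 18 Free State
5 North West 9 Mpumalnaga 4 Limpopo 2 Northern Cape 2 Eastern Cape

Current projections estimate ...
"""

counts = parse_province_counts(sample)
print(counts)
print(sum(counts.values()))
```

Names are kept exactly as written in the message, so a misspelling such as "Mpumalnaga" would need to be mapped to a canonical name in a later step. In this sample the per-province figures sum to 552 while the message reports 554 total cases, so it may be worth flagging such gaps rather than assuming the two always agree.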

vukosim commented 4 years ago

Hey @cishiv you might be able to help with this one.

cishiv commented 4 years ago

Sure. If no one else gets to it before me, I'll take a look @vukosim

cishiv commented 4 years ago

Will probably only be during the weekend though.

vukosim commented 4 years ago

No problem @cishiv. Let's get it right.

Ari-Ramkilowan commented 4 years ago

@vukosim, @cishiv, what is the status of this feature? I'm happy to assist, but don't want to duplicate effort. Let me know, then I'll either get cracking on this ticket or pick up another issue/feature from the repo.

vukosim commented 4 years ago

Assigned. @Ari-Ramkilowan, you can pick it up.

Ari-Ramkilowan commented 4 years ago

@vukosim Progress on this feature ...

I have written some Python code to extract the date, province and infection count from each of the WhatsApp text files from NDOH.

Whenever the notebook is executed, it looks inside the relevant data folder. If a previously unprocessed .txt file exists, it extracts the information from the infection count breakdown section of the WhatsApp message and stores it in a .csv.
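The notebook's way of deciding which files are "previously unprocessed" is not described here. One possible convention, sketched below purely as an assumption, is to treat a .txt file as processed once a .csv with the same base name exists in an output folder. The helper `find_unprocessed`, the folder names and the file names are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def find_unprocessed(txt_dir: Path, csv_dir: Path) -> List[Path]:
    """Return .txt files in txt_dir that have no same-named .csv in csv_dir."""
    done = {p.stem for p in csv_dir.glob("*.csv")}
    return sorted(p for p in txt_dir.glob("*.txt") if p.stem not in done)


# Demo in a throwaway directory; the file names are invented for the example.
with tempfile.TemporaryDirectory() as tmp:
    txt_dir = Path(tmp) / "doh_whatsapp"
    csv_dir = Path(tmp) / "processed"
    txt_dir.mkdir()
    csv_dir.mkdir()
    (txt_dir / "2020-03-23.txt").touch()
    (txt_dir / "2020-03-24.txt").touch()
    (csv_dir / "2020-03-23.csv").touch()  # already processed
    pending = [p.name for p in find_unprocessed(txt_dir, csv_dir)]

print(pending)
```

This convention fits the one-.csv-per-.txt layout; with a single combined .csv, the set of processed files would have to be tracked some other way, for example by the dates already present in that file.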

I'm not 100% clear on how this data is to be used, so does it make sense to just create a .csv for every .txt file, or is the aim to have one .csv that is continually updated? I think the former is my favoured approach (even though I currently have a single csv with all the data extracted; that approach might become harder to maintain in the long run).

Let me know what you'd like from this feature, I'll tidy up the notebook and make a PR.

As an example of what we currently have available, performing a groupby on the extracted data yields the following output for by_province.get_group('KZN'): [Screenshot from 2020-04-13 17-43-32]

vukosim commented 4 years ago

Ahh. Thanks. This will be very helpful @Ari-Ramkilowan

Ari-Ramkilowan commented 4 years ago

@vukosim, I just updated the notebook to get the difference in infection count for any two given dates (for which data exists). Sample output below: diff_by_dates(df, '2020-04-13', '2020-04-10') gives [Screenshot from 2020-04-15 12-20-43]
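The notebook's implementation of diff_by_dates is not shown in this thread. A function with the same call shape could look roughly like the pandas sketch below, which assumes a long-format DataFrame with hypothetical columns `date`, `province` and `infections`, and assumes the result is the first date's count minus the second's. The numbers are made up for illustration.

```python
import pandas as pd


def diff_by_dates(df: pd.DataFrame, date_a: str, date_b: str) -> pd.Series:
    """Per-province infection count on date_a minus the count on date_b.

    Raises KeyError if either date has no data in df.
    """
    # One row per province, one column per date.
    wide = df.pivot(index="province", columns="date", values="infections")
    return (wide[date_a] - wide[date_b]).rename("difference")


# Made-up counts, for illustration only (not real NDOH figures).
df = pd.DataFrame(
    {
        "date": ["2020-04-10", "2020-04-10", "2020-04-13", "2020-04-13"],
        "province": ["KZN", "GP", "KZN", "GP"],
        "infections": [10, 20, 15, 32],
    }
)

result = diff_by_dates(df, "2020-04-13", "2020-04-10")
print(result)
```

Pivoting to one column per date keeps the subtraction aligned on province, so the order of rows in the input does not matter.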

Ari-Ramkilowan commented 4 years ago

@vukosim: PR sent. After it gets approved I'll hop onto the next issue.

vukosim commented 4 years ago

Thanks @Ari-Ramkilowan