dsfsi / covid19za

Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
https://dsfsi.github.io/covid19za-dash/
MIT License

[DATA] Help needed for Hospital Data #115

Open HerkulaasCombrink opened 4 years ago

HerkulaasCombrink commented 4 years ago

Which Dataset

health_system_za_public_hospitals.csv

Error Description

District and subdistrict data needed.
Estimated population size needed for each district.

Suggested fixes

  1. Populating the data for the proposed file.
  2. Creating an accurate dataset that is already in a computer-readable format, and not in a PDF etc.
    1. Finding an updated Private and public Hospital repo for each South African province.

Volunteer to fix the data

Choose the data you want to fix/add and volunteer to the data you want to commit to https://docs.google.com/spreadsheets/d/1ujiuSd656BfIO3AT86GTr17oveaev-qBuYbu_v45RC4/edit?usp=sharing

anelda commented 4 years ago

@MikeMcMalace the humanitarian data exchange has pop size at various admin levels for SA - https://data.humdata.org/dataset/south-africa-administrative-levels-0-3-population-statistics. It was last updated in 2018 according to metadata.

I'd be interested to help with this. Also involved in https://afrimapr.github.io/afrimapr.website/blog/2020/healthsites-app/ and we've just started to work with healthsites.io as well. Let me know how I can help?

elolelo commented 4 years ago

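@anelda we are currently working on a map visualization that is a bit similar to the one shown in your last link. For now, most of the help needed is on the data: populating the columns with

  • Number of beds per identified hospital
  • Number of staff members per hospital
  • Geolocation of Covid19 testing centers
  • Webpages of hospitals
  • And just about any other incomplete info on the hospital data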

The data file is the one that @MikeMcMalace has identified when he opened this issue.

anelda commented 4 years ago

Three questions:

  1. Do you have a way to prevent different people working on the same thing for this? e.g. I can get webpages for hospitals but it would be tragic if others are working on this at the same time, duplicating effort.

  2. What is the relationship between health_system_za_public_hospitals_extended_details.csv vs health_system_za_public_hospitals_contacts.csv vs health_system_za_public_hospitals.csv? Can these be merged?

  3. Is there any value in contacting info@sadoctors.co.za who maintains this website - http://doctors-hospitals-medical-cape-town-south-africa.blaauwberg.net/hospitals_clinics_state_hospitals/state_public_hospitals_clinics_eastern_cape_south_africa/ (for each province with a lot of the data we need for each hospital) to hear if they can do a data dump of the data displayed on their website?

HerkulaasCombrink commented 4 years ago

Thank you so much for your inputs, @anelda .

1) I propose that we volunteer on this issue so that there isn't overlap. Alternatively, we can create a google doc and people can volunteer from there? - which would you think would work best?

2) Yes, they can. From the start, we needed details and information, and as time continued, the datasets expanded. We have a hospital dictionary, and I can imagine that we do not have all the IDs of all hospitals on this list. If I had to be pragmatic about it, I would propose that we update the library file, and then use that as a reference to see what we do not have.

3) Yes, there is. I have made contact with a few private hospital groups, and have reached out to provincial managers, but unfortunately, I have had little success. It is an excellent suggestion. Would you mind making contact?
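
On point 2, here is a minimal sketch of using one file as the reference to see which hospital IDs another file lacks. The sample rows and the column names `id`, `name` and `phone` are assumptions for illustration only and are not the actual headers of the repository files.

```python
import csv
import io

# Hypothetical sample data; real files would be opened from disk instead.
DICTIONARY_CSV = """id,name
H001,Hospital A
H002,Hospital B
H003,Hospital C
"""

CONTACTS_CSV = """id,phone
H001,000-0001
H003,000-0003
"""


def read_ids(csv_text, id_column="id"):
    """Return the set of non-empty IDs found in one CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column].strip() for row in reader if row[id_column].strip()}


def missing_ids(reference_text, other_text, id_column="id"):
    """IDs present in the reference file but absent from the other file."""
    reference = read_ids(reference_text, id_column)
    other = read_ids(other_text, id_column)
    return sorted(reference - other)


# Lists the hospitals that still need contact details.
print(missing_ids(DICTIONARY_CSV, CONTACTS_CSV))
```

Running the same check in both directions would also show IDs that appear in a detail file but were never added to the reference file.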

anelda commented 4 years ago

For hospital beds, there is this study:

Geographical maldistribution of surgical resources in South Africa: A review of the number of hospitals, hospital beds and surgical beds. A J Dell (BSc, MB ChB, PhD) and D Kahn (MB ChB, FCS (SA), ChM), both of the Department of Surgery, Faculty of Health Sciences, and Groote Schuur Hospital, University of Cape Town, South Africa. http://dx.doi.org/10.7196/samj.2017.v107i12.12539, published 2017, with a contact email for the lead author: angelajdell@gmail.com

Maybe they can share the data they collected. Here is how they did it (a lot of work has gone into collecting/verifying the data):

A list of all hospitals in SA was obtained from the Provincial DoH and cross-referenced with electronic databases of hospitals in SA (Medpages and hospital websites). These were cross-referenced with the NDoH hospital list from the office of the minister of health.

The Health Systems Trust provided estimates of the total number of hospitals and hospital beds for each province for comparison among the provinces. The public hospitals were grouped according to the nine provinces in SA and were subdivided into major district municipalities.

All hospitals were contacted telephonically and by email. Either the chief executive officer, superintendent or matron (in the case of district-level facility) in each hospital was contacted to obtain the relevant data. Data were collected from 1 October to 31 December 2014. Private hospital data were readily available from the Hospital Association of SA (HASA) and included extensive data on the number of hospitals, total number of hospital beds and type of beds. Private hospitals were contacted telephonically to verify these data.

HerkulaasCombrink commented 4 years ago

Brilliant, brilliant study - and this is the data we need. It is a shame that this is 2017, but, it does have the data we require. Thank you for your insight @anelda. I do not personally know the authors, but I do know the department. Would you mind making contact?

HerkulaasCombrink commented 4 years ago

@elolelo, what is your idea for the websites? I am trying to find the geo-locations of the testing centres, but I am picking up something that might complicate things considerably: labs/pathologists might be referring samples. This means that we need to track down the core testing facilities. I can ask for this.

anelda commented 4 years ago

Thank you so much for your inputs, @anelda .

1. I propose that we volunteer on this issue so that there isn't overlap. Alternatively, we can create a google doc and people can volunteer from there? - which would you think would work best?

2. Yes, they can. From the start, we needed details and information, and as time continued, the datasets expanded. We have a hospital dictionary, and I can imagine that we do not have all the IDs of all hospitals on this list. If I had to be pragmatic about it, I would propose that we update the library file, and then use that as a reference to see what we do not have.

3. Yes, there is. I have made contact with a few private hospital groups, and have reached out to provincial managers, but unfortunately, I have had little success. It is an excellent suggestion. Would you mind making contact?

  1. Let's start a Google Doc - great suggestion. This thread may become quite long and people might miss stuff if they have to read through everything. I can do it and share unless you have a covid19 Google Folder already where you want to keep things together?

  2. Which one is the library file? I can do a compare and merge on the files unless either of you have a script ready to do that? I'll probably do it in R and can share the merged file in the next hour or so

  3. I can reach out to the website owners. Fingers crossed that the email is still functional and that they're checking it.

anelda commented 4 years ago

Brilliant, brilliant study - and this is the data we need. It is a shame that this is 2017, but, it does have the data we require. Thank you for your insight @anelda. I do not personally know the authors, but I do know the department. Would you mind making contact?

I'll email them.

HerkulaasCombrink commented 4 years ago

@elolelo @anelda the link to the doc is below.

https://docs.google.com/spreadsheets/d/1ujiuSd656BfIO3AT86GTr17oveaev-qBuYbu_v45RC4/edit?usp=sharing

Choose an item, then update accordingly.

There are five hospital files:

The idea is to gather, create the complete files, then merge at the end.

I used Python for the merging, but any basic inner join will do, since the current IDs are already linked to the files.
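
For anyone who wants to try the merge, a basic inner join on the shared ID could look roughly like the sketch below. It is not the script used for this repo: the sample rows and the column names `id`, `name`, `province` and `phone` are hypothetical, and it assumes each ID appears at most once in the right-hand file.

```python
import csv
import io

# Hypothetical sample data standing in for two of the hospital files.
HOSPITALS_CSV = """id,name,province
H001,Hospital A,EC
H002,Hospital B,WC
H003,Hospital C,GP
"""

CONTACTS_CSV = """id,phone
H001,000-0001
H003,000-0003
H009,000-0009
"""


def inner_join(left_text, right_text, key="id"):
    """Join two CSV texts on `key`, keeping only rows whose key is in both."""
    # Index the right-hand file by key (assumes keys are unique there).
    right_rows = {row[key]: row for row in csv.DictReader(io.StringIO(right_text))}
    joined = []
    for left_row in csv.DictReader(io.StringIO(left_text)):
        match = right_rows.get(left_row[key])
        if match is not None:
            # Right-hand columns are added to the left row; the key is kept once.
            joined.append({**left_row, **match})
    return joined


merged = inner_join(HOSPITALS_CSV, CONTACTS_CSV)
for row in merged:
    print(row)
```

Rows without a partner in the other file (H002 and H009 in this sample) are dropped, which is what an inner join does. With pandas, `DataFrame.merge(other, on="id", how="inner")` should give the same result.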

anelda commented 4 years ago

For hospital beds, there is this study:

Geographical maldistribution of surgical resources in South Africa: A review of the number of hospitals, hospital beds and surgical beds. A J Dell (BSc, MB ChB, PhD) and D Kahn (MB ChB, FCS (SA), ChM), both of the Department of Surgery, Faculty of Health Sciences, and Groote Schuur Hospital, University of Cape Town, South Africa. http://dx.doi.org/10.7196/samj.2017.v107i12.12539, published 2017, with a contact email for the lead author: angelajdell@gmail.com

Great news! Angela responded within 25 minutes to my email. She shared her thesis in PDF (also available from http://hdl.handle.net/11427/22796) and is busy looking through her spreadsheets to find the most recent one. She'll share that as soon as she's found it.

We have to make sure people who share their hard collected open datasets receive due credit!

HerkulaasCombrink commented 4 years ago

@anelda I echo your request and acknowledge your statement. Thank you.

anelda commented 4 years ago

@anelda I echo your request and acknowledge your statement. Thank you.

I'll create an issue about this. It's important for data provenance as well

anelda commented 4 years ago

@anelda I echo your request and acknowledge your statement. Thank you.

I'll create an issue about this. It's important for data provenance as well

See #117

anelda commented 4 years ago

@elolelo @anelda the link to the doc is below.

https://docs.google.com/spreadsheets/d/1ujiuSd656BfIO3AT86GTr17oveaev-qBuYbu_v45RC4/edit?usp=sharing

@MikeMcMalace good morning! I can't access this file? Can you help please?

HerkulaasCombrink commented 4 years ago

Please try link @anelda @elolelo

HerkulaasCombrink commented 4 years ago

@anelda @elolelo Good morning!

anelda commented 4 years ago

Please try link

Thanks @MikeMcMalace . It's view only mode though?

elolelo commented 4 years ago

@anelda @MikeMcMalace Good morning, @anelda , in this #117 issue, do you suggest that @MikeMcMalace should create another sheet to add the details about sources of data ?

anelda commented 4 years ago

in this #117 issue, do you suggest that @MikeMcMalace should create another sheet to add the details about sources of data ?

Good morning @elolelo. Hmmm... I wonder if it may be worth our while to have a quick online meeting to chat about the data and where we want to go with it? I received hospital bed data from Angela this morning and am busy cleaning it up. What do you think @MikeMcMalace

elolelo commented 4 years ago

@anelda - I think the meeting may be worth our while. I should be available from 11 am onwards today. Wow! Sounds like you've received valuable data. I just saw now that 87 000 beds in the public sector are available for Covid19 patients. I wondered where (in which hospitals) those beds are, so hopefully your data can answer this question.

anelda commented 4 years ago

If you send me your email addresses and times when you're available, I can set up a meeting in Zoom or Hangouts. I don't want to share the meeting link here as there have been problems with trolls crashing open online meetings. anelda@talarify.co.za. Thanks!

HerkulaasCombrink commented 4 years ago

Love the idea of a meeting! Yes. Currently, I see a gap at 12:00? Would that suffice?

Can we invite @vukosim to this meeting, please?

anelda commented 4 years ago

Okay, I created a Doodle poll for today to see who can attend and when. Anyone who'd like to join can complete the poll. I'll need people to send me an email address where I can share the meeting link. Thanks! https://doodle.com/poll/498wfb79wwfiuuev. I set the meeting for 1 hour, but we can keep it shorter if needs be. Will be fun to put faces to names :-)

HerkulaasCombrink commented 4 years ago

@elolelo @anelda the link to the doc is below. https://docs.google.com/spreadsheets/d/1ujiuSd656BfIO3AT86GTr17oveaev-qBuYbu_v45RC4/edit?usp=sharing

@MikeMcMalace good morning! I can't access this file? Can you help please?

Is it working?

Thank you for the meeting, and insight. :)

elolelo commented 4 years ago

@MikeMcMalace Thanks, it's now working. The e-meet was indeed insightful!

anelda commented 4 years ago

Hi everyone, thanks so much for the meeting yesterday. I'm happy to report that Angela added her hospital bed info to Figshare so there is a proper citation for it now and it's also officially licensed as CC-BY - https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711. I'll share the cleaned data here later today.

anelda commented 4 years ago

Good morning everyone. The clean data from Angela Dell's thesis (hospital beds and number of surgeons for public and private hosps - last updated March 2016 for 543 hospitals) is now available at https://figshare.com/articles/South_African_Hospital_Beds/12073596. There is also a readme to describe how I went from the raw data to the resulting CSV. I'm trying to do things in a way that we can track errors if we find them and also to make it reproducible. Hope this is useful to your efforts :-)

elolelo commented 4 years ago

Good morning everyone. The clean data from Angela Dell's thesis (hospital beds and number of surgeons for public and private hosps - last updated March 2016 for 543 hospitals) is now available at https://figshare.com/articles/South_African_Hospital_Beds/12073596. There is also a readme to describe how I went from the raw data to the resulting CSV. I'm trying to do things in a way that we can track errors if we find them and also to make it reproducible. Hope this is useful to your efforts :-)

Hi @anelda, thanks a lot! This data might be relatively old, but there is no doubt that it's useful, and so is the readme file. Will update you on the viz when it's ready. Thanks once again.

anelda commented 4 years ago

Hi everyone, I've been thinking a lot about the question of reproducibility in terms of putting the open hospital dataset together. Here is an attempt in R to look at various datasets that are available and tidy them up in order to be able to compare and/or combine programmatically - https://htmlpreview.github.io/?https://github.com/anelda/za_open_hospital_data/blob/master/reports/za_hospital_analysis_v2.html

The document shows only really the first phase of pulling the data in from the various sources in order to compare and combine them. I'm working on the next step where we can use fuzzy logic to match facility names to merge across datasets to harvest the maximum number of attributes that are available.
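
To illustrate the name-matching idea for readers who do not use R, here is a rough Python sketch. It is not the code from the R project: it uses the standard library's `difflib` similarity ratio as a stand-in for whatever matching method ends up being used, and the facility names other than Groote Schuur Hospital, the helper names and the 0.85 cutoff are all assumptions.

```python
import difflib


def normalise(name):
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(name.lower().split())


def match_names(names_a, names_b, cutoff=0.85):
    """Map each name in names_a to its closest name in names_b, or None."""
    lookup = {normalise(name): name for name in names_b}
    candidates = list(lookup)
    matches = {}
    for name in names_a:
        close = difflib.get_close_matches(normalise(name), candidates, n=1, cutoff=cutoff)
        # Names below the cutoff stay unmatched instead of being guessed.
        matches[name] = lookup[close[0]] if close else None
    return matches


dataset_a = ["Groote Schuur Hospital", "Groote Schur Hospital", "Example District Hosp"]
dataset_b = ["GROOTE SCHUUR  HOSPITAL", "Another Facility"]

for name, match in match_names(dataset_a, dataset_b).items():
    print(f"{name!r} -> {match!r}")
```

A stricter cutoff gives fewer wrong merges at the cost of more unmatched facilities, so any matches produced this way would still need a manual check before attributes are combined.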

The R project is available at https://github.com/anelda/za_open_hospital_data with all the "cleaner" CSV files as well.

Let me know if you think it may be useful for this project or if you have ideas for improving it? It's not super pretty at the moment, but I'll work on formatting in the next iteration.

HerkulaasCombrink commented 4 years ago

@elolelo, please see @anelda's comment above.

Firstly, @anelda, the fact that you are proactively scripting the cleanup for standardisation and reproducibility is great. I do understand statistically why you propose fuzzy logic and have started coding it, and although I primarily code in Python now, I love the Tidyverse packages in R and this is an essential toolkit to use.

Thank you for putting notes in your code, providing the detailed description and it is super easy to follow and replicate. I understand how this feeds into the https://afrimapr.github.io/afrimapr.website/ project.

Any standardisation to a common practice that is reproducible will always be valuable. My only concern is that I would rather provide no information in certain fields than use fuzzy logic with hospital data in visualisations. The fuzzy logic approach will be useful in modelling for sure, but I would have to test its performance against a NN to see which one performs better in terms of pattern recognition.

Thank you again @anelda .

vukosim commented 4 years ago

Thanks all. Can we move to have a version 1 of the combined data this week?

elolelo commented 4 years ago

@vukosim I have started combining the data that @MikeMcMalace and I worked on before @anelda's contribution with the data from @anelda, while also looking at the PDF doc you shared. I will still need to look at other recent contributions. I was wondering if it's worth creating a wiki describing and referencing the data collected and the cleaning process, as issue #117 suggests, along with other necessary details. That way we can have complete documentation of some or most aspects of this project. What do you think about having the wiki?

@anelda The page looks neat, easy to read and it's useful. Thank you!

elolelo commented 4 years ago

@vukosim I have started combining the data that @MikeMcMalace and I worked on before @anelda's contribution with the data from @anelda, while also looking at the PDF doc you shared. I will still need to look at other recent contributions. I was wondering if it's worth creating a wiki describing and referencing the data collected and the cleaning process, as issue #117 suggests, along with other necessary details. That way we can have complete documentation of some or most aspects of this project. What do you think about having the wiki?

Maybe the wiki might not be an option after all. I will just update the relevant readme.

anelda commented 4 years ago

Good morning everyone, maybe there could be a master readme file in the data repo with additional readme files in the same repo - one readme for each of the datasets.

Maybe also an idea to create subfolders for hospital data, case data, and other data that may be collected here? That may break scripts from people who are using the csv case data via direct link to the raw data though.

Also see @webdevan comment in #117

anelda commented 4 years ago

@elolelo @MikeMcMalace have you seen this https://www.who.int/healthinfo/MFL_Resource_Package_Jan2018.pdf?ua=1. Seems like it may be very valuable to scan through it to get ideas about making the dataset compatible with WHO standards/expectations from the outset?

anelda commented 4 years ago

@elolelo @MikeMcMalace I remember someone mentioned MedPages in our meeting. Did anyone contact them to ask if we could access their data to supplement the data we have?

elolelo commented 4 years ago

@anelda I have not contacted them recently for this Covid-19 work but I did make contact earlier this year (for another project) and they gave me a quote of how much I'd have to pay to get access to the type of data that I needed (the data was simply the geo coordinates of private hospitals)

anelda commented 4 years ago

@anelda I have not contacted them recently for this Covid-19 work but I did make contact earlier this year (for another project) and they gave me a quote of how much I'd have to pay to get access to the type of data that I needed (the data was simply the geo coordinates of private hospitals)

Thanks @elolelo! Do you know if one is allowed to share the data if one does indeed pay for the access? I would expect that they will have restrictions on how the data can be used?

elolelo commented 4 years ago

@anelda I don't know for sure. Their pricing was so unattractive to me that I didn't even bother asking any further questions. It's likely that they have restrictions, but one would need to ask to be sure.

Yeshara commented 4 years ago

Hello, is help still needed as per the Google doc? Do you still need data on the location of the testing centres?

elolelo commented 4 years ago

Hello, is help still needed as per the Google doc? Do you still need data on the location of the testing centres?

Hello, yes - help is still needed ( even on the testing centres)

Yeshara commented 4 years ago

Ok great! I have managed to get data on the testing centers, will convert it to a readable csv and upload it.

IneffableKoD commented 4 years ago

@anelda we are currently working on a map visualization that is a bit similar to the one shown in your last link. For now, most of the help needed is on the data: populating the columns with

  • Number of beds per identified hospital
  • Number of staff members per hospital
  • Geolocation of Covid19 testing centers
  • Webpages of hospitals
  • And just about any other incomplete info on the hospital data

The data file is the one that @MikeMcMalace has identified when he opened this issue.

Dear @elolelo, I already created a map for testing facilities. I would love to help map the data you have. Which data should I use from this repo: https://github.com/dsfsi/covid19za/tree/master/data

Are the hospitals (https://github.com/dsfsi/covid19za/blob/master/data/health_system_za_public_hospitals.csv) testing facilities?

Here is a link to my map: https://www.ineff.ch/cov19testmap/# (we can adjust it for SA if you want to make it SA-specific)

Link to my repo: https://github.com/IneffableKoD/cov19testmap

The map is created entirely on OSS and open data. We created it specifically to be easy to use and manipulate for people who are not specialists in mapping/GIS.

Feel free to get in touch if you have any questions or suggestions!

Kind regards, stay healthy Ineff

IneffableKoD commented 4 years ago

Ok great! I have managed to get data on the testing centers, will convert it to a readable csv and upload it.

I had a hard time finding data. There is a YT video with facilities in Joburg. I also started crowdsourcing on Reddit: https://www.reddit.com/r/southafrica/comments/fycj19/looking_for_help_to_list_covid19_testing/

The crosspost to r/johannesburg got some attention.

elolelo commented 4 years ago

Ok great! I have managed to get data on the testing centers, will convert it to a readable csv and upload it.

I had a hard time finding data. There is a YT video with facilities in Joburg. I also started crowdsourcing on Reddit: https://www.reddit.com/r/southafrica/comments/fycj19/looking_for_help_to_list_covid19_testing/

The crosspost to r/johannesburg got some attention.

Dear @IneffableKoD

Thanks for offering to help.

You could use this file: https://github.com/dsfsi/covid19za/blob/master/data/health_system_za_hospitals_v1.csv but there isn't enough data in it about the testing facilities (fewer than 20 identified hospitals). Most of the data has not yet been received from some of the people who offered to help with bringing it in.

To answer some of your questions: -Not all the hospitals you identified are testing facilities. In the file that I have referred you to, there is a column where hospitals are classified, check for testing facilities there.

-About making the map SA specific - I think it could be good as almost all of the data in this repo is SA specific but your map has a potential use for Covid-19Africa's repo, see #32

IneffableKoD commented 4 years ago

Dear @IneffableKoD

Thanks for offering to help.

You could use this file: https://github.com/dsfsi/covid19za/blob/master/data/health_system_za_hospitals_v1.csv but there isn't enough data in it about the testing facilities (fewer than 20 identified hospitals). Most of the data has not yet been received from some of the people who offered to help with bringing it in.

To answer some of your questions: -Not all the hospitals you identified are testing facilities. In the file that I have referred you to, there is a column where hospitals are classified, check for testing facilities there.

-About making the map SA specific - I think it could be good as almost all of the data in this repo is SA specific but your map has a potential use for Covid-19Africa's repo, see #32

Thank you for getting back to me! I'll check the data later and map what I can. It would be optimal to keep a file that can be updated on your end.

For an SA-specific map, we can simply assign a domain with pre-set parameters.

I will make a proposal and let you know. Always open to proposals for improvements. Feel free to raise an issue in my repo in case.

IneffableKoD commented 4 years ago

To answer some of your questions: -Not all the hospitals you identified are testing facilities. In the file that I have referred you to, there is a column where hospitals are classified, check for testing facilities there.

Do you mean those that have the "Hospital responds to COVID-19" comment in the column "category"?

elolelo commented 4 years ago

To answer some of your questions: -Not all the hospitals you identified are testing facilities. In the file that I have referred you to, there is a column where hospitals are classified, check for testing facilities there.

Do you mean those that have the "Hospital responds to COVID-19" comment in the column "category"?

Yes, also check out my pull request.
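
For the mapping step, picking out those rows could look something like the sketch below. The column name `category` and the value "Hospital responds to COVID-19" come from the discussion above. The sample rows and the `name` column are made up for illustration, so the real file may need different handling.

```python
import csv
import io

# Hypothetical sample rows; the real file would be read from disk or a URL.
SAMPLE_CSV = """name,category
Hospital A,Hospital responds to COVID-19
Hospital B,District hospital
Hospital C,Hospital responds to COVID-19
"""


def rows_in_category(csv_text, wanted="Hospital responds to COVID-19", column="category"):
    """Return the rows whose `column` value equals `wanted`, ignoring outer spaces."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column].strip() == wanted]


testing = rows_in_category(SAMPLE_CSV)
print([row["name"] for row in testing])
```

The comparison is exact apart from surrounding whitespace, so any spelling variants of the category in the real data would need to be normalised first.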

vukosim commented 4 years ago

Hey @elolelo @MikeMcMalace @anelda See https://coronavis.dbvis.de/en/

This is so good.