create data pipeline for weekly icu data from a tableau public

wasdee commented 2 years ago

This is a split PR focusing on extracting weekly ICU data from a Tableau Public dashboard.

Original PR message:

The number of ICU beds could translate directly to the number of ventilators, which is an essential figure for understanding the regional situation. The majority of excess deaths might be caused by hospitals being unable to accept more emergency cases because their beds are occupied by COVID-19 patients. We could express how critical healthcare capacity is using the ratio of hospitalized patients to available ICU beds. We could also estimate the node strength of a province by summing the hospitalized patients in the surrounding provinces, since patients could easily be transferred between them. I'm not sure where I should put these files. Feel free to move them or let me know the right place. The data is 9 years old, but it is a good beginning.
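As a rough illustration of the adjacency idea, the sketch below sums hospitalized patients over each province's neighbours. The province names, neighbour lists, and patient counts are invented for the example and do not come from any dataset in this PR.

```python
# Hypothetical data: province names, adjacency, and patient counts are invented.
hospitalized = {"A": 1200, "B": 1600, "C": 300, "D": 50}
neighbours = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A", "D"],
    "D": ["C"],
}


def adjacent_load(province, hospitalized, neighbours):
    """Sum hospitalized patients across the provinces bordering `province`."""
    return sum(hospitalized[n] for n in neighbours[province])


# One adjacent sum per province.
adjacent = {p: adjacent_load(p, hospitalized, neighbours) for p in hospitalized}
print(adjacent)
```

A province with a high adjacent sum is surrounded by heavily loaded neighbours, so it may have fewer places to transfer patients to.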

netlify[bot] commented 2 years ago

❌ Deploy Preview for practical-ritchie-cca141 failed.

🔨 Explore the source changes: b02a9d1ae2d4a103ab58a543e7ccf40d6eda45e8

🔍 Inspect the deploy log: https://app.netlify.com/sites/practical-ritchie-cca141/deploys/617edf61383203000855950d

djay commented 2 years ago

Cool. Thanks for the contribution. Any idea how accurate that is after they increased capacity due to COVID? It's possible we could have a graph showing available beds left, but if it's wildly off it wouldn't be so good. I remember there being some slides in some of the briefings with bed numbers. Maybe you could double-check the briefings?

wasdee commented 2 years ago

I could check the briefings or slides.

May I ask where the briefings you mentioned are?

wasdee commented 2 years ago

The value itself might be inaccurate by a large margin, but the relative comparison might be OK to use.

For the extra beds, it is OK to assume a constant multiplier; the ratio should not be affected much when comparing provinces relative to each other.

For example, the ratio of hospitalized patients per ICU bed in province X is 1200/1000 = 1.2. The ratio in province Y is 1600/1000 = 1.6.

Province Y is more critical than province X.

Let's say that, on average, hospitals could expand to 1.6 times their normal number of ICU beds.

The actual ratio of hospitalized patients per ICU bed in province X is then 1200/1600 = 0.75, and in province Y it is 1600/1600 = 1. Y is still more critical than X.
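The arithmetic above can be checked with a short sketch. It uses the X and Y figures from this comment; the function name and the single shared `expansion` factor are assumptions made for the illustration.

```python
def patients_per_bed(patients, beds, expansion=1.0):
    """Hospitalized patients per ICU bed after scaling capacity by `expansion`."""
    return patients / (beds * expansion)


# (hospitalized patients, normal ICU beds) from the example in this comment.
provinces = {"X": (1200, 1000), "Y": (1600, 1000)}

baseline = {n: patients_per_bed(p, b) for n, (p, b) in provinces.items()}
expanded = {n: patients_per_bed(p, b, expansion=1.6) for n, (p, b) in provinces.items()}

# Rank provinces from most to least critical under each assumption.
rank_baseline = sorted(baseline, key=baseline.get, reverse=True)
rank_expanded = sorted(expanded, key=expanded.get, reverse=True)
print(rank_baseline, rank_expanded)
```

Dividing every ratio by the same factor cannot change the ordering, which is why the ranking is identical in both cases. The result depends entirely on the factor being the same for every province.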

djay commented 2 years ago

@CircleOnCircles I don't think you can assume they all expanded at the same rate, and we'd still have to know how much they expanded. All the briefings are in https://github.com/djay/covidthailand/releases/download/1/inputs.tar.gz
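To make the objection concrete, here is a small sketch of how the ranking can flip when provinces expand by different amounts. The patient and bed counts are the ones from the earlier example; the per-province expansion factors are purely hypothetical.

```python
# (hospitalized patients, normal ICU beds) from the earlier example.
provinces = {"X": (1200, 1000), "Y": (1600, 1000)}

# Hypothetical: X added no beds, Y expanded to 1.6 times its normal capacity.
expansion = {"X": 1.0, "Y": 1.6}

baseline = {n: p / b for n, (p, b) in provinces.items()}
adjusted = {n: p / (b * expansion[n]) for n, (p, b) in provinces.items()}

# Rank provinces from most to least critical under each assumption.
rank_baseline = sorted(baseline, key=baseline.get, reverse=True)
rank_adjusted = sorted(adjusted, key=adjusted.get, reverse=True)
print(rank_baseline, rank_adjusted)
```

With these made-up factors, Y looks worse on the old bed counts but X is worse once the expansion is taken into account, so the relative comparison is only safe if the expansion is roughly uniform.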

wasdee commented 2 years ago

After searching through all the briefings, this is what I could find:

Page 17 of 090864.pdf

I don't see how this would help with verification. Do you have other sources?

wasdee commented 2 years ago

> Cool. Thanks for the contribution. Any idea how accurate that is after they increased capacity due to COVID? It's possible we could have a graph showing available beds left, but if it's wildly off it wouldn't be so good. I remember there being some slides in some of the briefings with bed numbers. Maybe you could double-check the briefings?

ICU beds themselves might be classified as:

normal beds, aka permanent beds
improvised/temporary beds
COVID-19-only beds

The scope of this PR is just normal ICU beds, not the others, which are added or removed depending on the situation.

djay commented 2 years ago

Maybe it will be clearer if you describe what plot you are proposing? At the moment it seems like it would be something like combining severe cases or ventilator cases to get "% occupancy of ICU beds using 2012 data"?

I have also never been able to work out the relationship between severe cases, ventilator cases, and the kinds of beds described in some of these sources.

Places you might look for more recent data might be https://hdcservice.moph.go.th/hdc/main/index.php or https://public.tableau.com/app/profile/karon5500/viz/moph_covid_v3/Story1 but I don't know if these are up to date.

wasdee commented 2 years ago

Wow, the Tableau link with 2020-2021 data really maps out the resources of the whole healthcare system.

As for the authenticity of the data, the author of the visualization, K. Karonn Yuttanawa, is part of a Strategist Dept, MoPH (ref).

This is the most recent data I have seen. I will extract the data.

djay commented 2 years ago

@CircleOnCircles do you have a way to map the values to either ventilator or severe COVID cases? Getting the data out of that Tableau is not too hard, but it has no historical data. So either it's a plot with almost no data, or we can assume the capacity hasn't changed recently and swap in the severe cases per province for the number of people in one of those bed types?

djay commented 2 years ago

Actually, we don't have data on ventilator hospitalisations per province, just severe cases, and it doesn't seem to be correct for CM.

[Image: Screen Shot 2021-11-02 at 5.15.26 pm]

djay commented 2 years ago

@CircleOnCircles or maybe you could email him and ask him if he has all the historical data?

wasdee commented 2 years ago

I emailed him.

wasdee commented 2 years ago

> @CircleOnCircles do you have a way to map the values to either ventilator or severe COVID cases? Getting the data out of that Tableau is not too hard, but it has no historical data. So either it's a plot with almost no data, or we can assume the capacity hasn't changed recently and swap in the severe cases per province for the number of people in one of those bed types?

I think the latter is good enough.

For the mapping, I'm not sure I understand the question correctly, but sure, I can map to ventilator or severe COVID cases. I will join on province name. I could also create similar geographical plots.

djay commented 2 years ago

@CircleOnCircles I think it's not possible to match severe cases per province from the dashboard to the bed types in the data, but maybe it doesn't matter. We just need to get beds occupied and total beds of each type for each province and then show this on a plot. My preference is to show trends on a timeline, so we can see which provinces are likely to reach capacity and when. Not sure about a map yet, but that could also be possible?
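One possible shape for the timeline data, as a sketch only: compute an occupancy rate per snapshot and, if it is rising, extrapolate naively from the last two points. The snapshot values are invented, and the linear projection is an assumption for illustration rather than a method agreed in this thread.

```python
from datetime import date

# Hypothetical snapshots: (date, province, beds occupied, total beds).
snapshots = [
    (date(2021, 12, 6), "X", 600, 1000),
    (date(2021, 12, 13), "X", 700, 1000),
    (date(2021, 12, 20), "X", 800, 1000),
]


def occupancy_series(rows, province):
    """Return [(date, occupied / total)] for one province, sorted by date."""
    return sorted((d, occ / total) for d, p, occ, total in rows if p == province)


def days_until_full(series):
    """Naive linear extrapolation from the last two points; None if not rising."""
    (d0, r0), (d1, r1) = series[-2], series[-1]
    slope = (r1 - r0) / (d1 - d0).days  # occupancy change per day
    if slope <= 0:
        return None
    return (1.0 - r1) / slope


series = occupancy_series(snapshots, "X")
remaining = days_until_full(series)
print(series[-1], remaining)
```

The projection needs at least two snapshots per province, which is one reason the lack of historical data in the dashboard matters.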

Maybe you can have a go at copying the existing code I have for accessing DDC dashboard data and using it for this hospital bed dashboard?

The code is in https://github.com/djay/covidthailand/blob/main/covid_data_dash.py#L278. The only difference is that your code would be a bit simpler. My code iterates over both dates and provinces to get all the data. Yours would just need to iterate over provinces, since there is no way to access past dates. There is a util function called explore_workbook that tells you everything you need to extract the data from it. I can explain the code to you if that would help.
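The province-only loop could look roughly like the sketch below. It does not use the repository's helpers or the Tableau API; `fetch` is a hypothetical callable standing in for whatever queries the dashboard for one province, and the canned rows are invented.

```python
def collect_by_province(provinces, fetch):
    """Call `fetch(province)` once per province and tag each row with its province."""
    rows = []
    for province in provinces:
        for row in fetch(province):
            rows.append({"province": province, **row})
    return rows


# Hypothetical stand-in for a dashboard query; a real one would read the workbook.
def fake_fetch(province):
    canned = {
        "X": [{"bed_type": "ICU", "occupied": 8, "total": 10}],
        "Y": [{"bed_type": "ICU", "occupied": 3, "total": 12}],
    }
    return canned.get(province, [])


rows = collect_by_province(["X", "Y", "Z"], fake_fetch)
print(rows)
```

Because there is no date dimension to walk, a single loop is enough; a province with no data simply contributes no rows.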

wasdee commented 2 years ago

K. Karoon hasn't replied to my email. His Tableau Public worksheet might be the best option.

I agree with all your ideas. I'm quite busy this coming weekend, but I will try to make some commits for the extraction. The extraction is new to me, but let me try first.

Thank you so much DJ.

djay commented 2 years ago

@CircleOnCircles I think what you need is to access the right storypoint https://github.com/bertrandmartel/tableau-scraping#story-points

djay commented 2 years ago

@CircleOnCircles what happened with the diffs? somehow you changed the whole files? maybe you need to install pre-commit and do pre-commit run to get the code formatting correct?

wasdee commented 2 years ago

Got it, I ran pre-commit:

pre-commit run --all-files
Trim Trailing Whitespace.................................................Passed
Fix End of Files.........................................................Passed
Check Yaml...............................................................Failed
- hook id: check-yaml
- exit code: 1

while scanning for the next token
found character '\t' that cannot start any token
  in ".github/workflows/main.yml", line 57, column 34

Check for added large files..............................................Passed
Check for case conflicts.................................................Passed
Check that executables have shebangs.....................................Passed
Check JSON...............................................................Passed
Check Toml...............................................................Passed
Detect Private Key.......................................................Passed
fix UTF-8 byte order marker..............................................Passed
Mixed line ending........................................................Passed
autopep8.................................................................Passed
Reorder python imports...................................................Passed
absolufy-imports.........................................................Passed

However, one check fails on a file that is not part of my changes.

wasdee commented 2 years ago

Waiting for upstream to fix https://github.com/bertrandmartel/tableau-scraping/issues/48

djay commented 2 years ago

@CircleOnCircles the idea is to move the code into covid_data_dash.py right?

wasdee commented 2 years ago

As you see fit. I still have to spend time understanding the existing data pipeline code.

wasdee commented 2 years ago

I concluded that I am not able to extract bed types per province.

wasdee commented 2 years ago

This is the data extracted by the pipeline. Please recommend where to put it.

The data is updated roughly weekly to monthly. I don't know the schedule for sure, but we can expect new updates, so we can archive them over time and see the changes in the hospital bed occupancy rate.

bed_data_2021-12-20T042138.474000+0000.csv

djay commented 2 years ago

@CircleOnCircles it should get called here https://github.com/djay/covidthailand/blob/main/covid_data.py#L239

Then make it similar to these functions, using the same import and export at the end: https://github.com/djay/covidthailand/blob/cc970700a9cedd3220a7c9c2da8f5bf5e3e52284/covid_data_dash.py#L235 You need to make it skip the scraping if today's data has already been recorded, since it's a bit slow. You don't need to use the skip_valid function if you don't want to.

Getting it for each province should not be hard after this initial one is done. You just have to work out the select or parameter that sets the province and then use workbook_iterate.
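A minimal sketch of the "skip if today is already recorded" idea, using only the standard library. This is not the repository's skip_valid function; the `Date` column name, the CSV layout, and the `scrape` callable are assumptions made for the example.

```python
import csv
import os
from datetime import date


def already_recorded(csv_path, today, date_column="Date"):
    """True if `csv_path` exists and has a row whose date column equals `today`."""
    if not os.path.exists(csv_path):
        return False
    with open(csv_path, newline="") as f:
        return any(row.get(date_column) == today.isoformat() for row in csv.DictReader(f))


def update(csv_path, today, scrape):
    """Run `scrape()` and append its row only when today's date is not yet stored."""
    if already_recorded(csv_path, today):
        return False  # skipped: the slow scrape is not run
    row = {"Date": today.isoformat(), **scrape()}
    is_new = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()  # header only when creating the file
        writer.writerow(row)
    return True
```

Calling `update` twice with the same date runs the scraper once; appending one row per new date is also what builds up the history that the dashboard itself does not keep.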

djay commented 2 years ago

@CircleOnCircles I merged https://github.com/djay/covidthailand/pull/227 based on your work, so it should start updating the data now. Next we need to work out what to plot. Ideas?

djay / covidthailand

create data pipeline for weekly icu data from a tableau public #159