devdatalab / covid

Explore discrepancy between PC and DLHS #8

Open paulnov opened 4 years ago

paulnov commented 4 years ago

The Population Census (PC) reports 2.5x as many hospital beds as DLHS.

Can we explain this discrepancy with a more careful study of the microdata? Both of these datasets are built up from many subfields (e.g. types of clinics and hospitals). I'm wondering if: (i) some subsets of the data are more comparable than others (e.g. maybe they are consistent on urban hospitals, or on health subcenters?); (ii) some states have more of a discrepancy than others.
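As a rough sketch of the comparison described above, the PC-to-DLHS bed ratio can be broken out by state and by facility type. The rows, field names (`state`, `facility`, `pc_beds`, `dlhs_beds`), and numbers below are all made up for illustration and are not taken from the actual build:

```python
from collections import defaultdict

# Hypothetical merged district-level rows. Field names and values are
# illustrative assumptions, not the variables used in make_covid.do.
rows = [
    {"state": "A", "facility": "subcenter", "pc_beds": 100, "dlhs_beds": 100},
    {"state": "A", "facility": "hospital", "pc_beds": 500, "dlhs_beds": 200},
    {"state": "B", "facility": "subcenter", "pc_beds": 50, "dlhs_beds": 50},
    {"state": "B", "facility": "hospital", "pc_beds": 300, "dlhs_beds": 100},
]

def bed_ratio_by(rows, key):
    """Return {group: total PC beds / total DLHS beds}.

    A group with zero DLHS beds maps to None instead of raising an error.
    """
    pc = defaultdict(float)
    dlhs = defaultdict(float)
    for r in rows:
        pc[r[key]] += r["pc_beds"]
        dlhs[r[key]] += r["dlhs_beds"]
    return {g: (pc[g] / dlhs[g] if dlhs[g] else None) for g in pc}

by_state = bed_ratio_by(rows, "state")
by_facility = bed_ratio_by(rows, "facility")
```

In this toy data the two sources agree on subcenters and diverge only on hospitals, which is the kind of pattern that would suggest the discrepancy is concentrated in particular subfields.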

More broadly, any differences in reporting across states are important to note. For example, one state might count beds in small rural clinics (with no capacity to deliver high-level care) while another doesn't.

To resolve this issue, dig into inconsistent reporting within and across these datasets in as much detail as possible, and report what you find.

The complete hospital estimates build is found in covid/make_covid.do

The DLHS-specific build is b/generate_dlhs4_district.do, and the PC build is in prep_hosp_pca_vd.do and prep_pc_hosp.do.

DLHS documentation is in $health/DLHS4_Documents/ ($health=~/iec/health)

Note that DLHS starts at the district-level, while PC starts at the village/town level. We collapse PC to make them comparable, but it may be worth examining the PC microdata to see if there was a different way of collapsing that would make these more consistent.
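One way to test alternative collapses is to aggregate the village/town records to districts under different inclusion rules and see which total lands closest to DLHS. This is only a sketch: the records, field names (`district`, `sector`, `facility`, `beds`), and the hospital-only rule are hypothetical and do not reflect how prep_pc_hosp.do actually collapses the data:

```python
from collections import defaultdict

# Hypothetical village/town-level PC records (made-up values and field names).
pc_units = [
    {"district": "D1", "sector": "rural", "facility": "subcenter", "beds": 4},
    {"district": "D1", "sector": "urban", "facility": "hospital", "beds": 120},
    {"district": "D2", "sector": "rural", "facility": "clinic", "beds": 6},
    {"district": "D2", "sector": "rural", "facility": "hospital", "beds": 40},
]

def collapse_beds(units, keep=lambda u: True):
    """Sum beds to the district level over the units accepted by `keep`."""
    totals = defaultdict(int)
    for u in units:
        if keep(u):
            totals[u["district"]] += u["beds"]
    return dict(totals)

# Collapse 1: count every bed reported in any facility.
all_beds = collapse_beds(pc_units)

# Collapse 2: count hospital beds only, dropping small clinics and subcenters.
hospital_beds = collapse_beds(pc_units, keep=lambda u: u["facility"] == "hospital")
```

Passing the inclusion rule as a function keeps each candidate collapse to one line, so rules based on sector, facility type, or bed counts can be compared against the DLHS district totals side by side.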

This Google Doc has more information on the underlying datasets.

@seasher04, you have a separate doc describing the DLHS sampling/census strategy, right?

seasher04 commented 4 years ago

Yes, it's in $health/DLHS4_Documents/