evidenceaction / DataDashboard-Dispensers

0 stars 0 forks source link

Database questions #1

Open olafveerman opened 8 years ago

olafveerman commented 8 years ago

During our data deep dive, a couple of questions surfaced around the database. Our guess is that Leah may be the right person to answer these.

cc @katrinskaya @mphillipsea

lndiku commented 8 years ago

1. More than half of the dispensers in the dispenser_database table doesn't have a lat/lon. Is this correct, or should we be looking elsewhere for that?

This is correct. The phones used to collect the gps information were faulty and the location would not save. The data team are working to capture all this missing data and update the database gradually.

2. What does the district column in dispenser_database point to? Not always a district

It always points to a district just that sometimes the district has the same name as the program especially in Uganda and Malawi data.

3. What table can we use to measure the outage rate? What does an outage rate of 20% mean?

I am not sure that I know what you mean by outage rate. If you are referring to the rate at which we find dispensers empty and without chlorine then you should use the dsw_per_dispensed_rates table.

4. The issues table seems to only contain a small sub-set (start in July 2015) and most of them have no date_completed. Is there a table with better completion dates, and that goes back further?

The online issue tracker was only launched around that time and so past data still exists on paper and not in the system. The issues table is used with the sms_issues table to determine the issue completion date. This is only being used formally starting 4 January 2016, since the sms API we use to collect this information was still in testing fazes up to late November - hence the large gaps in data.

5. Can you tell us more about dsw_per_chlorine table?

The dsw_per_chlorine table contains data captured during the delivery of chlorine to waterpoints. We then use scripted code to determine the total litres of chlorine delivered (sum of jerrican_delivered *5), number of deliveries made (sum num_of_deliveries) and the average 30 day chlorine consumption per program (average of avrg_30day_usage_litres).

6. Where do we get the adoption rate data?

This can be retrieved from the dsw_per_adoption table. However, what is listed here are variables again a script must be ran to determine the TCR and FCR adoption of each program and the weighted monthly average for the month.

7. Is there data that reflects chlorine usage. Refill rates?

Yes, obtained from the dsw_per_chlorine tables as explained above.

8. How often is the data updated?

The evaluation data in dsw_per_dispensed_rates and dsw_per_adoption is updated on the 10th of every month once a new set of monthly data is cleaned and made available. The data from dispenser_database is updated on the fly and the last updated date recorded. All the rest of the data is updated on the 18th date of each month if updates are available.

9. Can we hook in straight to the SQL db?

Yes you can using the account I created for you. It would be good to keep in touch when you do this as we are likely to adjust some of our table structures such as the dispenser_database table that we shall be expanding to collect a larger quantity of information.

Please let me know if I need to expound on anything a bit more.

olafveerman commented 8 years ago

Thanks for the clarifications @lndiku, this is very useful for us. We have a couple of follow up questions:

2. Districts
The Malawi and Uganda dispensers always point to a district. In the case of Kenya we were not able to identify the administrative level you are using. We initially thought you would be referring to [Kenyan counties](), but in the dispenser_database you seem to use to a smaller administrative level. 'Bungoma North' and 'Bungoma South' are, for example, not counties in themselves.

3. Outage rates

5. DSW per chlorine
This means that each row in this table is always updating with each delivery? Jerrican_delivered refers to the total amount of jerricans delivered during the lifetime of the dispenser?
Also, there are 14251 records in this table, while the dispenser_database contains over 25 000 dispensers.

6. Adoption rates
Could you provide more information about how you come to the adoption rate based on the TCR and FCR?

bchen10 commented 8 years ago

Hi @lndiku! Just wanted to followup to see if you'd be able to answer our additional questions above. Thanks!

lndiku commented 8 years ago

Hi @olafveerman

2. Districts We point to districts in all data. The nomenclature is different in Kenya and a subcounty is at the same level as district. For Kenya data subcounty=district.

3. Dispenser Functionality and Hardware rates/ outage rates The dsw_per_dispensed_rates table can be used to calculate the functionality of dispensers - Functionality is defined as a dispenser that will release a proper dose of chlorine (3ml) if the dispenser has chlorine in it. Non-functional dispensers may experience valve problems, a missing or cracked tank, or a missing or vandalized dispensers. Dispensers that do not have chlorine but that do release chlorine after chlorine is added are considered “Functional”. i.e s206_cl_dispensed = 1.

It can also be used to calculate the hardware problem rates - Hardware problems include casing, tank, lid, and valve issues. i.e Casing problems s209_dispprob_cas_padlk =1; Tank problems s210_dispprob_tnk_vlv=1; Nut or bolt problems s212_nut=1. Please note that sticker and tag problems: s211_dispprob_stker_tag_paint; are recorded but do not contribute to calculations as they are not considered hardware problems.

_5.DSW per Chlorine _

Yes; to the total amount of jerricans delivered during the lifetime of the dispenser and is updated monthly.

The records for delivery only exist from January 2014 to December 2015, furthermore depending on the usage of chlorine at the waterpoint not every dispenser will receive chlorine and have deliveries logged. Malawi also does not record any of their deliveries at all due to the hub and spoke system they had been using previously. However, the discrepancy is a bit large and I may need to confirm this tomorrow.

6. Adoption rates. TCR Adoption For TCR, the variable of interest is c801_tcr_test. A sample of drinking water from each household is tested for chlorine and readings taken based on the amount of chlorine in the water sample. Basically, the readings range from 0 to XXXX. 0 readings indicate the water sample does not contain chlorine while anything greater than zero indicate the water contains chlorine. To obtain TCR adoption, • The TCR values (from c801_tcr_test) are coded to absolute figures: 0 and 1. 0 values indicate the water sample does not contain chlorine while 1 indicates otherwise. The latter refers to c801_tcr_test readings that are greater than 0, or water samples that contain chlorine residual/positive for chlorine. • For program adoption, obtain the proportion of households for a given program that test positive for chlorine out the total households where water samples were actually given and TCR tests done. Basically this is the mean of the variable created out of c801_tcr_test (it only contains 0’s and 1’s). The proportion is then converted to a percentages by multiplying by 100. • For program wide adoption, adoption for each program is weighted based on the number of dispensers installed. The more the dispensers installed, the higher the weight. The number of dispensers per program per month can be found in dsw_per_waterpoint.

For FCR adoption, • Follow the process above, but use c804_fcr_test instead of c801_tcr_t

olafveerman commented 8 years ago

Thanks again for the clarifications @lndiku. Here are some quick follow-up questions

2. Districts Do you happen to have a list of the sub-counties and their parents? Since there are a lot of dispensers without coordinates, we have to rely on the district to do location based analysis. I wasn't able to find a list with Kenyan sub-counties (constituencies?) that match the ones you are using.

3. Dispenser functionality Each row refers to a particular dispenser? Take s209_dispprob_cas_padlk =1 for month 6 in year 2015. This means that that particular dispenser had a casing problem in June of 2015?

6. Adoption rates In which table can we see c801_tcr_test? Can we relate each measurement back to a dispenser? We'd like to see if we can calculate these rates per district, instead of program.

lndiku commented 8 years ago

Hi @olafveerman

2. Districts I can create this for you from DB export from table admin_territory_details. Please see box folder here.

3. Dispenser Functionality Yes each row belongs to one dispenser, also your assumption of s209_disprob_cas_padlk is correct.

6. Adoption rates Adoption data is in dsw_per_adoption_rates table. This is where you will find all variables that deal with tcr, fcr and diarrhea rates. Yes, the same table has a variable c102_wpt_id that has the waterpoint ID of the dispenser evaluated. If you perform a join with dispenser_database on the waterpoint ID column you should be able to group the data by district afterwards and adjust the adoption calculation explained earlier accordingly. Let me know if you will need any support on this from my side but I believe the DB account I assigned to you has privileges to create views which might come in handy her

olafveerman commented 8 years ago

Hi @lndiku

Couple of questions about the TCR and FCR:

image

olafveerman commented 8 years ago

Hi @lndiku and Rebecca,

This morning we spoke with @katrinskaya and came to the conclusion that we can discard the FCR readings. To calculate the adoption rate for the whole program, we will then use the TCR values and:

  1. calculate the adoption rate per waterpoint
  2. calculate the adoption rate per program, based on the adoption rates of their waterpoints. We'll weigh these using the total number of households served by each waterpoint
  3. calculate the adoption rate for the whole program, based on the program adoption rates, weighed by total households served by each program

Questions:

bchen10 commented 8 years ago

Hey @lndiku and Rebecca, just reminding you both that we have some questions above that need to be answered soon so we can move forward with implementation. Thank you!

cc: @katrinskaya