OCHA-DAP / Data-Team

A place for tracking data team issues

Possible problem with indicator PCH090 #92

Open cjhendrix opened 9 years ago

cjhendrix commented 9 years ago

Remember the problem with PCH090 (vaccines financed by govt)? It looks like the different sub-annual values for a given country are all the same.

When we graphed it on the indicator page, it looked like this: [screenshot, 2014-10-21 at 23:13:43]

So we switched it back to being a normal dataset.

I think there are 3 possibilities here:

  1. The data is accurate and simply has many cases of sub-annual rows with the exact same value.
  2. Something strange is happening with the scraper or import.
  3. Something is wrong with the source.

We will soon make this dataset an indicator again. Alex has written a new API call that handles sub-annual values better. But I thought you guys might want to look at this data a bit to confirm that we don't have a problem in some other part of our system.
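A rough way to check the first possibility would be to group the rows by country and year and flag the groups where every sub-annual row carries the same value. The sketch below is only an illustration: the `(location, date, value)` row layout, the function name, and the sample values are made up and are not taken from the actual database or scraper.

```python
from collections import defaultdict

def constant_series(rows):
    """Return the (location, year) keys whose sub-annual values are all identical.

    `rows` is an iterable of (location, iso_date, value) tuples. This layout
    is an assumption for illustration, not the real schema.
    """
    values = defaultdict(set)
    counts = defaultdict(int)
    for location, iso_date, value in rows:
        key = (location, iso_date[:4])  # first four characters are the year
        values[key].add(value)
        counts[key] += 1
    # Flag only groups with more than one row and exactly one distinct value.
    return {k for k in values if counts[k] > 1 and len(values[k]) == 1}

# Hypothetical sample rows, not real PCH090 data.
rows = [
    ("AFG", "2014-10-01", 12.0),
    ("AFG", "2014-10-08", 12.0),
    ("KEN", "2014-10-01", 40.0),
    ("KEN", "2014-10-08", 41.5),
]
flagged = constant_series(rows)
print(flagged)
```

If nearly every country and year is flagged, that points toward annual data being stored as sub-annual rows rather than toward genuinely repeated measurements.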

takavarasha commented 9 years ago

I investigated and found that the problem lies with the import configuration for the indicator (and for PVL010 as well). Specifically, the Expected Time Format for the indicator is set to yyyy-MM-dd, meaning that the value of the indicator is expected to change daily. This contrasts with the more common Expected Time Format of YYYY, used by indicators whose values are expected to change annually. As a result, there are multiple rows for the indicator per location, each row with a different date in 2014.

Assuming that the data is indeed annual (which seems very likely given that the values never change), the solution is to change the expected date format to YYYY in the import configuration, delete the data in the database for the indicator, and then run an import.
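To illustrate why the time format matters, here is a minimal sketch of an import that keys each record on (indicator, location, period). It is not the actual import code: the `import_rows` function, the dict-based store, and the sample dates and values are all hypothetical, and Python `strftime` codes stand in for the yyyy-MM-dd and YYYY settings.

```python
from datetime import date

def import_rows(store, rows, time_format):
    """Upsert rows into `store`, keyed on (indicator, location, period).

    `time_format` is a Python strftime pattern standing in for the
    Expected Time Format setting (an assumption for illustration).
    """
    for indicator, location, observed, value in rows:
        period = observed.strftime(time_format)
        store[(indicator, location, period)] = value
    return store

# Hypothetical rows: the same annual value stamped with different 2014 dates.
rows = [
    ("PCH090", "AFG", date(2014, 10, 7), 12.0),
    ("PCH090", "AFG", date(2014, 10, 14), 12.0),
    ("PCH090", "AFG", date(2014, 10, 21), 12.0),
]

daily = import_rows({}, rows, "%Y-%m-%d")  # daily format: one record per date
annual = import_rows({}, rows, "%Y")       # annual format: one record per year

print(len(daily), len(annual))
```

With the daily pattern every distinct date becomes its own record, while the annual pattern collapses them into a single record per location and year, which is the behaviour the reconfiguration is meant to restore.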

cjhendrix commented 9 years ago

Thanks for researching it, Godfrey. I've made an issue for sprint 41 (next week) to empty those data series and coordinate it with you so you can reconfigure and re-run the import.

https://github.com/OCHA-DAP/DAP-System/issues/312

Even with the incorrect configuration, I wonder why different dates are assigned. I guess the date in the raw baseline data changes with each scrape, and cps creates new records every time we import it?

cjhendrix commented 9 years ago

@takavarasha We're going to try to do what I described above today or tomorrow when you have time. However, PVL010 looks OK to us; it seems to have only one value per country for 2014.