djay / covidthailand

Thailand Covid testing and case data gathered and combined from various sources for others to download or view

Missing "Vac Allocated" data #115

Closed nathakits closed 2 years ago

nathakits commented 2 years ago

Hi. In the vac_timeline.csv file, there are now only 2 vaccines for the "Vac Allocated" data.

Previously there were also Pfizer and Sinopharm.

Also, the "Vac Allocated" data now has many empty or missing values, starting from 2021-07-02.
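One way to pin down where the gaps begin is to scan the CSV for empty allocation cells. The following is a minimal sketch, not code from the repo: the `Date` column, the `Vac Allocated <vaccine>` column names, and the sample rows are all assumptions made up for illustration, and the real vac_timeline.csv may be laid out differently.

```python
import csv
import io

# Hypothetical sample in the ASSUMED layout of vac_timeline.csv.
# Real column names and values may differ.
SAMPLE = """Date,Vac Allocated Sinovac,Vac Allocated AstraZeneca,Vac Allocated Pfizer
2021-06-30,100,50,
2021-07-01,120,60,
2021-07-02,,,
2021-07-03,,,
"""


def allocation_gaps(csv_file, prefix="Vac Allocated"):
    """Return {column: [dates with an empty value]} for columns starting with prefix."""
    reader = csv.DictReader(csv_file)
    columns = [c for c in reader.fieldnames if c.startswith(prefix)]
    gaps = {c: [] for c in columns}
    for row in reader:
        for c in columns:
            # Treat missing, empty, and whitespace-only cells as gaps.
            if not (row[c] or "").strip():
                gaps[c].append(row["Date"])
    return gaps


gaps = allocation_gaps(io.StringIO(SAMPLE))
for column, dates in gaps.items():
    print(column, "first missing:", dates[0] if dates else "none")
```

To run it against the real file, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open("vac_timeline.csv", newline="")`, and adjust the column names to whatever the file actually uses.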

reduxionist commented 2 years ago

Hello @nathakits! Thanks for catching this problem with our data. I will begin preliminary investigations now, but a full analysis may have to wait until I have free time this weekend. My apologies for the delay, but as this is a volunteer project our assistance is available on a best-effort basis only. :)

djay commented 2 years ago

@nathakits you can start by creating tests in a new branch that fail and so show the issue, and also modify any existing test data that is wrong. The existing tests are in https://github.com/djay/covidthailand/tree/main/tests/vaccination_tables. You don't have to make it 100% correct, just enough to show that some values are wrong or missing, and for the key dates where values started going missing. Even an empty JSON file on the right date helps, as once it's fixed the file can be autogenerated to show it's correct.

reduxionist commented 2 years ago

@djay When I create a test file that is just covidthailand/tests/vaccination_tables/2021-07-02.json, pytest discovery fails:

Can't match test file {dir_path}/{test} to any downloadable file

I will try to identify the name of the pdf by hand but thought I should bring this to your attention nonetheless as it's not the first time I've encountered this issue since around the time of the refactorings to make file downloads more resilient.
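The error suggests that each test file has to be paired with a downloaded input file by name. The sketch below illustrates that kind of pairing under an assumed rule (same file stem); `match_test_to_input` is a hypothetical helper written for this example, not the repo's actual discovery code, and the real matching logic may work differently.

```python
from pathlib import PurePosixPath


def match_test_to_input(test_file, downloadable):
    """Pair a test JSON file with the input sharing its stem (ASSUMED rule)."""
    stem = PurePosixPath(test_file).stem
    for candidate in downloadable:
        if PurePosixPath(candidate).stem == stem:
            return candidate
    raise LookupError(f"Can't match test file {test_file} to any downloadable file")


# Input names taken from this thread; the test file names are illustrative.
inputs = [
    "inputs/vaccinations/1625137453551.pdf",
    "inputs/vaccinations/1625377924615.pdf",
]

# A test named after the PDF matches under the assumed rule.
print(match_test_to_input("tests/vaccination_tables/1625137453551.json", inputs))

# A test named after a date has no input with the same stem, so it fails.
try:
    match_test_to_input("tests/vaccination_tables/2021-07-02.json", inputs)
except LookupError as err:
    print(err)
```

If the real rule is similar, naming the test file after the source PDF rather than after the report date would avoid the discovery failure.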

reduxionist commented 2 years ago

Actually, I'm now back to the test discovery error. I'll update this issue again once I've got something useful working... :wai:

reduxionist commented 2 years ago

@nathakits

  1. While some of the vaccination data sources do break it down by more than Sinovac and AstraZeneca, the government data in inputs/vaccinations/*.pdf do not. We may use the most reliable (i.e. found to consistently provide data) sources rather than the most detailed ones when those more detailed ones are less reliable.

  2. As you can see from the MOPH file inputs/vaccinations/1625137453551.pdf, Sinovac and AstraZeneca numbers were broken down for 2021-07-01, but in inputs/vaccinations/1625377924615.pdf you can see they are no longer doing so as of 2021-07-02. This change in formatting may be responsible for the missing values after that point. I won't have time to investigate further until the weekend.
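The numeric PDF names above look like Unix timestamps in milliseconds. That is an assumption, not something confirmed in this thread, and even if it holds, an upload time is not necessarily the report date. Under that assumption, a small sketch for decoding a file name when hunting for the right PDF by hand (`upload_time` is a hypothetical helper):

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def upload_time(pdf_path):
    """Read a numeric file stem as milliseconds since the Unix epoch (ASSUMED)."""
    millis = int(PurePosixPath(pdf_path).stem)
    # Split with integer arithmetic to avoid floating-point rounding.
    seconds, ms = divmod(millis, 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=ms * 1000)


for name in (
    "inputs/vaccinations/1625137453551.pdf",
    "inputs/vaccinations/1625377924615.pdf",
):
    print(name, "->", upload_time(name).isoformat())
```

The times are printed in UTC; Thailand is seven hours ahead, which matters when a file lands close to midnight.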

reduxionist commented 2 years ago

idk what's going on with this issue, so removing myself from it.

reduxionist commented 2 years ago

@nathakits: If you can tell us exactly where there used to be data that now there isn't then I'll re-investigate, but do read my last explanations two comments above this one...

nathakits commented 2 years ago

Hi @reduxionist. Sorry for the late reply. I will take a look at this issue this weekend.

reduxionist commented 2 years ago

No problem @nathakits, just let me know if there's any further detail I can provide...

djay commented 2 years ago

@reduxionist if I understand this correctly, recent allocations are wrong (which should be fixed by https://github.com/djay/covidthailand/pull/143), but there aren't any specific past examples where allocations are wrong rather than just missing from the reports?

djay commented 2 years ago

Fixed in #143. If there are specific dates where you checked the report and it doesn't match the data in the CSV, then please reopen this with those dates.