Closed nathakits closed 3 years ago
Hello @nathakits! Thanks for catching this problem with our data. I will begin preliminary investigations now, but a full analysis may have to wait until I have the free time this weekend. My apologies for the delay, but as a volunteer project our assistance is available on a best-effort basis only. :)
@nathakits you can start by creating tests in a new branch that fail and so demonstrate the issue, and by modifying any existing test data that is wrong. The existing tests are in https://github.com/djay/covidthailand/tree/main/tests/vaccination_tables. You don't have to make them 100% correct, just enough to show that some values are wrong or missing, and for the key dates where they started going missing. Even an empty JSON file on the right date helps, since once it's fixed the file can be autogenerated to show it's correct.
@djay When I create a test file that is just covidthailand/tests/vaccination_tables/2021-07-02.json pytest discovery fails:
Can't match test file {dir_path}/{test} to any downloadable file
I will try to identify the name of the PDF by hand, but I thought I should bring this to your attention nonetheless, as it's not the first time I've encountered this issue since around the time of the refactorings to make file downloads more resilient.
Actually, now back to the test discovery error. I'll update this issue again once I've got something working that's useful... :wai:
@nathakits
While some of the vaccination data sources do break it down by more than Sinovac and AstraZeneca, the government data in inputs/vaccinations/*.pdf do not. We may use the most reliable (i.e. found to consistently provide data) sources rather than the most detailed ones when those more detailed ones are less reliable.
As you can see from the MOPH file inputs/vaccinations/1625137453551.pdf, Sinovac and AstraZeneca numbers were broken down for 2021-07-01, but in inputs/vaccinations/1625377924615.pdf you can see they are no longer doing so as of 2021-07-02. This change in formatting may be responsible for the missing values after that point. I won't have time to investigate further until the weekend.
idk what's going on with this issue, so removing myself from it.
@nathakits: If you can tell us exactly where there used to be data that now there isn't then I'll re-investigate, but do read my last explanations two comments above this one...
Hi @reduxionist. Sorry for the late reply. I will take a look at this issue this weekend.
No problem @nathakits , just let me know if there's any further detail I can provide...
@reduxionist if I understand this correctly, recent allocations are wrong (which should be fixed by https://github.com/djay/covidthailand/pull/143), but there aren't any specific examples in the past where allocations are wrong, just missing from reports?
Fixed in #143. If there are specific dates where you checked the report and it doesn't match the data in the CSV, then please reopen this with those dates.
Hi. In the vac_timeline.csv file, there are now only 2 vaccines for the "Vac Allocated" data. Before there was also Pfizer and Sinopharm. Also, the "Vac Allocated" data now has lots of empty/missing values, starting from "date":"2021-07-02".
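For anyone wanting to pin down the exact dates asked for in this thread, here is a rough sketch of how the first empty "Vac Allocated" value per column could be located. It is not part of the covidthailand code. The function name `first_missing_dates`, the `Date` column name, the full column names, and the sample rows are all assumptions made up for illustration; the real headers in vac_timeline.csv may differ.

```python
import csv
import io


def first_missing_dates(rows, prefix="Vac Allocated", date_col="Date"):
    """Map each column starting with `prefix` to the first date whose value is empty.

    `date_col` and `prefix` are assumptions about the CSV header names.
    Dates are expected in ISO format (YYYY-MM-DD) so they sort as strings.
    """
    first_missing = {}
    for row in sorted(rows, key=lambda r: r[date_col]):
        for col, value in row.items():
            if not col.startswith(prefix) or col in first_missing:
                continue
            # DictReader yields "" for empty cells and None for short rows.
            if not (value or "").strip():
                first_missing[col] = row[date_col]
    return first_missing


# Made-up sample standing in for vac_timeline.csv (values are placeholders).
SAMPLE = """Date,Vac Allocated Sinovac,Vac Allocated AstraZeneca
2021-07-01,100,50
2021-07-02,,
2021-07-03,,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(first_missing_dates(rows))
```

To run it against the real file, the `rows` line could be replaced with `csv.DictReader(open("vac_timeline.csv", newline=""))`, adjusting `date_col` and `prefix` to whatever the actual headers are.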