djay / covidthailand

Thailand Covid testing and case data gathered and combined from various sources for others to download or view

2nd Doses Vaccination and 3rd Doses Vaccination #85

Closed RuijingZ closed 2 years ago

RuijingZ commented 2 years ago

I found that there is a big increase in 2nd and 3rd doses on Sep 25. From the PDF link, it seems that this is a correction of the 3rd dose vaccination figure from 62K to 1 million. May I ask for confirmation that this is a correction of the vaccination number, and that the data will keep updating from this number? Also, there is a mismatch when I try to add the 09/27 new vaccination data to the 09/26 cumulative vaccination data. The sum should equal the 09/27 cumulative vaccination figure, but there are large discrepancies: for the 1st dose, for example, the newly added doses are reported as 62276, but the actual increase in cumulative 1st doses is 195515.
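The mismatch described above can be written as a small check. This is only a sketch: the function name `cumulative_gap` is made up for illustration, and the cumulative totals are placeholders chosen so that their difference is the 195515 quoted above. Only the two first-dose figures (62276 and 195515) come from this report.

```python
def cumulative_gap(prev_cumulative, reported_new, curr_cumulative):
    """Return how much the actual cumulative increase exceeds the reported daily figure."""
    actual_increase = curr_cumulative - prev_cumulative
    return actual_increase - reported_new


# Hypothetical 09/26 cumulative total (placeholder, not a real MOPH figure).
prev_total = 1_000_000
# 09/27 cumulative total, built so the increase matches the 195515 quoted above.
curr_total = prev_total + 195_515

# Reported new 1st doses for 09/27, as quoted above.
gap = cumulative_gap(prev_total, 62_276, curr_total)
print(gap)  # 133239 doses unaccounted for by the daily figure
```

A gap of zero would mean the daily figure and the two cumulative totals agree.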

reduxionist commented 2 years ago

Hi! Thanks for checking in on our data! :smile:

In one of the data sources we use, the MOPH did not publish any data on the 25th (for the 24th); instead they either left it blank or repeated the previous day's data, with this note: "Note: vaccination service data for 24 Sep 2021 is under review because more than 1 million doses were administered". Then on the 26th they changed the format of their PDFs to include a 3rd booster/4th shot. Both of these caused problems we had to fix (there's no API for this stuff; @djay had to create, and now maintains, a ton of custom scraping code to bring all this together). So there has been a lot of churn in the data we can present over the last week.

Enough background though; so yes, I can confirm that there have been corrections to the vaccination numbers. However, while @djay fixed a bunch of the issues alluded to in the paragraph above, he is also aware of and working on further issues with data from the dashboard not matching data on Twitter. So I can't yet confirm exactly which figure the data will be updated from (the MOPH dashboard contains a time series, which means historical information may be updated at a later date, and we may use that to reflect corrections they've made after the fact).

I hope this addresses some of your concerns, and that the data sources will stabilize again soon. @djay said last week was one of the most unstable he's encountered...

reduxionist commented 2 years ago

Actually @djay took the time to check and says that the number is not correct; thanks for pointing it out!

RuijingZ commented 2 years ago

Thank you so much for looking into the issue and gathering these data together! I hope the vaccination data will come back soon if a more stable source is found.

reduxionist commented 2 years ago

Since there was no data on the 25th, we mis-scraped the explanation of why ("over 1 million total doses issued") as the number of third doses issued. I believe we fixed this last week. As for adding the Sept. 27 new vaccination data to the Sept. 26 cumulative data, I am about to investigate the CSVs to see if the problem has persisted since we fixed the data from the 25th. In any case, my investigation should give us answers (and hopefully fixes thereafter!) to both issues....

As for a more stable source, I doubt there will be one. The government's trend has been to release less data rather than more (most recently they stopped reporting persons under investigation, IIRC), and there's no one else providing numbers that I'm aware of (though djay is the expert on this). So we'll do our best, and with help from people like you we'll keep it going... 😉 😄

reduxionist commented 2 years ago

Confirmed that the discrepancy when adding 09/27 new vaccination data to 09/26 cumulative vaccination data comes from the source PDFs published by the Ministry of Public Health. We are not mis-scraping the data in this case; they are just presenting inconsistent data. I'm not sure yet how @djay would approach this situation. I'm going to look ahead to the 28th's data points and see if they suggest which numbers from 9/27 and 9/26 are correct...

reduxionist commented 2 years ago

The pattern continues on the 28th; the new doses administered (I'm limiting my investigation to first doses for simplicity's sake) do not add to the previous day's total to give the current day's total. I even tried going back two days and adding both new-dose figures to the two-day-old total, but just came up with yet another total. Noticing a trend in the numbers, however, I have a theory to propose:

It seems the totals have begun to exceed what would be expected from the previous day's total plus today's additional doses. Perhaps, after their inability to report doses in time on the 25th, they changed their reporting rules. Whatever the various data points from hospitals, etc., report by a given cut-off time is reported as the new-dose figure for that day. Whatever trickles in after that time is accounted for in the next day's base cumulative dose figure, and so when that day's new doses are added the cumulative total exceeds our expectations. I am not a data analysis expert, so I don't know if I'm explaining this well, and I don't know how long this pattern continues.
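To make the theory concrete, here is a toy model of it. Everything in it is an assumption for illustration: the function name `publish`, the cut-off behaviour, and all the dose counts are invented, and nothing here is confirmed MOPH practice. Each day has doses reported before the cut-off (`on_time`) and doses that trickle in afterwards (`late`); the late ones are folded into the next day's cumulative base.

```python
def publish(days, start_total=0):
    """Simulate published figures under the assumed cut-off rule.

    days: list of (on_time, late) dose counts per day.
    Returns a list of (reported_new, reported_cumulative) per day.
    """
    published = []
    total = start_total
    pending_late = 0  # doses that arrived after yesterday's cut-off
    for on_time, late in days:
        # Yesterday's late doses are silently added to today's base,
        # then today's on-time doses are added on top.
        total += pending_late + on_time
        published.append((on_time, total))
        pending_late = late
    return published


# Invented numbers: (on_time, late) for three consecutive days.
rows = publish([(100, 20), (150, 30), (120, 0)])
for (new, total), (_, prev_total) in zip(rows[1:], rows):
    # The cumulative increase exceeds the reported new doses by the
    # previous day's late doses.
    print(new, total - prev_total)
```

In this model the cumulative total always runs ahead of "previous total plus new doses" by exactly the doses that missed the previous cut-off, which is the pattern described above.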

What I do know is that the error comes from the source, and if my theory is correct then I would recommend just taking each day's newly reported figures as the authoritative ones that include corrections for data that was not available the previous day. I hope that is convenient for your analysis or research purposes. If you think there are any code changes I could make to account for this instability, please feel free to suggest them and I will do my best to implement them, but nothing occurs to me on the face of it.

Also, if you have any better theories, I'd love to hear them!

At least this suggests the vaccine roll-out is really stepping up its pace if they can't keep up with the data... 😄

reduxionist commented 2 years ago

@djay any thoughts?

@RuijingZ I will wait for feedback from you both before taking further action on this issue...

djay commented 2 years ago

If I understand you correctly, we should be throwing away any daily totals, using only the cumulative figures, and going backwards to recalculate the daily figures from those. I think that might be mostly what is happening anyway, or perhaps a combination. Certainly there is code there to calculate the dailies.
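The approach described here (derive the dailies from the cumulative series) amounts to differencing consecutive totals. The sketch below is not the repository's actual code; the function name `daily_from_cumulative` and the sample totals are hypothetical.

```python
def daily_from_cumulative(cumulative):
    """Derive daily figures by differencing consecutive cumulative totals.

    cumulative: list of (date, total) pairs sorted by date.
    Returns a list of (date, daily) pairs, starting from the second date.
    """
    daily = []
    for (_, prev_total), (date, total) in zip(cumulative, cumulative[1:]):
        daily.append((date, total - prev_total))
    return daily


# Hypothetical cumulative first-dose totals, not real MOPH data.
series = [
    ("2021-09-26", 1000),
    ("2021-09-27", 1195),
    ("2021-09-28", 1400),
]
derived = daily_from_cumulative(series)
print(derived)  # [('2021-09-27', 195), ('2021-09-28', 205)]
```

Dailies derived this way always sum back to the cumulative totals, at the cost of attributing late-arriving doses to the day they were published rather than the day they were given.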

reduxionist commented 2 years ago

Ah, that sounds right. I will explain to @ruijingz to do the same in his spreadsheet.

reduxionist commented 2 years ago

@RuijingZ As you can see from the above discussion with djay, since the government's daily new vaccine doses are reported on a best-effort basis, they cannot be relied upon to add up to the next day's total cumulative doses figure. Your best approach is probably just to treat them as an estimate, but not to discard them in favor of using the next day's total doses figure. While it's not ideal, given the situation we're all working under, I hope this answers your issue sufficiently for the meantime, and so I am closing this ticket as "invalid" because the "won't fix" label sounds worse. ;)

RuijingZ commented 2 years ago

@reduxionist @djay Thank you for digging into these problems! Sorry for the late response; I was out last week. I agree that keeping the cumulative vaccination data and calculating daily data from these cumulative values is a better approach for this dataset. I totally understand that some official data do not add up, and not just the MOPH's. Thank you again for your patience and kindness!

reduxionist commented 2 years ago

Glad you agree and you're welcome; thank you likewise for your patience, understanding and attention to detail! :wai: