dssg-pt / covid19pt-data

😷️🇵🇹 Data on the COVID-19 pandemic in Portugal
GNU General Public License v3.0

Data on people fully vaccinated with J&J vaccine #834

Closed edomt closed 3 years ago

edomt commented 3 years ago

Dear DSSG team,

As you may know, we've been using your data at @owid. Thank you for all your work!

Our understanding is that since late April, the one-dose J&J vaccine has been used in Portugal (source). This means that we can no longer use fields such as doses2 to calculate the number of people fully vaccinated.

Does the Portuguese government publish the data that would make it possible to calculate this again? This would either be:

Thank you! Edouard

davipt commented 3 years ago

Hello Edouard,

Official data, including the daily posts on social media and the weekly reports at https://covid19.min-saude.pt/relatorio-de-vacinacao/ (which map directly to our vacinas.csv and vacinas_detalhes.csv), count vaccine shots, not people.

A person who gets two shots is counted twice: once under "doses1" and, about 4 or 12 weeks later, once under "doses2". A person who gets a Janssen shot is counted directly under "doses2". Some official documentation, e.g. the weekly PDF report, correctly calls this "fully vaccinated". The PDF also clearly states that "single dose" vaccinations are counted under "fully vaccinated" only.

On our Twitter bot https://twitter.com/PlenoDVacinacao we report people instead. Fully vaccinated people is the "doses2" value. Partially vaccinated people is "doses1 minus doses2". The former needs to grow to 100% of the population (hence the progress bar). The latter needs to tend to zero as everyone becomes fully vaccinated.

As for the vaccine brands, the weekly report (backed up in our extra/vacinas/relatório/*.{pdf,csv}, or cleaned up in vacinas_detalhe.csv) does contain the vaccination split by region and age group, but not by vaccine brand.

However the authorities do report to ECDC all that information plus the split by brand, so we can get it from https://opendata.ecdc.europa.eu/covid19/vaccine_tracker/csv/data.csv

I haven't yet added a workflow to grab this information and publish it next to the existing vacinas_detalhe.csv, but I have a temporary notebook at https://colab.research.google.com/drive/1u8G_HDj9Yh_wcw3jLhvj68QdfI6w2sbS#scrollTo=sMiptizbXUnD which I use to predict the evolution of the "fully vaccinated" count. It takes the first doses of Comirnaty and Moderna and moves them 4 weeks forward, does the same for AstraZeneca with 12 weeks, and adds Janssen as a single dose.
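The projection described above can be sketched roughly as follows. This is not the notebook's code: the function name, the input shape (a list of `(day, brand, count)` tuples of newly given first doses) and the sample numbers are assumptions made for illustration, and only the 4-week, 12-week and single-dose gaps come from the comment.

```python
from datetime import date, timedelta

# Gaps between the first dose and full vaccination, as described in the comment.
# Janssen is a single dose, so the gap is zero.
LAG_WEEKS = {"Comirnaty": 4, "Moderna": 4, "AstraZeneca": 12, "Janssen": 0}


def project_fully_vaccinated(first_doses):
    """Shift each batch of first doses forward by its brand's gap.

    first_doses: iterable of (day, brand, count) tuples (hypothetical shape).
    Returns a dict mapping the expected completion day to the number of
    people expected to become fully vaccinated on that day.
    """
    projected = {}
    for day, brand, count in first_doses:
        due = day + timedelta(weeks=LAG_WEEKS[brand])
        projected[due] = projected.get(due, 0) + count
    return dict(sorted(projected.items()))


# Made-up sample data, for illustration only.
sample = [
    (date(2021, 3, 8), "AstraZeneca", 50),
    (date(2021, 5, 3), "Comirnaty", 100),
    (date(2021, 5, 3), "Janssen", 10),
]
projection = project_fully_vaccinated(sample)
```

A cumulative curve would then be a running sum over the sorted days. The sketch assumes everyone who receives a first dose completes the schedule exactly on time, which real data will not satisfy.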

So, simplifying:

  • Or precalculated metrics such as "people with at least 1 dose" and "people with fully vaccination"

In vacinas.csv: "doses1 minus doses2" and "doses2" respectively.

I think I'll augment vacinas.csv and the README with this simple math; calling the columns explicitly "people_fully_vaccinated" and "people_partially_vaccinated" will make it clearer for everyone.
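A minimal sketch of that simple math, assuming cumulative counts and reading the partially vaccinated figure as first doses not yet followed by a completing dose (doses1 minus doses2). The function name and the sample numbers are illustrative and not part of the repository, and the single-dose adjustment discussed later in the thread is ignored here.

```python
def people_counts(doses1: int, doses2: int) -> dict:
    """Derive people-based counts from cumulative dose counts.

    Assumption: everyone counted in doses2 is fully vaccinated, and the
    people still waiting for completion are doses1 - doses2.
    """
    return {
        "people_fully_vaccinated": doses2,
        "people_partially_vaccinated": doses1 - doses2,
    }


# Made-up numbers, for illustration only.
example = people_counts(doses1=1000, doses2=400)
```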

Hope this helps. Also thank you for the @owid world-wide information!

Update: new columns added to the csv: https://github.com/dssg-pt/covid19pt-data/pull/835/commits/aba92ed3250dafa6942f22a78376733a24cedee5#diff-06231d9a852ef4309cf10a03a85945d144f1385a2ab55e70062305feb5e2c26fR276

edomt commented 3 years ago

Many thanks @davipt, this is extremely useful and answers my question perfectly! :)

davipt commented 3 years ago

Hello Edouard

Now that unidose vaccines (Janssen) have larger values (about 188K this week), we found an inconsistency. The weekly report ( https://covid19.min-saude.pt/relatorio-de-vacinacao/ ) defines unidoses as counting towards doses2/fully vaccinated, but the daily reports, published on the covid dashboard and shared on social media, count unidoses under "doses1" instead. This means that on the day the weekly report is released, its values do not match the daily numbers unless they are adjusted for the number of unidoses (which we can only gather from the numbers published by ECDC).

To simplify your life I've adjusted the new columns on the vacinas.csv so they take into account the numbers for the islands and the unidose. https://github.com/dssg-pt/covid19pt-data/pull/916

This means you can now access the two columns pessoas_vacinadas_completamente and pessoas_vacinadas_parcialmente and ignore the other ones, and let us take care of the missing data.

We're keeping doses (total jabs given, up to 2 per person), doses1 and doses2 aligned with the daily reports, meaning continent only, without the islands. On our bot we explicitly call them "first dose" (including unidose) and "second dose" respectively. This allows the other two values to represent more correctly the real number of people fully vaccinated (with 2 doses or a unidose) and the remaining people partially vaccinated (with the first dose of two). All "_novas" columns are calculated as the respective column minus its value on the prior day.
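The "_novas" rule (each value minus the prior day's value) can be illustrated with a small sketch. The function name and the sample series are made up for illustration; the first day has no prior value, so the sketch leaves it as `None`.

```python
def daily_new(cumulative):
    """Return day-over-day differences of a cumulative series.

    The first entry is None because there is no prior day to subtract.
    """
    if not cumulative:
        return []
    return [None] + [today - prior for prior, today in zip(cumulative, cumulative[1:])]


# Made-up cumulative values, for illustration only.
novas = daily_new([100, 150, 175])
```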

Please let me know if this helps. If you need, I can create a PR for OWID, although it would be harder for me to test it. Basically you can drop the second CSV and the merge, and simply use people_fully_vaccinated = pessoas_vacinadas_completamente and people_vaccinated = pessoas_vacinadas_completamente + pessoas_vacinadas_parcialmente. https://github.com/owid/covid-19-data/blob/dd68690245099c72e166785ec95f3094990c26c2/scripts/scripts/vaccinations/src/vax/batch/portugal.py
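The mapping suggested above could look roughly like this. It is only a sketch and not the code of the OWID script: the helper name and the sample row are hypothetical, and the row is assumed to be a dict as produced by something like `csv.DictReader` over vacinas.csv.

```python
def to_owid_metrics(row: dict) -> dict:
    """Map the two people-based columns of vacinas.csv to OWID-style metrics."""
    fully = int(row["pessoas_vacinadas_completamente"])
    partially = int(row["pessoas_vacinadas_parcialmente"])
    return {
        "people_fully_vaccinated": fully,
        # Everyone with at least one dose: fully plus partially vaccinated.
        "people_vaccinated": fully + partially,
    }


# Made-up row, for illustration only (values are strings, as read from a CSV).
metrics = to_owid_metrics(
    {"pessoas_vacinadas_completamente": "400", "pessoas_vacinadas_parcialmente": "600"}
)
```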

edomt commented 3 years ago

Hi @davipt

That's wonderful, thanks! I've prepared a PR here: https://github.com/owid/covid-19-data/pull/1581. I'm seeing a few rows where doses < pessoas_vacinadas_parcialmente, for example on 11-01-2021. Do you know why?

davipt commented 3 years ago

Yes, that is expected. "doses2" and "doses1" need to remain consistent with the officially reported daily values (excluding the islands, and with unidoses incorrectly under "doses1"), even if those values may not be the most correct ones. The "pessoas_vacinadas_*" columns, on the other hand, take into consideration the weekly report (aka "vacinas_detalhes.csv"), which is more correct in every sense: it includes the islands, puts the unidoses in the right place, and carries backtracked updates for past values (always for the better).

For example, 11-01-2021 corresponds to the weekly value doses1=83682 minus the continent value doses1=75581. That should yield the sum of "madeira" and "açores", but it is larger because it also includes "unknown". This difference is then applied to the daily values for the remainder of the week (forward-filled with pandas ffill(), like you do) until new weekly data is available. 83682 - 75581 = 8101. Then 8101 + 75280 = 83381.
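The adjustment described above can be sketched in plain Python, without pandas. The function name, the input shapes and the second day's value are assumptions for illustration; the figures for 11-01-2021 are the ones quoted in the comment.

```python
def adjust_daily(daily_continent, weekly_offsets):
    """Add the latest known weekly offset to each daily continent value.

    daily_continent: list of (day, value) pairs in chronological order.
    weekly_offsets: dict mapping a report day to (weekly total - continent).
    The last known offset is carried forward until a new one appears,
    which mimics a forward fill.
    """
    adjusted = []
    current_offset = 0
    for day, value in daily_continent:
        if day in weekly_offsets:
            current_offset = weekly_offsets[day]
        adjusted.append((day, value + current_offset))
    return adjusted


# Weekly total minus the continent value, from the comment.
offset = 83682 - 75581

daily = [
    ("11-01-2021", 75280),  # daily value quoted in the comment
    ("12-01-2021", 80000),  # made-up value to show the carried-forward offset
]
result = adjust_daily(daily, {"11-01-2021": offset})
```

With these inputs the first adjusted value reproduces the 83381 from the comment, and the same offset of 8101 is reused on the following day because no newer weekly figure is available.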

Does that make sense? The end result should be the same as what you have today for the islands, but with the unidoses also moved to the "fully vaccinated" side.

davipt commented 3 years ago

I've just added an explicit "vacinas" which represents "doses" (continent only) adjusted with the additional weekly islands counts.

"doses" and "vacinas" represent individual jabs of the vaccine, regardless of who received them. A person who gets two jabs counts as two units.

From https://github.com/owid/covid-19-data/pull/1581/files#diff-acf0595fb59b628f75f25fe9687b1e62afa6b04de4bacb6f628feeed038378bbR13

edomt commented 3 years ago

Wonderful, thanks! :)