dtandev / coronavirus

2020 Poland coronavirus data (COVID-19 / 2019-nCoV)
MIT License

Confirmed cases count off by 500 cases as of 4th of May #11

Open kermidt opened 4 years ago

kermidt commented 4 years ago

Hi,

The cumulative count of confirmed cases for 4th of May is off by ~400 (13585 here vs. 14006 in the CSSE data) when compared against https://github.com/CSSEGISandData/COVID-19 or https://docs.google.com/spreadsheets/d/1ierEhD6gcq51HAm433knjnVwey4ZE5DCnu1bW7PRG3E/htmlview?usp=sharing

From https://raw.githubusercontent.com/dtandev/coronavirus/master/data/CoronavirusPL%20-%20Timeseries.csv I get:

05/01    13196.0
05/02    13473.0
05/03    13572.0
05/04    13585.0

From https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv I get:

5/1/20     13105
5/2/20     13375
5/3/20     13693
5/4/20     14006
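
For reference, the CSSE figures above can be pulled out of the wide-format file roughly as follows. This is only a sketch: `country_confirmed` is a hypothetical helper name, and it assumes the file keeps its usual layout of four metadata columns (`Province/State`, `Country/Region`, `Lat`, `Long`) followed by one column per date.

```python
import pandas as pd

CSSE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
    "time_series_covid19_confirmed_global.csv"
)

def country_confirmed(csse: pd.DataFrame, country: str = "Poland") -> pd.Series:
    """Cumulative confirmed cases for one country from a CSSE-style wide table."""
    rows = csse[csse["Country/Region"] == country]
    # Drop the metadata columns; what remains is one column per date (e.g. "5/4/20").
    dates = rows.drop(columns=["Province/State", "Country/Region", "Lat", "Long"])
    # Some countries are split into several province rows, so sum across rows.
    return dates.sum(axis=0).astype(int)

# Usage (needs network access):
# csse = pd.read_csv(CSSE_URL)
# print(country_confirmed(csse)[["5/1/20", "5/2/20", "5/3/20", "5/4/20"]])
```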

Script used to produce the dtandev numbers:

import io

import pandas as pd
import requests

url = 'https://raw.githubusercontent.com/dtandev/coronavirus/master/data/CoronavirusPL%20-%20Timeseries.csv'

def plStrToDate(x):
  # Timestamps are parsed as day-month-year; return MM/DD so labels sort chronologically within one year
  day, month, year = map(int, x.split("-"))
  return '%02d/%02d' % (month, day)

s = requests.get(url).content
d = pd.read_csv(io.StringIO(s.decode('utf-8')))
d['Timestamp'] = d['Timestamp'].map(plStrToDate)
# One row per case event: I = infection, R = recovery, D = death
d['Confirmed'] = d['Infection/Death/Recovery'] == 'I'
d['Recovered'] = d['Infection/Death/Recovery'] == 'R'
d['Deaths'] = d['Infection/Death/Recovery'] == 'D'
# Count events per day (summing only the flag columns), then accumulate
d = d.groupby('Timestamp')[['Confirmed', 'Recovered', 'Deaths']].sum()
d = d.cumsum()
d
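
To make the discrepancy easier to see, the two listings quoted above can be put side by side. This is a minimal sketch with the eight values hard-coded from this issue; the column names are just illustrative.

```python
import pandas as pd

# Values copied from the two listings in this issue (1-4 May 2020).
dates = ["05/01", "05/02", "05/03", "05/04"]
cmp = pd.DataFrame(
    {
        "dtandev": [13196, 13473, 13572, 13585],
        "csse": [13105, 13375, 13693, 14006],
    },
    index=dates,
)

# Gap between the cumulative counts, and the implied daily increments.
cmp["csse_minus_dtandev"] = cmp["csse"] - cmp["dtandev"]
cmp["dtandev_daily_new"] = cmp["dtandev"].diff()
cmp["csse_daily_new"] = cmp["csse"].diff()
print(cmp)
```

The gap is -91 and -98 on 1 and 2 May, then 121 and 421 on 3 and 4 May. The dtandev daily increments fall to 99 and 13 on the last two days, while CSSE reports 318 and 313, which may indicate that records for 3 and 4 May are incomplete in the dtandev file.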