GoogleCloudPlatform / covid-19-open-data

Datasets of daily time-series data related to COVID-19 for over 20,000 distinct locations around the world.
Apache License 2.0

Macao defined twice and no fresh data #256

Open mborsetti opened 3 years ago

mborsetti commented 3 years ago

Similar to #242:

(1) Macao seems to be defined twice: once as a country (country_code "MO") and once as a subregion of China (location_key "CN_MO").

(2) Neither has fresh data:

SELECT
  date, new_confirmed, new_deceased, cumulative_confirmed, cumulative_deceased
FROM
  `bigquery-public-data.covid19_open_data.covid19_open_data` 
WHERE
  country_code = "MO"
ORDER BY
  date DESC
LIMIT 100

returns only nulls, while

SELECT
  date, new_confirmed, new_deceased, cumulative_confirmed, cumulative_deceased
FROM
  `bigquery-public-data.covid19_open_data.covid19_open_data` 
WHERE
  location_key = "CN_MO"
ORDER BY
  date DESC
LIMIT 100

returns null for every date except 2020-07-19.
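A freshness check like the one done by eye above can be scripted. The following is a minimal Python sketch, assuming the query results have already been fetched as a list of dicts and that `location_key` was added to the SELECT list. The helper name `latest_non_null_date`, the `"MO"` location key for the country-level entry, and the sample values are illustrative assumptions, not real data.

```python
from typing import Dict, Iterable, Optional

# Value columns taken from the SELECT lists in the queries above.
VALUE_COLUMNS = (
    "new_confirmed",
    "new_deceased",
    "cumulative_confirmed",
    "cumulative_deceased",
)


def latest_non_null_date(rows: Iterable[dict]) -> Dict[str, Optional[str]]:
    """Map each location_key to the latest date that has any non-null value.

    Locations whose rows are all null map to None.
    """
    latest: Dict[str, Optional[str]] = {}
    for row in rows:
        key = row["location_key"]
        latest.setdefault(key, None)
        has_value = any(row.get(col) is not None for col in VALUE_COLUMNS)
        # ISO 8601 date strings sort correctly as plain strings.
        if has_value and (latest[key] is None or row["date"] > latest[key]):
            latest[key] = row["date"]
    return latest


# Made-up rows shaped like the results described above (values are not real).
sample_rows = [
    {"location_key": "MO", "date": "2020-10-22", "cumulative_confirmed": None},
    {"location_key": "MO", "date": "2020-10-21", "cumulative_confirmed": None},
    {"location_key": "CN_MO", "date": "2020-10-22", "cumulative_confirmed": None},
    {"location_key": "CN_MO", "date": "2020-07-19", "cumulative_confirmed": 1},
]
freshness = latest_non_null_date(sample_rows)
```

A location that maps to `None` has no usable data at all, while an old date points to a stale series.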

owahltinez commented 3 years ago

Thanks for flagging this issue.

We will disambiguate the duplicate locations and merge them into one.
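For illustration only, one way such a merge could work is sketched below in Python. This is not the project's actual implementation: the function name `merge_locations`, the row layout, the choice of which key survives, and the sample values are all assumptions.

```python
from typing import Dict, List


def merge_locations(rows: List[dict], duplicate_key: str, canonical_key: str) -> List[dict]:
    """Fold rows for duplicate_key into canonical_key, one row per date.

    For each column, a non-null value already stored under canonical_key
    wins; otherwise the value from duplicate_key fills the gap.
    """
    merged: Dict[str, dict] = {}
    # sorted() is stable, so canonical rows come first and take precedence.
    ordered = sorted(rows, key=lambda r: r["location_key"] != canonical_key)
    for row in ordered:
        if row["location_key"] not in (canonical_key, duplicate_key):
            continue  # leave unrelated locations out of the merge
        target = merged.setdefault(
            row["date"], {"location_key": canonical_key, "date": row["date"]}
        )
        for col, value in row.items():
            if col in ("location_key", "date"):
                continue
            if target.get(col) is None:
                target[col] = value
    return [merged[d] for d in sorted(merged)]


# Made-up rows; the value 1 is a placeholder, not a real figure.
duplicated_rows = [
    {"location_key": "MO", "date": "2020-07-19", "cumulative_confirmed": None},
    {"location_key": "CN_MO", "date": "2020-07-19", "cumulative_confirmed": 1},
    {"location_key": "MO", "date": "2020-07-20", "cumulative_confirmed": None},
]
merged_rows = merge_locations(duplicated_rows, duplicate_key="CN_MO", canonical_key="MO")
```

Preferring non-null values means the merged series keeps the single populated date from `CN_MO` regardless of which key is kept.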

The numbers have not been updated because, according to the data source, Macau claims to be COVID-free: https://www.ssm.gov.mo/apps1/PreventCOVID-19/en.aspx#clg17458. I believe that, for this particular data source, we do not record a new date when the data has not changed.

mborsetti commented 3 years ago

Thanks.

May I suggest the algorithm be modified? Having a value of null for cumulative_confirmed or cumulative_deceased is not useful, even if (or especially if) these numbers haven't budged in months.

SELECT
  date, cumulative_confirmed, cumulative_deceased
FROM
  `bigquery-public-data.covid19_open_data.covid19_open_data` 
WHERE
  country_code = "MO"
ORDER BY
  date DESC
LIMIT 100

Leads to:

Row | date | cumulative_confirmed | cumulative_deceased
-- | -- | -- | --
1 | 2020-10-22 | null | null
2 | 2020-10-21 | null | null
3 | 2020-10-20 | null | null
4 | 2020-10-19 | null | null
5 | 2020-10-18 | null | null
6 | 2020-10-17 | null | null
7 | 2020-10-16 | null | null
8 | 2020-10-15 | null | null
9 | 2020-10-14 | null | null
10 | 2020-10-13 | null | null
11 | 2020-10-12 | null | null
12 | 2020-10-11 | null | null
13 | 2020-10-10 | null | null
14 | 2020-10-09 | null | null
15 | 2020-10-08 | null | null
16 | 2020-10-07 | null | null
17 | 2020-10-06 | null | null
18 | 2020-10-05 | null | null
19 | 2020-10-04 | null | null
20 | 2020-10-03 | null | null
21 | 2020-10-02 | null | null
22 | 2020-10-01 | null | null
23 | 2020-09-30 | null | null
24 | 2020-09-29 | null | null
25 | 2020-09-28 | null | null
26 | 2020-09-27 | null | null
27 | 2020-09-26 | null | null
28 | 2020-09-25 | null | null
29 | 2020-09-24 | null | null
30 | 2020-09-23 | null | null
31 | 2020-09-22 | null | null
32 | 2020-09-21 | null | null
33 | 2020-09-20 | null | null
34 | 2020-09-19 | null | null
35 | 2020-09-18 | null | null
36 | 2020-09-17 | null | null
37 | 2020-09-16 | null | null
38 | 2020-09-15 | null | null
39 | 2020-09-14 | null | null
40 | 2020-09-13 | null | null
41 | 2020-09-12 | null | null
42 | 2020-09-11 | null | null
43 | 2020-09-10 | null | null
44 | 2020-09-09 | null | null
45 | 2020-09-08 | null | null
46 | 2020-09-07 | null | null
47 | 2020-09-06 | null | null
48 | 2020-09-05 | null | null
49 | 2020-09-04 | null | null
50 | 2020-09-03 | null | null
51 | 2020-09-02 | null | null
52 | 2020-09-01 | null | null
53 | 2020-08-31 | null | null
54 | 2020-08-30 | null | null
55 | 2020-08-29 | null | null
56 | 2020-08-28 | null | null
57 | 2020-08-27 | null | null
58 | 2020-08-26 | null | null
59 | 2020-08-25 | null | null
60 | 2020-08-24 | null | null
61 | 2020-08-23 | null | null
62 | 2020-08-22 | null | null
63 | 2020-08-21 | null | null
64 | 2020-08-20 | null | null
65 | 2020-08-19 | null | null
66 | 2020-08-18 | null | null
67 | 2020-08-17 | null | null
68 | 2020-08-16 | null | null
69 | 2020-08-15 | null | null
70 | 2020-08-14 | null | null
71 | 2020-08-13 | null | null
72 | 2020-08-12 | null | null
73 | 2020-08-11 | null | null
74 | 2020-08-10 | null | null
75 | 2020-08-09 | null | null
76 | 2020-08-08 | null | null
77 | 2020-08-07 | null | null
78 | 2020-08-06 | null | null
79 | 2020-08-05 | null | null
80 | 2020-08-04 | null | null
81 | 2020-08-03 | null | null
82 | 2020-08-02 | null | null
83 | 2020-08-01 | null | null
84 | 2020-07-31 | null | null
85 | 2020-07-30 | null | null
86 | 2020-07-29 | null | null
87 | 2020-07-28 | null | null
88 | 2020-07-27 | null | null
89 | 2020-07-26 | null | null
90 | 2020-07-25 | null | null
91 | 2020-07-24 | null | null
92 | 2020-07-23 | null | null
93 | 2020-07-22 | null | null
94 | 2020-07-21 | null | null
95 | 2020-07-20 | null | null
96 | 2020-07-19 | null | null
97 | 2020-07-18 | null | null
98 | 2020-07-17 | null | null
99 | 2020-07-16 | null | null
100 | 2020-07-15 | null | null

owahltinez commented 3 years ago

I agree that this is not ideal. Unfortunately, the rows are missing at the data source itself. If data is missing at the source, we can't tell whether that is because it has not changed or because it is truly missing.

We'll look for an alternative source for Macau.
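
One possible middle ground, sketched here in Python as an illustration rather than as the project's method, is to carry the last reported cumulative value forward while flagging each filled row, so consumers can still tell reported values from assumed ones. The function name `forward_fill`, the `max_gap_days` cutoff, and the sample value are hypothetical; the approach only makes sense for cumulative columns and assumes that silence from the source means "no change".

```python
from datetime import date, timedelta
from typing import Dict, Iterator, Optional, Tuple


def forward_fill(
    observations: Dict[date, Optional[int]],
    start: date,
    end: date,
    max_gap_days: Optional[int] = None,
) -> Iterator[Tuple[date, Optional[int], bool]]:
    """Yield (day, value, carried_forward) for every day in [start, end].

    A missing or null day reuses the last reported value and is flagged as
    carried forward. If max_gap_days is set, values older than that many
    days are no longer carried and the day is emitted as null.
    """
    last_value: Optional[int] = None
    last_seen: Optional[date] = None
    day = start
    while day <= end:
        reported = observations.get(day)
        if reported is not None:
            last_value, last_seen = reported, day
            yield day, reported, False
        elif last_seen is not None and (
            max_gap_days is None or (day - last_seen).days <= max_gap_days
        ):
            yield day, last_value, True
        else:
            yield day, None, False
        day += timedelta(days=1)


# Made-up observation: a single reported value (placeholder, not a real figure).
observed = {date(2020, 7, 19): 1}
filled = list(forward_fill(observed, date(2020, 7, 18), date(2020, 7, 21), max_gap_days=1))
```

The `carried_forward` flag keeps the ambiguity visible instead of hiding it, and `max_gap_days` limits how long a stale value is trusted.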