covidatlas / li

Next-generation serverless crawler for COVID-19 data
Apache License 2.0

Saint Lucia data has a comma separated string for timeseries-byLocation.json on 07/27 #353

Closed nateyoder closed 4 years ago

nateyoder commented 4 years ago
"2020-07-11": {
        "cases": "22",
        "deaths": "0",
        "recovered": "2196,397,230",
        "tested": "2,022",
        "growthFactor": 1
      },
      "2020-07-12": {
        "cases": "22",
        "deaths": "0",
        "recovered": "2196,540,222",
        "tested": "2,022",
        "growthFactor": 1
      },
      "2020-07-13": {
        "cases": "22",
        "deaths": "0",
        "recovered": "2196,669,879",
        "tested": "2,048",
        "growthFactor": 1
      },
      "2020-07-14": {
        "cases": "22",
        "deaths": "0",
        "recovered": "2196,780,428",
        "tested": "2,082",
        "growthFactor": 1
      },
      "2020-07-15": {
        "cases": "22",
        "deaths": "0",
        "recovered": "2196,884,151",
        "tested": "2,111",
        "growthFactor": 1
      },
      "2020-07-16": {
        "cases": "23",
        "deaths": "0",
        "recovered": "2197,016,851",
        "tested": "2,134",
        "growthFactor": 1.05
      },
      "2020-07-17": {
        "cases": "23",
        "deaths": "0",
        "recovered": "2197,154,840",
        "tested": "2,140",
        "growthFactor": 1
      },
      "2020-07-18": {
        "cases": "23",
        "deaths": "0",
        "recovered": "2197,154,840",
        "tested": "2,140",
        "growthFactor": 1
      },
      "2020-07-19": {
        "cases": "23",
        "deaths": "0",
        "recovered": "2197,377,183",
        "tested": "2,155",
        "growthFactor": 1
      },
      "2020-07-20": {
        "cases": "23",
        "deaths": "0",
        "recovered": "2197,377,183",
        "tested": "2,155",
        "growthFactor": 1
      },
      "2020-07-21": {
        "cases": "23",
        "deaths": "0",
        "recovered": "2197,702,075",
        "tested": "2,370",
        "growthFactor": 1
      },
      "2020-07-22": {
        "cases": "23",
        "deaths": "0",
        "recovered": "2197,811,127",
        "tested": "2,412",
        "growthFactor": 1
      },
      "2020-07-23": {
        "cases": "24",
        "deaths": "0",
        "recovered": "2227,948,513",
        "tested": "2,472",
        "growthFactor": 1.04
      },
      "2020-07-24": {
        "cases": "24",
        "deaths": "0",
        "recovered": "2228,121,700",
        "tested": "2,597",
        "growthFactor": 1
      },
      "2020-07-25": {
        "cases": "24",
        "deaths": "0",
        "recovered": "2228,292,311",
        "tested": "2,934",
        "growthFactor": 1
      },
      "2020-07-26": {
        "cases": "24",
        "deaths": "0",
        "recovered": "2228,292,311",
        "tested": "2,934",
        "growthFactor": 1
      },
      "2020-07-27": {
        "cases": "24",
        "deaths": "0",
        "recovered": "2228,292,311",
        "tested": "2,934",
        "growthFactor": 1
      },
      "2020-07-28": {
        "cases": 24,
        "deaths": 0,
        "recovered": 22,
        "growthFactor": 1
      }
jzohrab commented 4 years ago

Thanks @nateyoder, checking.

stevenganz commented 4 years ago

Also, rows for St. Lucia including 73,568 are similarly wrong ("2196,397,230") in timeseries.csv.

jzohrab commented 4 years ago

Hm, the source for St Lucia (in src/shared/sources/lc) is hitting https://www.covid19response.lc/. Looking at that page, everything looks fine, but running the source locally returns trash:

MacBook-Air:li jeff$ ./start --crawl lc
MacBook-Air:li jeff$ ./start --scrape lc

┌─────────┬────────────┬──────────────┬───────┬────────┬─────────┬────────────────┐
│ (index) │ locationID │     date     │ cases │ deaths │ tested  │   recovered    │
├─────────┼────────────┼──────────────┼───────┼────────┼─────────┼────────────────┤
│    0    │ 'iso1:lc'  │ '2020-08-03' │ '25'  │  '0'   │ '3,548' │ '2229,630,598' │
└─────────┴────────────┴──────────────┴───────┴────────┴─────────┴────────────────┘

Looking into a fix, cheers! jz

stevenganz commented 4 years ago

That's a lot of recoveries for 25 cases!

jzohrab commented 4 years ago

The code is erroneously joining the numbers from "Number of COVID-19 Repatriations from Saint Lucia" (2), "Number of Persons Recovered from COVID-19" (22), and "Number of Confirmed Cases of COVID-19 in the Americas" (9,630,598) ... :-(

The code would have been correct at one point, but the source data page has changed. Needs a fix to select the right stuff.
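A minimal sketch of the failure described above, in Python rather than the project's own code. The labels and values come from the comment; the data layout and the function names `recovered_by_joining` and `recovered_by_label` are assumptions for illustration, since the real page markup and scraper are not shown in this thread. It shows how concatenating every matched value reproduces the bad string, and how selecting by exact label avoids it.

```python
# Labels and values as quoted in the comment above. The list-of-pairs layout
# is an assumption; the real page structure is not shown in this thread.
stats = [
    ("Number of COVID-19 Repatriations from Saint Lucia", "2"),
    ("Number of Persons Recovered from COVID-19", "22"),
    ("Number of Confirmed Cases of COVID-19 in the Americas", "9,630,598"),
]


def recovered_by_joining(rows):
    """Reproduce the bug: concatenate the text of every matched element."""
    return "".join(value for _, value in rows)


def recovered_by_label(rows, label="Number of Persons Recovered from COVID-19"):
    """Pick the single value whose label matches exactly, then parse it."""
    matches = [value for text, value in rows if text == label]
    if len(matches) != 1:
        # Fail loudly if the page layout changes again.
        raise ValueError(f"expected one match for {label!r}, got {len(matches)}")
    return int(matches[0].replace(",", ""))


buggy = recovered_by_joining(stats)
fixed = recovered_by_label(stats)
print(buggy)  # 2229,630,598
print(fixed)  # 22
```

Raising when the label matches zero or several elements is a deliberate choice here: a page redesign then produces an error instead of a plausible-looking wrong number.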

jzohrab commented 4 years ago

I've merged a fix into master (ref https://github.com/covidatlas/li/pull/372), but this won't show up fixed in prod until I fix https://github.com/covidatlas/li/issues/368 -- we need to change how we launch code :-)

Thanks, I'll keep this issue open and see that it's launched.

stevenganz commented 4 years ago

Thanks for the quick action!

Steve

jzohrab commented 4 years ago

This one was simple ... there are tons of other issues requiring more head scratching. Thanks to you and @nateyoder for the check. We really need to get data quality monitoring in place!!! 👋
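As one possible shape for the data quality monitoring mentioned above, here is a hedged Python sketch of a per-record sanity check. The function name `check_record` and the two rules (counts must be integers, recovered must not exceed cases) are assumptions for illustration, not the project's actual checks. The sample records are copied from the data posted earlier in this issue.

```python
COUNT_FIELDS = ("cases", "deaths", "recovered", "tested")


def is_int(value):
    """True for real integers only (bool is excluded on purpose)."""
    return isinstance(value, int) and not isinstance(value, bool)


def check_record(record):
    """Return a list of human-readable problems found in one daily record."""
    problems = []
    for field in COUNT_FIELDS:
        if field not in record:
            continue  # missing fields are allowed in this sketch
        if not is_int(record[field]):
            problems.append(f"{field} is not an integer: {record[field]!r}")
    cases, recovered = record.get("cases"), record.get("recovered")
    if is_int(cases) and is_int(recovered) and recovered > cases:
        problems.append(f"recovered ({recovered}) exceeds cases ({cases})")
    return problems


# Records taken from the 2020-07-27 and 2020-07-28 entries in this issue.
bad = {"cases": "24", "deaths": "0", "recovered": "2228,292,311",
       "tested": "2,934", "growthFactor": 1}
good = {"cases": 24, "deaths": 0, "recovered": 22, "growthFactor": 1}

bad_problems = check_record(bad)
good_problems = check_record(good)
for problem in bad_problems:
    print(problem)
```

A check like this would have flagged every string-valued count in the bad record, and the recovered-versus-cases rule would catch the same error even after the values had been converted to integers.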

jzohrab commented 4 years ago

Should be in production now; I'll keep this open until it's checked there. @nateyoder or @stevenganz, if this clears up, please close the ticket.

stevenganz commented 4 years ago

Hi JZ,

Any idea when this will be reflected in the downloadable files (timeseries.csv.zip)?

Steve

jzohrab commented 4 years ago

It’s not there now? Should be!

stevenganz commented 4 years ago

I'm still getting bad values for St. Lucia. Please check yourself.

jzohrab commented 4 years ago

I’ll check generation status and the data when I have a better connection. Thx!

jzohrab commented 4 years ago

I didn't check the report, but I edited the data in prod manually.

We don't yet have a process to correct historical data -- it's not as straightforward as one would think. There's an open issue.

@stevenganz LMK if this is resolved. I'm not easily able to check due to my extremely limited data bandwidth. Cheers! z

stevenganz commented 4 years ago

Thanks, and sorry for the delay in responding. The numbers look more reasonable now, but still contain commas for dates prior to 8/4. I'd appreciate it if you could remove those.

jzohrab commented 4 years ago

Strange, I'm sure I edited everything! Not saying you're wrong, am saying that perhaps it was subsequently updated ... not sure why though. Double-checking prod tables now. Thanks for checking @stevenganz

jzohrab commented 4 years ago

Ah, I was only looking at the terrible "recovered" numbers. I'm removing commas in "tested" now, assuming that's the field you meant.
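For the comma cleanup in "tested", a small Python sketch of one way to normalize such values. The helper name `to_int` is hypothetical and this is not the edit that was actually run against prod. It accepts well-formed thousands separators such as "2,934" but rejects strings like "2196,397,230", where blindly stripping commas would produce a large, wrong number.

```python
import re

# Either plain digits, or groups of three digits separated by commas.
WELL_FORMED = re.compile(r"\d{1,3}(,\d{3})*|\d+")


def to_int(value):
    """Convert a count such as "2,934" or 24 to an int; reject anything else."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    text = str(value).strip()
    if not WELL_FORMED.fullmatch(text):
        raise ValueError(f"not a well-formed count: {value!r}")
    return int(text.replace(",", ""))


tested = to_int("2,934")
print(tested)  # 2934

try:
    to_int("2196,397,230")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

The strict pattern is the design choice worth noting: the "recovered" values in this issue contained commas too, and only the grouping check distinguishes them from legitimately formatted counts.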

jzohrab commented 4 years ago

OK, hopefully that's all the commas you can see for St Lucia, @stevenganz. In a day-ish, let me know if all is as expected. Thanks again, jz

stevenganz commented 4 years ago

Looks good now, @jzohrab. Thanks! Yes, the commas were in the tested column -- sorry for any confusion.

jzohrab commented 4 years ago

Nice, thanks for checking!