covidatlas / li

Next-generation serverless crawler for COVID-19 data
Apache License 2.0

Lost "tested" data for St. Lawrence County, NY (fips 36089) #472

Closed jzohrab closed 3 years ago

jzohrab commented 3 years ago

Original issue https://github.com/covidatlas/coronadatascraper/issues/1062, transferred here on Wednesday Jul 15, 2020 at 19:55 GMT


We updated our CDS data at Sat Jun 27 17:41:23 UTC 2020 and lost all values from the "tested" column for St. Lawrence County, New York.

jzohrab commented 3 years ago

(Transferred comment)

Hi @mikelehen - which file were you using, downloaded from which URL? Cheers, jz

jzohrab commented 3 years ago

(Transferred comment)

@jzohrab

We download https://coronadatascraper.com/timeseries.csv regularly. The copy we pulled on June 25th had the data. The copy we pulled on June 27th did not.

Specifically:

First CSV (note 16266.0 for "tested" column):

"St. Lawrence County, New York, United States",county,,St. Lawrence County,New York,United States,107740.0,44.533,-75.193,https://health.data.ny.gov/api/views/xdss-u53e/rows.csv?accessType=DOWNLOAD,county,America/New_York,217.0,,,,16266.0,,,,,,1.0,2020-06-24

Second CSV:

"St. Lawrence County, New York, United States",county,,St. Lawrence County,New York,United States,107740.0,44.533,-75.193,https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv,county,America/New_York,217.0,3.0,,,,,,,,,1.0,2020-06-24

I guess the problem is that you switched from the health.data.ny.gov data source to NYT for some reason? Do you know why?
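A minimal sketch of how the comparison above could be automated: it flags rows where a column had a value in the older snapshot but is empty in the newer one, and reports which source URL each snapshot used. The header here is a trimmed, hypothetical subset of the timeseries.csv columns (the real file has more fields), and the function names are made up for illustration.

```python
import csv
import io


def load_rows(csv_text):
    """Index CSV rows by (name, date) so two snapshots can be compared."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["name"], row["date"]): row for row in reader}


def lost_values(old_csv_text, new_csv_text, column="tested"):
    """Return rows whose `column` was filled in the old snapshot but is empty in the new one."""
    old_rows = load_rows(old_csv_text)
    new_rows = load_rows(new_csv_text)
    lost = []
    for key, old_row in old_rows.items():
        new_row = new_rows.get(key)
        if new_row is None:
            continue  # row is missing entirely; that is a different problem
        if old_row[column].strip() and not new_row[column].strip():
            lost.append({
                "name": key[0],
                "date": key[1],
                "old_value": old_row[column],
                "old_url": old_row["url"],
                "new_url": new_row["url"],
            })
    return lost


# Assumption: a reduced header, not the full timeseries.csv layout.
HEADER = "name,url,cases,deaths,tested,date\n"
JUNE_25 = HEADER + (
    '"St. Lawrence County, New York, United States",'
    "https://health.data.ny.gov/api/views/xdss-u53e/rows.csv?accessType=DOWNLOAD,"
    "217.0,,16266.0,2020-06-24\n"
)
JUNE_27 = HEADER + (
    '"St. Lawrence County, New York, United States",'
    "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv,"
    "217.0,3.0,,2020-06-24\n"
)

for item in lost_values(JUNE_25, JUNE_27):
    print(item["name"], item["date"], item["old_value"], item["new_url"])
```

Including the source URL in the result makes a source switch visible right away, which is what the two rows above suggest happened.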

jzohrab commented 3 years ago

(Transferred comment)

Hi @mikelehen, thanks for the detail.

It's possible that one of the sources wasn't available, or that this project failed to reach it, on that date. If so, the CDS report would have fallen back to the data returned by another source; which source is used is determined by each source's priority. Both of those sources are "timeseries" sources, meaning they return full histories of data, so the "tested" values may well reappear in a future run.
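To illustrate the fallback described here, a rough sketch (not the actual CDS code): each source carries a priority, and the report takes the highest-priority source that returned data. The class name, the priority numbers, and the rule that a larger number wins are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceResult:
    name: str
    priority: int         # assumption: a larger number means higher priority
    row: Optional[dict]   # None models a source that could not be reached


def select_row(results):
    """Pick the highest-priority source that actually returned data."""
    available = [r for r in results if r.row is not None]
    if not available:
        return None
    return max(available, key=lambda r: r.priority)


# Hypothetical priorities; the values mirror the two CSV rows quoted earlier.
nyt = SourceResult("nytimes/covid-19-data", 1, {"cases": 217, "deaths": 3})
ny_health = SourceResult("health.data.ny.gov", 2, {"cases": 217, "tested": 16266})

normal_run = select_row([ny_health, nyt])
failed_run = select_row([SourceResult("health.data.ny.gov", 2, None), nyt])

print(normal_run.name, normal_run.row)
print(failed_run.name, failed_run.row)
```

Under these assumptions, a single failed fetch of the higher-priority source is enough for the report to silently drop "tested", since the fallback source never carried that field.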

Perhaps check a newer version of the report, and let me know. Thanks, z

(By the way, we're gradually switching this project over to a new one called Li, and the reports there are generated differently, though they should have all of the data points that you need. Please check out the issue in that project: https://github.com/covidatlas/li/issues/284. Right now, reports are in "beta", but I'll likely be promoting them to a proper production release soon. I'll add you to that issue for visibility. Join us on our Slack if you have any questions. Cheers!)

jzohrab commented 3 years ago

(Transferred comment)

@jzohrab Thanks. Yeah, we're intending to start poking at Li soon, but all of our existing pipeline is using the existing CDS data. To be clear, I work with https://covidactnow.org/ which uses CDS for much of our county data.

To be 100% clear, CDS lost the data ~6/24 and has not gotten it back in the past 3 weeks. AFAICT https://health.data.ny.gov/api/views/xdss-u53e/rows.csv?accessType=DOWNLOAD still has the data. So I don't know why CDS has switched sources to one that has less data.

jzohrab commented 3 years ago

(Transferred comment)

Hi ML, thanks for the note. I’ll take a peek when I’m able.

We wouldn’t have switched sources deliberately ... this code hasn’t changed in a long while, as we are working on moving everything to Li. Possibly one source became inactive and the reporting code switched to the next available source ... I can’t say why.

We should use the source you mentioned.

Cheers and thanks, jz


jzohrab commented 3 years ago

Hi @mikelehen, following up (this issue was in the old project, at https://github.com/covidatlas/coronadatascraper/issues/1062). Is this still an issue?

I just ran the us/ny scraper, and got the following data for St. Lawrence:

Local cache hit for: us-ny / 2020-08-09
scraping data from 2020-08-08
┌─────────┬─────────────────────────────────┬──────────────┬────────┬─────────┐
│ (index) │           locationID            │     date     │ cases  │ tested  │
├─────────┼─────────────────────────────────┼──────────────┼────────┼─────────┤
│    0    │ 'iso1:us#iso2:us-ny#fips:36001' │ '2020-08-08' │  2595  │  82889  │
...
│   49    │ 'iso1:us#iso2:us-ny#fips:36089' │ '2020-08-08' │  263   │  30258  │
...

Please check the reports at https://covidatlas.com/data and if it's good, you can close this issue. Thank you! jz
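For anyone checking a single county in output like the above, a small sketch of pulling the FIPS code out of a locationID. It assumes the ID is a '#'-separated list of key:value pairs, as in the rows shown; the helper names and the list-of-dicts record shape are hypothetical.

```python
def parse_location_id(location_id):
    """Split an ID such as 'iso1:us#iso2:us-ny#fips:36089' into a dict."""
    parts = {}
    for segment in location_id.split("#"):
        key, _, value = segment.partition(":")
        parts[key] = value
    return parts


def find_by_fips(records, fips):
    """Return the records whose locationID carries the given FIPS code."""
    return [
        r for r in records
        if parse_location_id(r["locationID"]).get("fips") == fips
    ]


# Values copied from the two table rows shown above.
records = [
    {"locationID": "iso1:us#iso2:us-ny#fips:36001", "date": "2020-08-08",
     "cases": 2595, "tested": 82889},
    {"locationID": "iso1:us#iso2:us-ny#fips:36089", "date": "2020-08-08",
     "cases": 263, "tested": 30258},
]

for record in find_by_fips(records, "36089"):
    print(record["date"], record["cases"], record["tested"])
```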

mikelehen commented 3 years ago

Thanks @jzohrab. Yeah, it seems to have gotten fixed at some point. I can't close the issue (only comment) since it was migrated, but please go ahead.

jzohrab commented 3 years ago

Thank you!