covidatlas / li

Next-generation serverless crawler for COVID-19 data
Apache License 2.0

Alameda County, California #378

Closed: 1ec5 closed this 4 years ago

1ec5 commented 4 years ago

Added a source for Alameda County, California, that uses the following datasets from the county’s open data portal:

Alameda County is one of two California counties that have two local health jurisdictions. The sources above include data from the Berkeley LHJ, Alameda County LHJ, and combined totals. This scraper uses the combined totals for easier comparisons with other counties.

1ec5 commented 4 years ago

> I had to make some changes to the framework pagination to make it more general, and to simplify its usage. This change impacted arcgis's functions. Can you update this PR to use the new arcgis.paginated method?

All set.

> Also, there is a timeseries-filter.js method that might be useful for you in general. I wrote it b/c the old methods we had for managing timeseries filtering were annoying and sometimes invalid.

I’ve updated #379, #380, and #381 to use timeseries-filter.js, but I’m not sure whether it’s a good fit here. This scraper joins multiple datasets together that have different start dates. I guess we could call it for all three datasets but catch any errors and only rethrow if all three error out? In principle, we could split the source into three, but that could be confusing since it’s all the same source agency from a user standpoint.
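
Roughly what I have in mind for the "catch and only rethrow if all three fail" idea, as a minimal sketch. `filterAllOrThrow` and the `filterFn(rows, date)` callback are hypothetical names; `filterFn` just stands in for whatever timeseries-filter.js actually exposes, whose real signature I'm not assuming here:

```javascript
// Hypothetical helper, not part of the framework.
// Applies a date filter to each named dataset. A dataset whose filter throws
// (for example, because it has no data for that date yet) contributes an
// empty array. The first error is rethrown only if every dataset failed.
function filterAllOrThrow (datasets, date, filterFn) {
  const names = Object.keys(datasets)
  const results = {}
  const errors = []

  for (const name of names) {
    try {
      // filterFn is an assumed stand-in for the timeseries-filter.js call.
      results[name] = filterFn(datasets[name], date)
    } catch (err) {
      results[name] = []
      errors.push(err)
    }
  }

  // Only give up when no dataset could be filtered for this date.
  if (names.length > 0 && errors.length === names.length) {
    throw errors[0]
  }
  return results
}
```

The scraper would then join whatever came back, treating an empty array as "this dataset has not started yet". The downside is that a genuine failure in one dataset gets swallowed as long as another one succeeds.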

jzohrab commented 4 years ago

Re "In principle, we could split the source into three, but that could be confusing since it’s all the same source agency from a user standpoint." - I think that's the direction we should move in, b/c joining things becomes messy, and having a single source (crawler/scraper) per URL source is very clear. The back end and reporting jobs can handle joining sources. That being said, I don't think we need to block merging this, it's good work, and the splitting apart could be done in the future if we want to do so.

As an example, I created a separate source for US/MO tested data, b/c there was an arcgis feature layer for that. IMO that's easier than trying to join the data in the us/mo/index.js state scraper.

I feel that our src/shared/sources concepts need to be modified somewhat to simplify creating scrapers, but at the moment I'm not sure how to do it! (i.e., not sure what will be crystal-clear and error-avoidant for us to use.)

jzohrab commented 4 years ago

Thanks very much @1ec5, LGTM!