Closed: sglyon closed this 4 years ago.
Hi @sglyon, thanks for the PR! I've been busy over here, so I haven't had time to get to it.
The callback stuff looks to be mostly ok. We're moving to a new repo (Li) in this same org, which has a completely different crawl and scrape model.
Questions and Answers:
Is adding this additional exported function to the fetchlib ok?
It should be, but there have been a few changes to this, so I'll need to check!
How could/should we handle the cache for this scraper? I'm not sure about the details of how caching is handled here. For now I am completely ignoring it.
Yes, good question. Ultimately your page does `const data = await page.waitForXPath("//div[@aria-label='Grid']").then(getDataFromPivotTable);`, but this is totally different from our usual methods. Tough stuff!
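On the caching question: the cache layout used by this project's fetch lib isn't shown in this thread, so the following is only a minimal sketch, assuming a scrape result can be keyed by source URL plus scrape date. `cacheKey` and `withCache` are hypothetical helpers, and the in-memory `Map` stands in for whatever on-disk cache the project actually uses. The point is that an expensive browser scrape would run at most once per URL and date.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory cache; stands in for the project's real
// (disk-based) cache, whose exact format is not shown here.
const cache = new Map();

// Build a stable key from the source URL and the scrape date so that
// each date's result is cached separately.
function cacheKey(url, date) {
  return crypto.createHash('md5').update(`${url}|${date}`).digest('hex');
}

// Return the cached result if present; otherwise run the (expensive)
// scrape function once and remember what it returned.
async function withCache(url, date, scrapeFn) {
  const key = cacheKey(url, date);
  if (cache.has(key)) {
    return cache.get(key);
  }
  const result = await scrapeFn();
  cache.set(key, result);
  return result;
}
```

Keying on the date as well as the URL is presumably necessary here, since the same dashboard URL serves a rolling window whose contents change from day to day.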
I only return those two current hospital bed usage data points; I don't have more fundamental results like cases, deaths, reported, etc. Those are in the aggregate-level TX scraper. Is there a strategy for merging multiple scraper outputs so we have all the info?
Normally, we actually fetch things within one scraper and combine them in there. Ideally, the data would be normalized when inserted into the db or document, and then some other process would join them ... but that's not how it's done (yet)!
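As a rough illustration of the "normalize, then join" approach described above (which the project does not do yet), here is a sketch assuming every scraper emits flat records carrying hypothetical `state`, `county`, and `date` fields. `mergeByLocationAndDate` is an illustrative name, not an existing function in the codebase.

```javascript
// Join records from several scrapers that describe the same place on the
// same day. Assumes (hypothetically) that each record has state, county,
// and date fields plus whatever metrics that scraper knows about.
function mergeByLocationAndDate(...sources) {
  const merged = new Map();
  for (const records of sources) {
    for (const record of records) {
      const key = `${record.state}|${record.county}|${record.date}`;
      // Fields from later sources fill in (or overwrite) earlier ones.
      merged.set(key, { ...(merged.get(key) || {}), ...record });
    }
  }
  return [...merged.values()];
}
```

Here later sources win on conflicting fields; a real join would probably want an explicit precedence rule per field instead.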
Great work, this is a non-trivial problem.
I opened https://github.com/valorumdata/coronadatascraper/pull/1 against this branch from my repo. It has some substantial revisions, but it respects caching, etc., which we must follow. Take a look and let me know, thank you!
Closing in favor of https://github.com/covidatlas/coronadatascraper/pull/1040, which updates the work you started to follow this project's conventions.
Thank you for the PR! I hope 1040 gets merged soon so we can see how it behaves in prod ... at the moment I'm not sure how it will work.
Thanks @jzohrab! Hopefully future contributions won't require so much hands-on help from the core team!
Summary
Adds hospital and ICU bed usage for Harris County, TX.
This probably needs a bit more work, but I wanted to touch base with the maintainers here to see if I'm headed in the right direction.
What it does:
Uses a `Page` instance to scrape `hospitalized_current` and `icu_current` for `process.env.SCRAPE_DATE`. The scraper has data for the previous two weeks, so it is a timeseries.
Things I'm a little uncertain on are:
Is adding this additional exported function to the `fetch` lib ok?
Thanks!
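Because the bed-usage page covers roughly the previous two weeks, the scraper has to pick out the single day requested by `process.env.SCRAPE_DATE`. A minimal sketch, assuming the grid has already been parsed into rows with a `date` string in the same format as `SCRAPE_DATE` (for example `YYYY-MM-DD`; the real formats may differ and need normalizing first). `rowForScrapeDate` is a hypothetical helper, not code from the PR.

```javascript
// Select the requested day from an already-parsed timeseries.
// Assumes each row looks like { date, hospitalized_current, icu_current }
// and that row.date uses the same string format as SCRAPE_DATE.
function rowForScrapeDate(rows, scrapeDate = process.env.SCRAPE_DATE) {
  const row = rows.find((r) => r.date === scrapeDate);
  if (!row) {
    // No data for that day (e.g. older than the two-week window).
    return undefined;
  }
  return {
    hospitalized_current: row.hospitalized_current,
    icu_current: row.icu_current,
  };
}
```

Returning `undefined` rather than the nearest date keeps the scraper from silently reporting another day's numbers.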