Review apps: avoid scraping on startup every time the PR changes

callumlocke commented 8 years ago

Currently, a review app runs the scraper every time it starts up. This is annoying if you add another small commit to the PR (e.g. a small style tweak) and have to wait for the scraper to finish before you can see the review app again.

Short term solution: modify the conditional 'SCRAPE_ON_STARTUP' logic in server/index.js, so it first checks the lastupdated date (and handles the case of the table not existing) before running the scraper. If the scraper has run recently, skip it.
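A rough sketch of that check, not the actual code in server/index.js. The names `isStale`, `shouldScrapeOnStartup`, `getLastUpdated` and the one-hour threshold are all hypothetical; `getLastUpdated` stands in for whatever query reads the lastupdated date, and is assumed to reject if the table does not exist yet.

```javascript
// Assumed threshold: treat data older than one hour as stale.
const MAX_AGE_MS = 60 * 60 * 1000;

// Pure helper: true if there is no usable lastupdated value or it is too old.
function isStale(lastUpdated, now = Date.now(), maxAgeMs = MAX_AGE_MS) {
  if (!lastUpdated) return true;
  const ts = new Date(lastUpdated).getTime();
  if (Number.isNaN(ts)) return true; // unparseable date: scrape to be safe
  return now - ts > maxAgeMs;
}

// `enabled` is the result of the existing SCRAPE_ON_STARTUP conditional.
// `getLastUpdated` is a hypothetical async lookup of the lastupdated date.
async function shouldScrapeOnStartup({ enabled, getLastUpdated, now, maxAgeMs }) {
  if (!enabled) return false;
  try {
    return isStale(await getLastUpdated(), now, maxAgeMs);
  } catch (err) {
    // Any lookup failure (e.g. the table does not exist yet) means
    // we have no data, so the scraper should run.
    return true;
  }
}
```

With something like this, a second small commit to a PR would only trigger a scrape when the stored data is older than the threshold or missing altogether.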

Possibly better solution: break out the scraper into its own microservice, and change polltracker to query a JSON endpoint on that new microservice to get everything it needs. Then this app would be much faster and lighter and single-focus, and we wouldn't be migrating databases and re-scraping RCP for every little website change.

kavanagh commented 8 years ago

At the moment I think making the scraper faster is the most expedient approach.

There are costs to splitting the app too. A few off the top of my head:

- such an abstraction can cause siloed knowledge or make the code harder to reason about
- it would slow down important schema/model changes that touch all tiers of the app
  - e.g. the move from 2-way to 4-way averages would have taken far longer to complete if the app had been split
- trickier sequencing of releases (which could be a problem on election night)
- more I/O (network), which might lead to more complex caching and the need to be fault tolerant
- the refactor job alone would take more time than we have

kavanagh commented 8 years ago

If it really does get slow again, I think a mock data set could be an option for review apps. There could be a way to opt into a mock data set or let the scraper run.
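One way the opt-in could look, purely as an illustration. `USE_MOCK_DATA` is a hypothetical flag that does not exist in the app, and treating `'true'` or `'1'` as "on" for both variables is an assumption about how they would be set.

```javascript
// Assumption: flags are considered on when set to 'true' or '1'.
const isOn = (value) => value === 'true' || value === '1';

// Decide what a review app does at startup, given its environment.
// USE_MOCK_DATA is hypothetical; SCRAPE_ON_STARTUP is the existing flag.
function pickStartupMode(env) {
  if (isOn(env.USE_MOCK_DATA)) return 'mock';       // load the mock data set
  if (isOn(env.SCRAPE_ON_STARTUP)) return 'scrape'; // let the scraper run
  return 'none';                                    // use whatever is already stored
}
```

Calling `pickStartupMode(process.env)` at startup would let a review app skip the scraper entirely when the mock flag is set, and fall back to the current behaviour otherwise.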

ft-interactive / us-elections-polltracker

Review apps: avoid scraping on startup every time the PR changes #177