datamade / court-scrapers


Optimize court call scrape #52

Open antidipyramid opened 3 months ago

antidipyramid commented 3 months ago

Since we started fetching calendar values while scraping court calls, the scrape has slowed down to the point where we're unable to scrape all available court calls in under 6 hours.

We could try a few things to make the scrapes more efficient:

  1. Avoid duplicate calendar requests. On the results page, there are usually at least two court calls listed for a single case. Caching calendar values should reduce the number of case detail requests by at least half.
  2. If (1) isn't enough, we could also limit the dates we scrape every day, for example by scraping only the court calls for the current or next day.
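
A rough sketch of both ideas, not the scraper's actual code. `CalendarCache`, `fetch_calendar`, and `dates_to_scrape` are hypothetical names; `fetch_calendar` stands in for whatever currently makes the case detail request and returns the calendar value for a case number.

```python
from datetime import date, timedelta


class CalendarCache:
    """Remember calendar values by case number (hypothetical helper)."""

    def __init__(self, fetch_calendar):
        # fetch_calendar is an assumed callable: case_number -> calendar value.
        self._fetch = fetch_calendar
        self._cache = {}

    def get(self, case_number):
        # Only make the case detail request the first time we see a case.
        if case_number not in self._cache:
            self._cache[case_number] = self._fetch(case_number)
        return self._cache[case_number]


def dates_to_scrape(today=None, days_ahead=1):
    """Return today plus the next `days_ahead` days (assumed date window)."""
    today = today or date.today()
    return [today + timedelta(days=n) for n in range(days_ahead + 1)]
```

With something like this, two court calls for the same case on the results page would trigger one case detail request instead of two, and the daily run would only loop over the dates returned by `dates_to_scrape()`.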

antidipyramid commented 3 months ago

What do you think @fgregg?