everypolitician / scraped_page_archive

Create an archive of HTML pages scraped by a Ruby scraper
MIT License

Don't re-clone repo on each request #53

Closed chrismytton closed 8 years ago

chrismytton commented 8 years ago

This makes it so we only create one instance of ScrapedPageArchive::GitStorage and therefore only have to clone the git repo once, rather than once per request.

This should give a fairly significant speed boost to existing scrapers.

Fixes #52
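The change described above is essentially memoization: cache the first `ScrapedPageArchive::GitStorage` instance so the expensive `git clone` runs once per process instead of once per request. A minimal sketch of that pattern, using hypothetical stand-in names rather than the gem's actual classes (a counter substitutes for the real clone):

```ruby
# Stand-in for ScrapedPageArchive::GitStorage; the counter takes the
# place of the expensive `git clone` done in the real initializer.
class FakeGitStorage
  @@clone_count = 0

  def self.clone_count
    @@clone_count
  end

  def initialize
    @@clone_count += 1 # the real class would clone the git repo here
  end
end

class Archive
  # `||=` caches the first instance in @storage; every later call
  # returns the same object instead of constructing (and cloning) again.
  def storage
    @storage ||= FakeGitStorage.new
  end
end
```

With this in place, calling `archive.storage` repeatedly still performs only one "clone", which is the per-request speedup the PR is after.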

chrismytton commented 8 years ago

@tmtmtmtm 👀

chrismytton commented 8 years ago

> It seems a little odd that each adapter needs to add an identical method to do this, and that each later one would need to remember to do it, rather than having something just take care of it for them. If that's not easy to do with the current architecture, then that's perhaps another thing to think about when teasing parts of this out…
>
> So a slightly tongue-biting acceptance for expediency if this is going to be tidied up RSN anyway. Though it would be nice to at least have a ticket describing what that might look like (e.g. a base class for adapters that shows which methods must be provided, and which are taken care of automatically)

Yeah, I think most of this should come out of the changes I'm going to make as we tease bits apart, but it would certainly be nice to reduce the boilerplate in these and future adapters. I've opened an issue to track this here: https://github.com/everypolitician/scraped_page_archive/issues/54
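The base-class idea floated in the review could look something like the sketch below. All names here are hypothetical, not the gem's actual API: the base class owns the shared memoization, declares which method each adapter must provide, and leaves the rest to be taken care of automatically.

```ruby
# Hypothetical adapter base class: documents the contract (subclasses
# must implement #build_storage) and centralises the memoization so
# individual adapters no longer need an identical method each.
class BaseAdapter
  # Shared, memoized accessor; adapters get this for free.
  def storage
    @storage ||= build_storage
  end

  # Required hook: raising here makes the contract explicit.
  def build_storage
    raise NotImplementedError, "#{self.class} must implement #build_storage"
  end
end

# An adapter now only supplies the part that actually varies.
class ExampleAdapter < BaseAdapter
  def build_storage
    :git_storage # a real adapter would return a GitStorage instance
  end
end
```

This keeps "which methods must be provided" in one visible place while the memoization boilerplate lives only in the base class.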