QuantifiedSelfless / gulper

data ingestion

Text scrapers #5

Closed: wannabeCitizen closed this 8 years ago

wannabeCitizen commented 8 years ago

Adding scrapers that collect text people have written online.

mynameisfiber commented 8 years ago

@wannabeCitizen when do you think this'll be ready for review?

wannabeCitizen commented 8 years ago

I mentioned this on Slack, but it's basically ready. I was just hoping @peymanmortazavi would test it before merging or anything. Feel free to start reviewing or testing it now, though.

wannabeCitizen commented 8 years ago

@mynameisfiber @peymanmortazavi this is ready for review. We still need to decide how to fail more gracefully. Right now we wrap every scraper in a try/except block, so if an exception is thrown, all partially scraped data is lost. Is there a way to save a partial scrape when an exception occurs?
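One possible approach, sketched under assumptions: if each scraper is written as a generator that yields items as it fetches them, the caller can store every item as soon as it arrives and catch the exception around the loop, rather than around the whole scrape. The helper `scrape_with_partial_results` and the example `flaky_scraper` below are hypothetical and not part of gulper.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple


def scrape_with_partial_results(
    scraper: Callable[[], Iterator[Any]],
) -> Tuple[List[Any], Optional[Exception]]:
    """Drain a (hypothetical) generator-based scraper, keeping partial data.

    Returns the items collected before any failure, plus the exception that
    stopped the scrape (or None if it finished normally).
    """
    results: List[Any] = []
    error: Optional[Exception] = None
    try:
        for item in scraper():
            results.append(item)  # saved immediately, so a later failure keeps it
    except Exception as exc:  # broad on purpose: any failure ends this scrape
        error = exc
    return results, error


# Usage: a hypothetical scraper that fails partway through.
def flaky_scraper() -> Iterator[str]:
    yield "post 1"
    yield "post 2"
    raise ConnectionError("network dropped")


data, err = scrape_with_partial_results(flaky_scraper)
```

Here `data` holds the two posts fetched before the failure, and `err` carries the `ConnectionError` so it can still be logged or reported.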

Also, I quickly exhausted our Twitter rate limit, which means we need to figure out how to handle rate limits, because I'm certain we are going to hit them. Should we use delays to prevent this?
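A common way to use delays here is retrying with exponential backoff. The sketch below assumes the scraper can detect a rate-limit response (for example an HTTP 429 status) and raise a dedicated exception; `RateLimitError`, `call_with_backoff`, and `flaky_request` are hypothetical names, not gulper or Twitter client APIs. The `sleep` parameter is injectable so the delays can be inspected without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical: raised by a scraper when the API reports a rate limit."""


def call_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on RateLimitError with doubling delays."""
    attempt = 0
    while True:
        try:
            return request()
        except RateLimitError:
            if attempt >= max_retries:
                raise  # out of retries: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... by default
            attempt += 1


# Usage: a hypothetical request that is rate-limited twice, then succeeds.
delays = []
calls = {"n": 0}


def flaky_request() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "tweets"


result = call_with_backoff(flaky_request, sleep=delays.append)
```

With these settings the request is attempted three times and the recorded delays are `[1.0, 2.0]`. If the API reports when the limit resets, waiting until that time is usually better than guessing with a fixed backoff.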

wannabeCitizen commented 8 years ago

@peymanmortazavi and I tested this. Things look clean to merge. We may want to do some final testing before deployment, but the scrapers all work and return data as expected.