SocialHarvest / harvester

The Social Harvest server that exposes an API and harvests data from the web to be analyzed.

Move external data to memory mapped files #72

Open tmaiaroto opened 9 years ago

tmaiaroto commented 9 years ago

This will be a change that benefits packages in other repos as well, but Social Harvest is prompting it.

Geocoding and sentiment analysis both rely on data sets that are too big to store in GitHub. Right now they are pulled from S3 when Social Harvest starts, if the files don't already exist. The problem is that they are rather large, so loading and working with them requires a good deal of RAM.

By using memory mapped files (I'm looking at boltdb), Social Harvest should work on a server of any size, though of course it will work faster when more RAM is available. Performance will be slower on smaller servers, but this may still allow Social Harvest to run, and run fast enough for many use cases.

One of the goals of Social Harvest is to make big data in social media analytics affordable and attainable. So this is important, though for the time being it is also easy enough to just run Social Harvest on a server with 1 or 2GB of RAM rather than 256MB or 512MB. My goal is to make the minimum requirement 512MB of RAM. I would like Social Harvest to run on an EC2 small instance. A micro instance may be asking too much.