Code4HR / va-circuit-court-search

Virginia Courts Case Information - Statewide Searches ARE Possible
http://vacircuitcourtsearch.com/
MIT License
11 stars 7 forks

Migrate to a Relational database #10

Open qwo opened 8 years ago

qwo commented 8 years ago

@bschoenfeld any objections about migrating it from mongo to a relational db? I feel like it could open up easier ways to query the data.

bschoenfeld commented 8 years ago

No objections.

On Wednesday, November 25, 2015, Stanley Zheng notifications@github.com wrote:

@bschoenfeld any objections about migrating it from mongo to a relational db? I feel like it could open up easier ways to query the data.


ttavenner commented 8 years ago

What is the rough size/quantity of data we are currently working with?

bschoenfeld commented 8 years ago

There are a few ways to look at that. If we are just talking about circuit court criminal cases, we've got 110,000 records with detailed data, all from 2014, and those records are about 2.5 KB each in Mongo. We've also got 7,000,000 records with no details (just name, offense, and case number), which are about 0.3 KB each. If we got full details for all 7M cases, we could be looking at 15-20 GB of data.
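
As a quick sanity check on that estimate, here is a small sketch of the arithmetic in Python. It uses only the record counts and per-record sizes quoted above and assumes decimal units (1 GB = 1,000,000 KB). The variable and function names are illustrative, and the result ignores indexes and any storage overhead a different database would add.

```python
# Back-of-the-envelope storage estimate from the figures quoted in this comment.
# Assumption: decimal units (1 GB = 1,000,000 KB). All names are illustrative.

DETAILED_RECORDS = 110_000     # circuit court criminal cases with detailed data (2014)
DETAILED_KB = 2.5              # approximate size of one detailed record in Mongo
SUMMARY_RECORDS = 7_000_000    # records with only name, offense, and case number
SUMMARY_KB = 0.3               # approximate size of one summary record

KB_PER_GB = 1_000_000


def to_gb(records, kb_each):
    """Convert a record count and per-record size in KB to total GB."""
    return records * kb_each / KB_PER_GB


current_detailed_gb = to_gb(DETAILED_RECORDS, DETAILED_KB)  # detailed data held today
current_summary_gb = to_gb(SUMMARY_RECORDS, SUMMARY_KB)     # summary data held today
all_detailed_gb = to_gb(SUMMARY_RECORDS, DETAILED_KB)       # if all 7M had details

print(f"Detailed records today: {current_detailed_gb:.3f} GB")
print(f"Summary records today:  {current_summary_gb:.1f} GB")
print(f"All 7M with details:    {all_detailed_gb:.1f} GB")
```

Under these assumptions the last figure comes out at about 17.5 GB, which sits inside the 15-20 GB range given above.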

On Wed, Dec 9, 2015 at 6:44 PM, Tommy Tavenner notifications@github.com wrote:

What is the rough size/quantity of data we are currently working with?


ttavenner commented 8 years ago

Last Wednesday I started building out a server on DigitalOcean that could host the data and the API, if that works for you. We should have plenty of space to get started, and we can resize if need be. The data would be stored in Postgres, and the API would be served by Nginx and Hapi.js. We can probably expose the DB externally so your scraper could write directly to it.
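
To illustrate the kind of querying a relational store could make easier, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for Postgres. Everything in it is an assumption for illustration: the `cases` table, the `court` column, the helper functions, and the sample rows are hypothetical. The thread only mentions name, offense, and case number as fields, so the real schema may look quite different.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here.
# The schema, column names, and sample rows are all hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cases (
        case_number TEXT NOT NULL,
        court       TEXT NOT NULL,
        name        TEXT NOT NULL,
        offense     TEXT NOT NULL,
        PRIMARY KEY (court, case_number)
    )
    """
)
# An index on name supports statewide lookups by defendant name.
conn.execute("CREATE INDEX idx_cases_name ON cases (name)")

sample_rows = [
    ("CR14000001-00", "Example Circuit Court A", "DOE, JOHN", "GRAND LARCENY"),
    ("CR14000002-00", "Example Circuit Court A", "ROE, JANE", "TRESPASSING"),
    ("CR14000001-00", "Example Circuit Court B", "DOE, JOHN", "TRESPASSING"),
]
conn.executemany("INSERT INTO cases VALUES (?, ?, ?, ?)", sample_rows)


def search_by_name(connection, name):
    """Return (court, case_number, offense) rows for a name across all courts."""
    cursor = connection.execute(
        "SELECT court, case_number, offense FROM cases "
        "WHERE name = ? ORDER BY court",
        (name,),
    )
    return cursor.fetchall()


def offense_counts(connection):
    """Return (offense, count) pairs, the kind of aggregate SQL makes simple."""
    cursor = connection.execute(
        "SELECT offense, COUNT(*) FROM cases GROUP BY offense ORDER BY offense"
    )
    return cursor.fetchall()


for row in search_by_name(conn, "DOE, JOHN"):
    print(row)
print(offense_counts(conn))
```

The composite primary key on `(court, case_number)` reflects a guess that case numbers are only unique within a court; that would need to be confirmed against the scraped data before settling on a schema.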