curationexperts / chf-sufia

sufia-based hydra app

Ruby RDF::Repository Decision #3

no-reply closed this issue 7 years ago

no-reply commented 7 years ago

We need to decide on a Repository (triplestore) implementation that is suitable for the project and ensure that it is ready to scale to project needs.

Likely options, in order of expected viability, are:

no-reply commented 7 years ago

I've done some preliminary benchmarking and load testing. Preparation involved releasing version 0.1.0 of the rdf-benchmark gem (https://github.com/ruby-rdf/rdf-benchmark).

I should be able to finish up this work tomorrow and have enough information for us to make this decision in the first meeting with CHF.

Note that this is very basic load testing work intended to ensure we make the best decision available to us in the current moment. The project quote set "Production Hardening" aside as a future phase. I'm trying to keep this work to ~3-5 hours overall.
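For readers who want a feel for what "very basic load testing" can look like, here is a minimal sketch using only Ruby's standard library. It does not use rdf-benchmark or any real triplestore: `InMemoryStore` and `time_inserts` are hypothetical names invented for illustration, and the object and triple counts are placeholders rather than measurements from this project.

```ruby
require 'benchmark'
require 'set'

# Hypothetical stand-in for a triplestore client. A real test would point
# at the candidate repository instead; this class only exists so the
# timing harness below has something to call.
class InMemoryStore
  def initialize
    @statements = Set.new
  end

  # Store one (subject, predicate, object) statement.
  def insert(subject, predicate, object)
    @statements << [subject, predicate, object]
  end

  def count
    @statements.size
  end
end

# Time how long it takes to insert a synthetic batch of statements.
# Returns elapsed wall-clock seconds as a Float.
def time_inserts(store, object_count:, triples_per_object:)
  Benchmark.realtime do
    object_count.times do |i|
      triples_per_object.times do |j|
        store.insert("http://example.org/object/#{i}",
                     "http://example.org/prop/#{j}",
                     "value #{i}-#{j}")
      end
    end
  end
end

store = InMemoryStore.new
# Placeholder sizes, not figures from the actual benchmark run.
elapsed = time_inserts(store, object_count: 3_000, triples_per_object: 10)
puts format('%d statements inserted in %.3fs', store.count, elapsed)
```

Swapping the stand-in for a client of each candidate store would give roughly comparable insert timings, though network and server configuration would dominate the results.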

hackartisan commented 7 years ago

Scale: our rough record completion rate (objects that make it through cataloging) averages 200/month. Hopefully that will go up, but I can't imagine it increasing by an order of magnitude. We currently have close to 3000 objects, about half of them fully cataloged.
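As a rough back-of-the-envelope projection from those figures, the sketch below treats the completion rate as a stand-in for growth in the number of objects the store has to hold, which is an assumption. `TRIPLES_PER_OBJECT` is a made-up placeholder, not a number from this project, and the helper names are hypothetical.

```ruby
# Figures taken from the comment above.
CURRENT_OBJECTS    = 3_000
MONTHLY_COMPLETION = 200

# Assumption for illustration only: the real number of triples per
# object depends on the metadata model and was not stated in the thread.
TRIPLES_PER_OBJECT = 50

# Projected object count after a number of months at a given monthly rate.
def projected_objects(months, rate: MONTHLY_COMPLETION, start: CURRENT_OBJECTS)
  start + rate * months
end

# Projected triple count, using the placeholder triples-per-object figure.
def projected_triples(months, rate: MONTHLY_COMPLETION)
  projected_objects(months, rate: rate) * TRIPLES_PER_OBJECT
end

[12, 36, 60].each do |months|
  puts format('%2d months: %6d objects, ~%8d triples',
              months, projected_objects(months), projected_triples(months))
end

# Upper bound: the rate growing by a full order of magnitude.
puts format('60 months at 10x rate: %d objects',
            projected_objects(60, rate: MONTHLY_COMPLETION * 10))
```

Even under the tenfold ceiling, the object count after five years stays in the low hundreds of thousands, so the total triple count depends mostly on how many triples each object carries.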

I think for this decision the most important factors are:

  1. What will be compatible with the tools we're using.
  2. A choice that we can reasonably feel good about recommending to others as a sensible default.
  3. What will be a good fit for our actual production needs, since once we make this choice it will take effort to switch to something else in production, rewrite our build scripts, etc.

no-reply commented 7 years ago

I think the way to go here is to move forward with Marmotta for the moment. I'm pretty confident that Marmotta and Blazegraph will be fully interchangeable late in the project, and Marmotta needs less immediate work on its Ruby gem and on deployment.

Closing this with decision: Marmotta

hackartisan commented 7 years ago

👍 @no-reply also notes that Blazegraph may scale better, which is why we may eventually want to switch to it. Whether scaling becomes an issue depends on the caching strategy.