eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java
https://rdf4j.org/
BSD 3-Clause "New" or "Revised" License

Investigate RDF-Star support for LMDB store #3723

Status: Open. kenwenzel opened this issue 2 years ago.

kenwenzel commented 2 years ago

Problem description

The LMDB store does not yet support values of type org.eclipse.rdf4j.model.Triple. A simple solution could be to handle those triples like other RDF values and store them within the value store.
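A minimal sketch of that idea follows. It assumes a hypothetical in-memory value store: the `Term`, `Iri`, `TripleTerm` and `ValueStore` types are illustrative stand-ins, not RDF4J's `org.eclipse.rdf4j.model` API or the LmdbStore's on-disk format. A triple term is keyed by the IDs of its subject, predicate and object. This means it receives an ID like any other value, and nested triples resolve recursively.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified term model; a stand-in for org.eclipse.rdf4j.model types.
interface Term {}

record Iri(String value) implements Term {}

// An RDF-star triple term; its components may themselves be triple terms.
record TripleTerm(Term subject, Term predicate, Term object) implements Term {}

// Hypothetical value store: every value, including a triple term, gets a numeric ID.
class ValueStore {
    private final Map<String, Long> idsByKey = new HashMap<>();
    private final List<String> keysById = new ArrayList<>();

    /** Returns the ID of the term, storing it first if it has not been seen before. */
    long storeValue(Term term) {
        String key = encode(term);
        Long existing = idsByKey.get(key);
        if (existing != null) {
            return existing;
        }
        long id = keysById.size() + 1; // IDs start at 1
        keysById.add(key);
        idsByKey.put(key, id);
        return id;
    }

    /** Number of distinct values stored so far. */
    int size() {
        return keysById.size();
    }

    // Builds the lookup key. A triple term is keyed by the IDs of its components,
    // so the components are stored (recursively) before the triple itself.
    private String encode(Term term) {
        if (term instanceof Iri iri) {
            return "I:" + iri.value();
        }
        if (term instanceof TripleTerm t) {
            return "T:" + storeValue(t.subject()) + "," + storeValue(t.predicate()) + ","
                    + storeValue(t.object());
        }
        throw new IllegalArgumentException("unsupported term: " + term);
    }
}
```

In this sketch, storing `<<ex:a ex:p ex:b>>` assigns IDs 1 to 3 to the three IRIs and ID 4 to the triple. Storing an equal triple again returns 4, so quoted triples are deduplicated just like IRIs and literals.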

Preferred solution

No response

Are you interested in contributing a solution yourself?

Perhaps?

Alternatives you've considered

No response

Anything else?

No response

kenwenzel commented 1 year ago

I thought about this a bit:

kenwenzel commented 9 months ago

OK, here is a concrete plan:

All in all, this is a breaking change to the storage formats of both the value store and the triple store.

nguyenm100 commented 7 months ago

Hi, 2 questions:

  1. Is this work slated for 5.0?
  2. When is 5.0 targeted for release? @hmottestad

hmottestad commented 7 months ago

This isn't planned for 5.0 as far as I know. 5.0 is somewhat delayed. It's taken much longer to iron out bugs and compatibility issues than I had expected. There are still one or more things I need to look into before I can publish the last milestone build.

nguyenm100 commented 7 months ago

Understood. Do we have a rough timeline for the 5.0 release? Q3/Q4?

hmottestad commented 7 months ago

Not going to make any promises.

kenwenzel commented 7 months ago

RDF-star support requires reworking the ID encoding in the value store, which would be a breaking change. When starting on this, I would try to create a future-proof, extensible ID scheme.

nguyenm100 commented 7 months ago

@kenwenzel, can you share more about your design to (1) get LMDB out of experimental status and (2) add RDF-star? For (2), perhaps the work in (1) could position RDF-star as a later, additive change without a breaking change.

We were going down the RocksDB route but are now looking at LMDB because you've already integrated it with RDF4J, so perhaps we can help get it production-ready.

Another thought: given the uncertainty around the 5.x release, could we get it production-ready in 4.x, even if that isn't backward compatible, since it's still experimental? What are your thoughts on that? Thanks.

kenwenzel commented 7 months ago

@nguyenm100 Feature-wise the store is on par with NativeStore and additionally supports deletion of values. It would help if you could test it in a setting that is comparable to your production environment. One critical feature that would simplify future extensions is a better ID scheme. I've also thought about inlining values like Jena TDB2 does: https://github.com/eclipse-rdf4j/rdf4j/issues/4774

We could adopt a scheme comparable to Jena's. An important difference is that we use varints to encode the IDs, so we need to modify the scheme so that it always yields small integer values: flags and types need to go in the lower bits, not the higher ones.
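The sketch below shows why the tag position matters when IDs are written as varints. It assumes a hypothetical 2-bit type tag (IRI, literal, blank node, triple) and an LEB128-style unsigned varint with 7 payload bits per byte. Neither is the LmdbStore's actual format. With the tag in the low bits, small sequence numbers still encode to a single byte. With the tag in the top bits of a 64-bit long, every non-zero tag pushes the ID to the maximum varint length.

```java
// Hypothetical ID layout for illustration; not the LmdbStore's real encoding.
final class IdScheme {
    static final int TAG_BITS = 2;
    static final long TYPE_IRI = 0, TYPE_LITERAL = 1, TYPE_BNODE = 2, TYPE_TRIPLE = 3;

    private IdScheme() {}

    // Tag in the LOW bits: small sequence numbers stay small integers.
    static long lowTagged(long seq, long type) {
        return (seq << TAG_BITS) | type;
    }

    // Tag in the HIGH bits: any non-zero tag makes the ID a huge unsigned number.
    static long highTagged(long seq, long type) {
        return (type << (64 - TAG_BITS)) | seq;
    }

    // Decoding helpers for the low-bit layout.
    static long typeOf(long lowTaggedId) {
        return lowTaggedId & ((1L << TAG_BITS) - 1);
    }

    static long seqOf(long lowTaggedId) {
        return lowTaggedId >>> TAG_BITS;
    }

    // Bytes needed to write the value as an unsigned LEB128 varint (7 bits per byte).
    static int varintLength(long value) {
        int bytes = 1;
        while ((value >>>= 7) != 0) {
            bytes++;
        }
        return bytes;
    }
}
```

In this layout, value 5 tagged as a literal becomes ID 21 and encodes to 1 byte. The same value with the tag in the top two bits encodes to 9 bytes, or 10 bytes for the triple tag. The trade-off is that the number of tag bits is fixed by the format. Reserving spare tag bits up front, for example for inlined values, is one way to keep the scheme extensible.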

nguyenm100 commented 6 months ago

@kenwenzel Hey Ken, we will definitely put LMDB through its paces over the next quarter or so. I wanted to revisit the idea of taking LMDB out of experimental status in 4.x rather than 5.x, given that there doesn't seem to be a definitive timeframe for 5.x at the moment. Are you open to that?

kenwenzel commented 6 months ago

Hi @nguyenm100,

In my opinion, we can take LMDB out of experimental status once at least the following issues are fixed:

The first one is a breaking change to the data format, so I'm not sure it could be backported to 4.x.x. The last one in particular will need careful investigation, as you won't want your production system to fail if a query gets cancelled due to a time limit.

Is it possible for you to start with the NativeStore and then switch to the LmdbStore at a later point? If not, what is your motivation for using the LmdbStore?

nguyenm100 commented 6 months ago

Hey @kenwenzel, we're looking at the LmdbStore for its speed and large-dataset support. Per https://rdf4j.org/javadoc/3.4.3/org/eclipse/rdf4j/sail/nativerdf/NativeStore.html, the NativeStore only supports up to 100M triples.

I agree #4950 would be a backward-incompatible change, but my thinking was that LMDB is still experimental and not yet released, so backward compatibility needn't be guaranteed. I base this judgement on the fact that 5.x doesn't have a concrete release date at the moment. Also, moving to 5.x would introduce a lot of risk beyond just LMDB.