blazegraph / database

Blazegraph High Performance Graph Database
GNU General Public License v2.0
Virtual Graphs support in blazegraph #207

Closed damooo closed 2 years ago

damooo commented 2 years ago

Namaste, and thanks for Blazegraph. Blazegraph seems to support Virtual Graphs, as described on the Blazegraph wiki (https://github.com/blazegraph/database/wiki/VirtualGraphs). Two questions about the feature:

  1. Does a virtual graph scale to millions of constituent named graphs?
  2. Is there a comparable feature in AWS Neptune, so that we can model with confidence for the future?

Thanks again, @beebs-systap, @thompsonbry, and all for your work.

thompsonbry commented 2 years ago

Virtual graphs are based on assertions in the underlying graph per https://github.com/blazegraph/database/wiki/VirtualGraphs. The scaling in the data should be fine. The resolution of the virtual graph identifier to the set of named graph identifiers is handled here:

- https://github.com/blazegraph/database/blob/3127706f0b6504838daae226b9158840d2df1744/bigdata-core/bigdata-rdf/src/java/com/bigdata/rdf/sparql/ast/eval/ASTDeferredIVResolution.java#L515

That is called as part of bulk resolution of lexical forms (URIs, strings, etc.) to internal identifiers. This resolution phase is efficient. However, you do need to understand that a virtual graph having millions of named graphs will require reading millions of entries of the form (:vg bd:virtualGraph :g1).
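As a rough illustration of what those entries look like, the sketch below builds the membership assertions and a query against the virtual graph as plain SPARQL strings. It assumes that `bd:` expands to `http://www.bigdata.com/rdf#` and that the query form is `FROM VIRTUAL GRAPH <iri>`; both should be checked against the VirtualGraphs wiki page linked above. The `http://example.org/` IRIs and the helper names are hypothetical, and the sketch does not address which graph the assertions themselves should be stored in.

```python
# Sketch only: builds SPARQL strings and never contacts a server.
# Assumptions (verify against the VirtualGraphs wiki page):
#   - bd: is <http://www.bigdata.com/rdf#>
#   - Blazegraph accepts "FROM VIRTUAL GRAPH <iri>" in a query
BD = "http://www.bigdata.com/rdf#"


def membership_update(virtual_graph, named_graphs):
    """Return an INSERT DATA update with one (:vg bd:virtualGraph :gN) triple per graph."""
    triples = "\n".join(
        f"  <{virtual_graph}> <{BD}virtualGraph> <{g}> ." for g in named_graphs
    )
    return f"INSERT DATA {{\n{triples}\n}}"


def virtual_graph_query(virtual_graph, pattern="?s ?p ?o"):
    """Return a SELECT query whose dataset is the given virtual graph."""
    return f"SELECT * FROM VIRTUAL GRAPH <{virtual_graph}> WHERE {{ {pattern} }}"


if __name__ == "__main__":
    vg = "http://example.org/vg/trusted"  # hypothetical IRIs throughout
    graphs = [f"http://example.org/g{i}" for i in range(1, 4)]
    print(membership_update(vg, graphs))
    print(virtual_graph_query(vg))
```

A virtual graph with millions of members would mean millions of such triples, all of which are read when the virtual graph identifier is resolved.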

What the platform does not do is figure out whether it would be more efficient to run the query and then filter based on the named graphs in the virtual graph. Instead, it always resolves the virtual graph to the set of named graphs and then runs the query with that dataset.
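The difference between the two strategies can be sketched with a toy in-memory quad list. This is not Blazegraph code: the tuple layout, the `bd:virtualGraph` string standing in for the real predicate IRI, and the function names are all assumptions made for illustration.

```python
# Toy model, not Blazegraph internals: each quad is (subject, predicate, object, graph).
VIRTUAL_GRAPH = "bd:virtualGraph"  # stand-in for the real predicate IRI


def resolve_members(quads, vg):
    """Read every (vg bd:virtualGraph g) assertion; work grows with membership size."""
    return {o for s, p, o, _ in quads if s == vg and p == VIRTUAL_GRAPH}


def resolve_then_query(quads, vg, predicate):
    """Strategy described above: resolve the dataset first, then evaluate over it."""
    dataset = resolve_members(quads, vg)
    return [q for q in quads if q[3] in dataset and q[1] == predicate]


def query_then_filter(quads, vg, predicate):
    """Alternative the platform does not pick: match first, check membership afterwards."""
    matches = [q for q in quads if q[1] == predicate]
    # A real store would probe an index here; the set is only a stand-in for that lookup.
    assertions = {(s, p, o) for s, p, o, _ in quads}
    return [q for q in matches if (vg, VIRTUAL_GRAPH, q[3]) in assertions]


if __name__ == "__main__":
    quads = [
        (":vg", VIRTUAL_GRAPH, ":g1", ":meta"),
        (":vg", VIRTUAL_GRAPH, ":g2", ":meta"),
        (":a", ":knows", ":b", ":g1"),
        (":b", ":knows", ":c", ":g2"),
        (":c", ":knows", ":d", ":g3"),  # :g3 is not a member of :vg
    ]
    print(resolve_then_query(quads, ":vg", ":knows"))
    print(query_then_filter(quads, ":vg", ":knows"))
```

Both functions return the same matches. The first pays up front for reading the whole membership list, while the second pays per match, which could be cheaper when a selective query touches only a few graphs of a very large virtual graph.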

That said, virtual graph support is not part of Amazon Neptune at this time. It is an interesting feature but we have not had much feedback from people using the feature in Blazegraph or otherwise asking for the feature. If you have feedback on using virtual graphs, please share.

Thanks, Bryan

damooo commented 2 years ago

Thanks for the reply, @thompsonbry.

We want to load a large number of tiny named graphs (as in nanopublications, http://nanopub.org/wordpress/) and create macro-views over them based on certain criteria, such as trusted-by-an-agency, extracted-from-trusted-sources, assertive-attitude, etc. We would then query these merged views.

This seems a natural way of working with many named graphs that differ in provenance, in attitude (propositional, assertive, ...), or in source.

> What the platform does not do is figure out whether it would be more efficient to run the query and then filter based on the named graphs in the virtual graph. Instead, it always resolves the virtual graph to the set of named graphs and then runs the query with that dataset.

Is there any other effective and efficient way to query over such virtual datasets?

Thanks again for your time.

thompsonbry commented 2 years ago

I think this feature is a good fit for your use case. The only performance implications would come from large numbers of named graphs in a given virtual graph.

I'd suggest trying it to see how it performs.

A nearly equivalent alternative is to specify the named graphs to be queried on each query. The only real difference is whether the information about virtual graph membership resides in the triple store or in your application. In addition, it is more efficient to send the URI of a single virtual graph than the URIs of all the individual named graphs.

This of course also suggests how you could work around the absence of the feature on a different platform.
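A minimal sketch of that workaround, assuming an endpoint that honors the `default-graph-uri` dataset parameter of the SPARQL 1.1 Protocol. The membership table, the IRIs, and the `query_params` helper are hypothetical; only `urllib.parse.urlencode` is a real library call.

```python
from urllib.parse import urlencode

# Hypothetical application-side membership table. With Blazegraph virtual graphs
# this information would instead live in the store as bd:virtualGraph assertions.
VIRTUAL_GRAPHS = {
    "trusted": ["http://example.org/g1", "http://example.org/g2"],
}


def query_params(query, virtual_graph, membership=VIRTUAL_GRAPHS):
    """Encode a SPARQL Protocol request with one default-graph-uri per member graph."""
    pairs = [("query", query)]
    pairs += [("default-graph-uri", g) for g in membership[virtual_graph]]
    return urlencode(pairs)


if __name__ == "__main__":
    body = query_params("SELECT * WHERE { ?s ?p ?o }", "trusted")
    print(body)
```

With many member graphs the encoded string becomes long, so it would normally be sent as a POST body rather than in a URL. That growth is the cost of keeping the membership in the application.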

Bryan

damooo commented 2 years ago

Some criteria may cover a very large number of named graphs, such as those trusted through a specific web of trust. We will still try it and see.

Is there any possibility of it being included in Neptune? Otherwise, what alternatives are there?

Thanks again.

thompsonbry commented 2 years ago

Neptune feature requests need to go through Amazon channels.

The feature itself is not that complex.

Bryan

damooo commented 2 years ago

Thanks for the suggestions and support, @thompsonbry.

thompsonbry commented 2 years ago

You might also explore support for this concept via the W3C community group for RDF/SPARQL. That is a good path to develop open community interest in new features and gain interest in having those features supported by various platforms. The community process is generally use case driven, and it sounds like you have a good use case that you might share.

This is also where RDF-star is being developed.

Bryan
