acoli-repo / fintan-backend


suggestion: you might be interested in SPARQL Anything #3

Open justin2004 opened 2 years ago

justin2004 commented 2 years ago

SPARQL Anything is similar to Tarql but it supports many more formats.

chiarcos commented 2 years ago

Thank you for the suggestion. Indeed, we were considering adding it before our original funding ran out. We are keeping the issue open as a reminder to ourselves, but no promises as to when this is coming. It might move higher up the priority list if you (or anyone) come up with a concrete workflow that requires it.

chiarcos commented 2 years ago

I see you're involved in SPARQL Anything development. If you want to give us a hand, that might speed things up ;)

cfaeth commented 2 years ago

Hi Justin,

thanks for the suggestion. In fact, Tarql is a bit of a problem for us since it directly uses Jena's low-level API, which has changed significantly since release 3.11, and is thus incompatible with newer Jena releases. It also does not seem to be actively maintained anymore, so an update is not to be expected anytime soon.

Ultimately, this currently prevents us from updating the Fintan-backend to a newer Jena version.

I can see there has been a lot of effort flowing into SPARQL Anything and it looks extremely promising. I especially like your approach of creating a generic abstraction layer which is applicable to almost any format and thus allows similar query structures across heterogeneous resources.

I am very inclined to add SPARQL Anything for our next release and possibly replace Tarql. However, I have a few questions:

  1. For Fintan compatibility, we would need to write a Java wrapper class which implements the FintanStreamComponent interface. Fintan uses Java Input/OutputStreams for passing data between independent components. From what I can see, SPARQL Anything provides at least a CLI which can read from stdin and outputs data to stdout, so it should be compatible. In case not all of this is part of the CLI, could you please point me to a class or the necessary methods/functions which would be the ideal starting point for building a Fintan wrapper class, i.e. where we can define I/O, configuration options and execute the transformation process?

  2. Fintan allows data segmentation, i.e. RDF datasets can be split into contextual segments like "sentences" for corpora, "lexical entries" for dictionaries etc. We also allow independent Jena models to be passed between components. Some components allow optimized parallel processing of such segment models. Apart from that, we also allow textual delimiters to be defined and inserted into serializations, so we can easily recover the independent models from output data. Your new "Slicing" options sound very similar. Is it possible in SPARQL Anything to stream "sliced" data?

  3. I have not had a close look at your source code yet, but from what I gather from your documentation, you transform resources into your "Facade-X" RDF abstraction layer (you call this triplification?) and then operate on native Jena models/datasets without relying on their low-level API (which caused our aforementioned problems with Tarql), right? What are your long-term plans with this library? How is it funded or motivated among your three main contributors? I primarily ask because many academic projects are third-party-funded for a couple of years and then tend to slowly drift out of maintenance, which would be a shame given the sheer number of formats and pipelines we could support by relying on SPARQL Anything in Fintan.
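
To make points 1 and 2 above a bit more concrete, here is a rough Java sketch of the two stream-level building blocks they describe: piping a Java InputStream through an external command-line tool into an OutputStream, and recovering independent segments from a serialization by splitting on a textual delimiter line. Everything in it is an assumption for illustration only: the class name `StreamSketch`, the methods `runCli` and `splitSegments`, and the delimiter `#SEGMENT_END` are hypothetical and are not taken from Fintan or SPARQL Anything. It deliberately does not implement the FintanStreamComponent interface and assumes no SPARQL Anything CLI flags; the command to run is passed in by the caller.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of Fintan or SPARQL Anything.
class StreamSketch {

    // Pipes `in` to the stdin of an external command and copies the command's
    // stdout to `out`. Returns the exit code of the command.
    // Uses InputStream.transferTo, which requires Java 9 or newer.
    static int runCli(List<String> command, InputStream in, OutputStream out)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Let the child's error messages show up on our own stderr.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Feed stdin on a separate thread so that reading stdout below
        // cannot deadlock when the pipe buffers fill up.
        Thread feeder = new Thread(() -> {
            try (OutputStream stdin = process.getOutputStream()) {
                in.transferTo(stdin);
            } catch (IOException e) {
                // The command may close its stdin early; nothing more to send.
            }
        });
        feeder.start();

        try (InputStream stdout = process.getInputStream()) {
            stdout.transferTo(out);
        }
        feeder.join();
        out.flush();
        return process.waitFor();
    }

    // Splits a line-based serialization into segments. A line that equals
    // `delimiter` ends the current segment and is not included in it.
    // Trailing content without a closing delimiter becomes a final segment.
    static List<String> splitSegments(Reader reader, String delimiter) throws IOException {
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader lines = new BufferedReader(reader)) {
            String line;
            while ((line = lines.readLine()) != null) {
                if (line.equals(delimiter)) {
                    segments.add(current.toString());
                    current.setLength(0);
                } else {
                    current.append(line).append('\n');
                }
            }
        }
        if (current.length() > 0) {
            segments.add(current.toString());
        }
        return segments;
    }

    public static void main(String[] args) throws IOException {
        // Two tiny Turtle-like segments separated by the hypothetical delimiter.
        String data = "<a> <p> <b> .\n#SEGMENT_END\n<c> <p> <d> .\n#SEGMENT_END\n";
        List<String> segments = splitSegments(new StringReader(data), "#SEGMENT_END");
        System.out.println(segments.size() + " segments recovered");
    }
}
```

In a real wrapper, each recovered segment would presumably be parsed into its own Jena model rather than kept as a string, and `runCli` would be replaced by direct calls into the library if a suitable entry point exists.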