MI-FraunhoferIWM / data2rdf

About: A generic pipeline that can be used to map raw data to RDF.
BSD 3-Clause "New" or "Revised" License

Merge branch from stream #8

Closed: deepukr007 closed this 1 year ago

deepukr007 commented 1 year ago

https://gitlab.cc-asp.fraunhofer.de/rdf-pipeline/rdf-pipeline/-/tree/stream-adaptation

yzuuang commented 1 year ago

The major differences are as follows:

  1. Assumes the existence of an ontology file. In STREAM, there is a dedicated ontology that describes the knowledge in the datasheets. Related entities:
    • data2rdf.annotation_pipeline.AnnotationPipeline.__init__
    • data2rdf.cli.abox_conversion.run_abox_pipeline_for_folder, where an optional parameter "ontology_file" was added.

Although an ontology would help when writing SPARQL queries, projects do not generally develop one. Thus this part should not be merged.

  2. The connection between data instances and their raw values is hard coded. The connection is typically ?subj emmo:hasSymbolData ?obj for symbolic data and ?subj emmo:hasQuantityValue/emmo:hasNumericData ?obj for numerical data. Related entities:
    • data2rdf.rdf_generation.RDFGenerator

Currently, both branches use hard code, and the differences between them mainly result from whether the ontology file exists (as described in the previous point).

However, it was agreed at some point to use OntoPanel, a package built on Chowlk, to get rid of this kind of hard coding. Reportedly, this was once implemented in the dev repository but is missing from the current release.
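To make the hard coding concrete, here is a minimal Python sketch (not code from data2rdf) that moves the two predicate paths quoted above into a single lookup table and builds SPARQL patterns from it. The names VALUE_PATHS, value_pattern and value_query are hypothetical, and binding emmo: to <http://emmo.info/emmo#> is an assumption based on the unified EMMO namespace. The idea is only that a mapping produced by a tool such as OntoPanel could fill such a table instead of hand-written generator code; how OntoPanel actually exposes its mapping is not shown here.

```python
# Hypothetical sketch, not data2rdf code: the predicate paths quoted in the
# discussion, kept in one lookup table instead of being hard coded in the
# RDF generator. Assumes emmo: is bound to the unified EMMO namespace.
EMMO = "http://emmo.info/emmo#"

# SPARQL property path linking a data instance (?subj) to its value (?obj).
VALUE_PATHS = {
    "symbolic": "emmo:hasSymbolData",
    "numeric": "emmo:hasQuantityValue/emmo:hasNumericData",
}


def value_pattern(kind: str, subj: str = "?subj", obj: str = "?obj") -> str:
    """Return the SPARQL triple pattern for the given data kind."""
    try:
        path = VALUE_PATHS[kind]
    except KeyError:
        raise ValueError(f"unknown data kind: {kind!r}") from None
    return f"{subj} {path} {obj} ."


def value_query(kind: str) -> str:
    """Wrap the pattern in a complete SELECT query with the emmo: prefix declared."""
    return (
        f"PREFIX emmo: <{EMMO}>\n"
        "SELECT ?subj ?obj WHERE {\n"
        f"  {value_pattern(kind)}\n"
        "}"
    )


if __name__ == "__main__":
    print(value_query("numeric"))
```

Adding a new kind of data then means adding one table entry rather than editing the generator logic.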

  3. Uses the newer EMMO prefix bindings. The EMMO ontology has unified its prefixes under <http://emmo.info/emmo#>, and the stream branch adopted this. The main branch, in contrast, still uses the non-unified prefixes, such as <http://emmo.info/emmo/middle/math#> and <http://emmo.info/emmo/middle/metrology#>. Related entities:
    • data2rdf.annotation_confs.annotations

The non-unified prefixes are outdated. However, again, this was reportedly once implemented in the dev repository but is missing from the current release.
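Why the prefix change matters in practice: the same local name expands to different IRIs under the two sets of bindings, so triples written with one set are not matched by queries written with the other. The Python sketch below uses only the namespace IRIs quoted above; the prefix labels "math" and "metrology" and the local name hasNumericData under the legacy namespace are hypothetical and purely illustrative, and expand and turtle_prefixes are not data2rdf functions.

```python
# Sketch: why mixing old and new EMMO prefix bindings breaks matching.
# UNIFIED uses the namespace named in the discussion; the LEGACY prefix labels
# ("math", "metrology") are hypothetical, only their namespace IRIs are quoted.
UNIFIED = {"emmo": "http://emmo.info/emmo#"}
LEGACY = {
    "math": "http://emmo.info/emmo/middle/math#",
    "metrology": "http://emmo.info/emmo/middle/metrology#",
}


def expand(curie: str, bindings: dict[str, str]) -> str:
    """Expand a prefixed name such as 'emmo:hasNumericData' to a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in bindings:
        raise ValueError(f"no binding for {curie!r}")
    return bindings[prefix] + local


def turtle_prefixes(bindings: dict[str, str]) -> str:
    """Render the bindings as Turtle @prefix declarations, sorted by label."""
    return "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in sorted(bindings.items()))


if __name__ == "__main__":
    print(turtle_prefixes(UNIFIED))
    # The same (illustrative) local name yields two different IRIs.
    print(expand("math:hasNumericData", LEGACY))
    print(expand("emmo:hasNumericData", UNIFIED))
```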

  4. Stores table data in the RDF instead of in a separate HDF5 file.

This is a very specific requirement of STREAM and can greatly degrade performance. Thus this part should not be merged.
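A rough way to see the performance concern: if every table cell is inlined as RDF, the number of triples grows with rows × columns, whereas a reference to an external HDF5 file stays constant. The minimal Python sketch below assumes a made-up ex: vocabulary with three triples per cell; this layout, the function names and the column names are hypothetical and are not the encoding used in the stream branch. Triple count is only a rough proxy for the actual cost in a triple store.

```python
# Hypothetical sketch: rough triple counts for two ways of handling a table.
# The ex: vocabulary and the three-triples-per-cell layout are assumptions for
# illustration, not the encoding used in the STREAM branch.
from typing import Sequence

Triple = tuple[str, str, str]


def table_to_triples(table: str, columns: Sequence[str],
                     rows: Sequence[Sequence[object]]) -> list[Triple]:
    """Inline every cell as its own node with table, column and value triples."""
    triples: list[Triple] = []
    for r, row in enumerate(rows):
        for column, value in zip(columns, row):
            cell = f"{table}/row{r}/{column}"
            triples.append((cell, "ex:inTable", table))
            triples.append((cell, "ex:column", column))
            triples.append((cell, "ex:value", repr(value)))
    return triples


def reference_triples(table: str, hdf5_path: str) -> list[Triple]:
    """Keep only a pointer from the table node to an external HDF5 file."""
    return [(table, "ex:storedIn", hdf5_path)]


if __name__ == "__main__":
    columns = ["time", "force", "elongation"]
    rows = [(i * 0.1, 10.0 + i, 0.01 * i) for i in range(1000)]
    print(len(table_to_triples("ex:table1", columns, rows)))  # 3 * 1000 * 3 = 9000
    print(len(reference_triples("ex:table1", "table1.h5")))  # 1
```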

yoavnash commented 1 year ago

Done