SDM-TIB / SDM-RDFizer

An Efficient RML-Compliant Engine for Knowledge Graph Construction
https://doi.org/10.5281/zenodo.3872103
Apache License 2.0

[Question] Update/Merge RDF triple or KG incrementally #60

Open tangyong opened 3 years ago

tangyong commented 3 years ago

From the paper "SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs", it seems that SDM-RDFizer can update a KG incrementally as new data arrives (e.g., in a streaming way...).

Concretely, we have built a KG with SDM-RDFizer from multiple data sources for an IoT use case. IoT data keeps arriving from many sensors and is transferred through message middleware (e.g., Kafka). We therefore need to update the previously built KG constantly to reflect recent data changes, but we do not want to rebuild the KG from scratch.

Instead, we want to update the previously built KG incrementally, adding and updating data so that changes take effect in as close to real time as possible.

So I would like to ask the team whether the above case is supported.

Thanks!

dachafra commented 3 years ago

Dear @tangyong, At this moment we do not support streaming construction of KGs. The incremental generation of the KG means that, thanks to the structures we have defined, we do not keep all the generated triples in memory; instead, we write chunks of generated triples to the output file. Additionally, we plan to support the creation of KGs from previous versions, as you mention, but right now that is not supported either.

Thanks for using the SDM-RDFizer; I hope this answers your questions.

David
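David's description can be made concrete with a small Python example of chunked output. Triples are serialized as N-Triples and written in fixed-size batches, so only the current batch is held in memory. This is a minimal sketch under stated assumptions, not SDM-RDFizer's code. The chunk size, the `write_in_chunks` and `generate` helpers, and the example IRIs are hypothetical, and SDM-RDFizer's own internal data structures are not reproduced.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Triple = Tuple[str, str, str]


def write_in_chunks(triples: Iterable[Triple], out: TextIO, chunk_size: int = 1000) -> int:
    """Serialize triples as N-Triples, writing a bounded buffer every `chunk_size` triples.

    Only the current chunk is held in memory; triples already written are dropped.
    Returns the number of chunks written.
    """
    buffer = []
    chunks = 0
    for s, p, o in triples:
        buffer.append(f"{s} {p} {o} .\n")
        if len(buffer) == chunk_size:
            out.write("".join(buffer))
            buffer.clear()
            chunks += 1
    if buffer:  # write the last, partially filled chunk
        out.write("".join(buffer))
        chunks += 1
    return chunks


def generate(n: int) -> Iterator[Triple]:
    """Hypothetical stand-in for the triples produced by an RML mapping."""
    for i in range(n):
        yield (f"<http://example.org/sensor/{i}>",
               "<http://example.org/hasValue>",
               f'"{i}"')


out = io.StringIO()
print(write_in_chunks(generate(2500), out, chunk_size=1000))  # 3 chunks: 1000 + 1000 + 500
```

This is incremental only on the output side. Running it again on new data still starts from nothing and knows nothing about a previously generated KG, which is exactly the limitation discussed in this thread.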

mevs commented 3 years ago

Dear @tangyong,

Many thanks for your interest in our work! The members of the team are constantly working on adding more features to the SDM-RDFizer. This public version of the SDM-RDFizer does not support the incremental creation of the knowledge graph. Nevertheless, the structures and strategies implemented in the SDM-RDFizer make it possible to include new incoming data into an RDF knowledge graph incrementally. We have a beta version of the SDM-RDFizer that implements these features, and we could give you access if you are interested. It is essential to highlight that we have applied this incremental SDM-RDFizer in the context of evolving data, e.g., new scholarly data, rather than in the context of IoT. Extending the SDM-RDFizer to IoT data is part of our future plans. Thus, if you have a specific task or use case, please contact me directly at my personal account maria.vidal@tib.eu.

Best regards, Maria-Esther Vidal

tangyong commented 3 years ago

Thanks very much for the reply, I see. Another question: if I use SDM-RDFizer to build a KG from existing data sources and want to implement KG updates, could you give me some suggestions from your experience?

Thanks!

tangyong commented 3 years ago

Great news and plans! I am very interested in the beta version, and I have exactly such a case for a smart city, i.e., an IoT case. I will send you an email.

Thanks again, @mevs!

eiglesias34 commented 3 years ago

Dear @tangyong,

To update an existing KG, you need access to the KG in question (whether it lives in an endpoint, a database, a file, etc.) so that you can compare the new triples with the KG and determine which of them do not already exist.

Best regards, Enrique Iglesias
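Enrique's suggestion amounts to a set difference between the newly generated triples and the existing KG. The sketch below assumes both are available as N-Triples lines with one triple per line, for example a .nt file produced by SDM-RDFizer for the new data and a dump of the current KG. `parse_ntriples` and `triples_to_insert` are hypothetical helpers, and the parser is deliberately simplified rather than a full N-Triples implementation.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def parse_ntriples(lines: Iterable[str]) -> Set[Triple]:
    """Parse simple N-Triples lines into (subject, predicate, object) tuples.

    Assumes well-formed lines with one triple each; skips blank lines and '#' comments.
    Subjects and predicates contain no spaces, so only the object (a literal) may.
    """
    triples: Set[Triple] = set()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        s, p, rest = line.split(None, 2)  # object keeps any internal spaces
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the terminating ' .'
        triples.add((s, p, obj))
    return triples


def triples_to_insert(existing_kg: Iterable[str], new_batch: Iterable[str]) -> Set[Triple]:
    """Return the triples in `new_batch` that do not already exist in `existing_kg`."""
    return parse_ntriples(new_batch) - parse_ntriples(existing_kg)


kg = ['<http://example.org/s1> <http://example.org/temp> "20.5" .']
batch = ['<http://example.org/s1> <http://example.org/temp> "20.5" .',
         '<http://example.org/s2> <http://example.org/temp> "21.0" .']
print(triples_to_insert(kg, batch))  # only the s2 triple is new
```

Textual comparison has limits:

- Blank-node labels are usually regenerated on each run, so triples containing blank nodes will not match.
- Literals that are equal in value but written differently are treated as different triples.
- Detecting triples whose value has changed, as opposed to triples that are simply missing, needs an extra rule. One option is to replace the old object for a given subject and predicate.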

tangyong commented 3 years ago

Dear @eiglesias34 @mevs @dachafra

Thanks for Enrique Iglesias's suggestions! I have some comments, as follows:

First, I totally agree with what you said: "compare the new triples with the KG to determine if they do not already exist"

Second, I want to add some details about a real-world case, e.g., interconnected IoT sensor data arriving as a stream:

  1. I have built a knowledge graph from historical data sources related to this IoT scenario. We can build it offline because it takes a relatively long time. I am ignoring the RDF store for now (in practice, I will use gStore).

  2. Sensor data streams in continuously from many devices into, e.g., Kafka, and I want to update the built knowledge graph to reflect the newest data changes in real time. Since I plan to update in real time, the data volume per update must stay within an acceptable range, which depends on many factors. So I will use Spark/Flink to split the data by time window. Then, following your suggestion, I have four ways in mind to update the KG from the split data:

(1) Using the same RML mapping file, I use a different RML tool (not SDM-RDFizer) to generate new triples from the split data and other properties (e.g., manually triggering insert/delete operations...). Then I compare the new triples with the KG and decide whether to insert or update them.

(2) Using the same RML mapping file, I still use SDM-RDFizer to generate a new KG (.nt file) for the split data. Then I read the new .nt file, parse the new triples, compare them with the KG, and decide whether to insert or update them.

(3) Same as (2), but SDM-RDFizer exposes an interface that returns the new triples, so I do not have to parse them myself.

(4) Same as (2), but the whole update logic is provided by SDM-RDFizer. I can imagine something like the following:

    import sys
    from rdfizer.semantify import semantify

    # The built knowledge graph: semantify() would return KG as a context object,
    # and we add the idea of transactions.
    KG = semantify(str(sys.argv[1]))

    # Use Spark/Flink to split the data according to a time window.
    newSensorData = streamingWindowData(...)
    # Convert newSensorData into CSV format.
    newCsvfiedSensorData = formatCsv(newSensorData, ...)

    # Expose a new interface called "update" ...
    KG.update(newCsvfiedSensorData)

In my view, it would be great if SDM-RDFizer exposed more operations/interfaces, such as (3) and (4).

Thanks!
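The windowing step in item 2 and the hypothetical `streamingWindowData`/`formatCsv` helpers in (4) can be approximated in plain Python as a stand-in for Spark/Flink. This is an illustrative assumption, not code from SDM-RDFizer or any streaming library. Records are assumed to be dicts with an integer `timestamp` in seconds, and `tumbling_windows` and `window_to_csv` are hypothetical names. Each CSV string would presumably be saved as the source file referenced by the RML mapping, so that SDM-RDFizer processes one window at a time, as in approach (2).

```python
import csv
import io
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Sequence

Record = Dict[str, Any]


def tumbling_windows(records: Iterable[Record], window_seconds: int) -> Dict[int, List[Record]]:
    """Group records into fixed, non-overlapping time windows.

    Assumes each record has an integer 'timestamp' in seconds; each group is
    keyed by the start time of its window, in ascending order.
    """
    windows: Dict[int, List[Record]] = defaultdict(list)
    for rec in records:
        ts = int(rec["timestamp"])
        windows[ts - ts % window_seconds].append(rec)
    return dict(sorted(windows.items()))


def window_to_csv(rows: Iterable[Record], fieldnames: Sequence[str]) -> str:
    """Serialize one window as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


readings = [
    {"sensor_id": "s1", "timestamp": 0, "value": 20.5},
    {"sensor_id": "s2", "timestamp": 30, "value": 19.0},
    {"sensor_id": "s1", "timestamp": 65, "value": 21.0},
]
windows = tumbling_windows(readings, window_seconds=60)
print(list(windows))  # [0, 60]
print(window_to_csv(windows[60], ["sensor_id", "timestamp", "value"]), end="")
```

Tumbling (non-overlapping) windows are the simplest choice here because each reading lands in exactly one batch. Sliding windows would produce overlapping batches and require extra deduplication before updating the KG.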

eiglesias34 commented 3 years ago

Dear @tangyong,

Thank you again for your interest in the SDM-RDFizer.

Given the complexity of your most recent question, the head of our group would like to arrange a meeting so that we can discuss it further. Please contact Prof. Maria-Esther Vidal at maria.vidal@tib.eu.

Best regards, Enrique Iglesias

tangyong commented 3 years ago

OK, I will ping Prof. Maria-Esther Vidal.

Thanks!

LangDaoAI commented 1 year ago

Any updates on this issue?

Thanks!