Closed jkesanie closed 7 years ago
based on:
{
"provenance" : {
"context": {
"workflowID": 1,
"activityID": 1,
"stepID": 1
},
"agent": {
"ID": "agentID",
"role": "agentRole"
},
"activity": {
"title": "activity title",
"type": "Step",
"description": "description",
"status": "SUCCESS",
"startTime": "2017-08-02T13:52:29+02:00",
"endTime": "2017-08-02T13:52:29+02:00",
"communication": [
{
"agent": "GMAPI",
"role": "transformer",
"input": {
"frame": {
"role": "Configuration"
},
"inputGraphs": {
"role": "inputGraphs"
}
}
}
]
},
"input": {
"inputGraphs": {
"role": "inputGraphs"
}
},
"output": {
"outputGraphs": {
"role": "outputGraphs"
}
}
},
"payload": {
"inputGraphs": "attx:dataset1",
"outputGraphs": "http://platform/public/rest/documents"
}
}
we can achieve this:
@prefix attx: <http://data.hulib.helsinki.fi/attx/> .
@prefix attxonto: <http://data.hulib.helsinki.fi/attx/onto#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix pwo: <http://purl.org/spar/pwo/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
attx:workflow1_activity1_step1_agentID a attxonto:Step,
prov:Activity ;
dcterms:description "description" ;
dcterms:title "activity title" ;
prov:endedAtTime "2017-08-02T13:52:29+02:00"^^xsd:dateTime ;
prov:generated attx:workflow1_activity1_outputGraphs ;
prov:qualifiedAssociation attx:association_22449d0925a94eb3748635026af06b00 ;
prov:qualifiedCommunication [ a prov:Communication ;
prov:activity attx:workflow1_activity1_step1_GMAPI ;
prov:hadRole attx:transformer ] ;
prov:qualifiedGeneration attx:generated_64cbb356efdd59d797591ffb342fbb7e ;
prov:qualifiedUsage attx:used_ae325c51b1523371e8026f19b86e3e7e ;
prov:startedAtTime "2017-08-02T13:52:29+02:00"^^xsd:dateTime ;
prov:used attx:workflow1_activity1_inputGraphs ;
prov:wasAssociatedWith attx:agentID .
attx:GMAPI a attxonto:Artifact,
prov:Agent .
attx:agentRole a prov:Role .
attx:association_22449d0925a94eb3748635026af06b00 a prov:Association ;
prov:agent attx:agentID ;
prov:hadRole attx:agentRole .
attx:generated_64cbb356efdd59d797591ffb342fbb7e a prov:Generation ;
prov:entity attx:workflow1_activity1_outputGraphs ;
prov:hadRole attx:outputGraphs .
attx:inputGraphs a prov:Role .
attx:outputGraphs a prov:Role .
attx:transformer a prov:Role .
attx:used_ae325c51b1523371e8026f19b86e3e7e a prov:Usage ;
prov:entity attx:workflow1_activity1_inputGraphs ;
prov:hadRole attx:inputGraphs .
attx:workflow1_activity1_Configuration a prov:Role .
attx:workflow1_activity1_step1_GMAPI a prov:Activity ;
prov:qualifiedUsage [ a prov:Usage ;
prov:entity attx:workflow1_activity1_step1_GMAPI_b3fe647f5c3a4032f8fe18c8efe9c9a6 ;
prov:hadRole attx:workflow1_activity1_inputGraphs ],
[ a prov:Usage ;
prov:entity attx:workflow1_activity1_step1_GMAPI_dcf3e36ee8115282aad46485cab6a4be ;
prov:hadRole attx:workflow1_activity1_Configuration ] ;
prov:used attx:workflow1_activity1_step1_GMAPI_b3fe647f5c3a4032f8fe18c8efe9c9a6,
attx:workflow1_activity1_step1_GMAPI_dcf3e36ee8115282aad46485cab6a4be ;
prov:wasAssociatedWith attx:GMAPI .
attx:agentID a attxonto:Artifact,
prov:Agent .
attx:workflow1_activity1_outputGraphs a prov:Entity ;
dcterms:source "http://platform/public/rest/documents" .
attx:workflow1_activity1_inputGraphs a prov:Entity,
prov:Role ;
dcterms:source "attx:dataset1" .
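The resource names in the Turtle above appear to be derived from the `context` block of the JSON message. A minimal sketch of that naming scheme, inferred only from the example output (the helper names `activity_uri` and `entity_uri` are hypothetical and not taken from the provenance service):

```python
ATTX = "http://data.hulib.helsinki.fi/attx/"


def activity_uri(context: dict, agent_id: str) -> str:
    """Build an activity URI such as attx:workflow1_activity1_step1_agentID.

    Assumption: the step part is simply left out when the context has no stepID,
    as in attx:workflowingestionwf_activity1_UV.
    """
    parts = [
        f"workflow{context['workflowID']}",
        f"activity{context['activityID']}",
    ]
    if "stepID" in context:
        parts.append(f"step{context['stepID']}")
    parts.append(agent_id)
    return ATTX + "_".join(parts)


def entity_uri(context: dict, key: str) -> str:
    """Build an entity URI such as attx:workflow1_activity1_inputGraphs."""
    return ATTX + f"workflow{context['workflowID']}_activity{context['activityID']}_{key}"
```

For example, `activity_uri({"workflowID": 1, "activityID": 1, "stepID": 1}, "agentID")` yields the subject of the first activity shown above.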
A new type of step that sends the workflow-level prov:used and prov:generated data.
Step 0
{
"provenance": {
"context": {
"workflowID": "ingestionwf",
"activityID": 1,
"step": "describeExternalDS"
},
"agent": {
"ID": "UV",
"role": "ETL"
},
"activity": {
"title": "Ingestion workflow",
"type": "DescribeStepExecution",
"startTime": "2017-08-02T13:52:29+02:00",
"endTime": "2017-08-02T13:52:29+02:00"
},
"output": {
"outputDataset": {
"role": "Dataset"
}
}
},
"payload": {
"outputDataset": {
"uri": "attx://ds/1",
"title": "Harvested dataset",
"description": "",
"publisher": "UH",
"license": "http://cc/0"
}
}
}
Example messages. NOTE: the communication entries still need some more information.
workflow-started-prov Step 1a
{
"provenance": {
"context": {
"workflowID": "ingestionwf",
"activityID": "1"
},
"agent": {
"ID": "UV",
"role": "ETL"
},
"activity": {
"title": "Ingestion workflow",
"type": "WorkflowExecution",
"startTime": "2017-08-02T13:52:29+02:00"
},
"input": {
"inputDataset": {
"role": "Dataset"
}
},
"output": {
"outputDataset": {
"role": "Dataset"
}
}
},
"payload": {
"inputDataset": "http://dataset/1",
"outputDataset": "http://dataset/2"
}
}
harvestData-prov Step 1b
{
"provenance": {
"context": {
"workflowID": "ingestionwf",
"activityID": "1",
"stepID": "harvestData"
},
"agent": {
"ID": "UV",
"role": "ETL"
},
"activity": {
"title": "Harvest data",
"type": "StepExecution",
"startTime": "2017-08-02T13:52:29+02:00",
"endTime": "2017-08-02T13:52:29+02:00",
"status": "SUCCESS"
},
"input": {
"harvestConfiguration": {
"role": "StepConfiguration"
}
},
"output": {
"harvestedContent": {
"role": "DatasetContent"
}
}
},
"payload": {
"harvestConfiguration": {
"apiURL": "http://data/api",
"endpoint": "/stocks",
"query": "*"
}
}
}
transformToRDF-prov Step 2
{
"provenance": {
"context": {
"workflowID": "ingestionwf",
"activityID": 1,
"stepID": "tranformToRDF"
},
"agent": {
"ID": "UV",
"role": "ETL"
},
"activity": {
"title": "Transform to RDF",
"type": "StepExecution",
"startTime": "2017-08-02T13:52:29+02:00",
"endTime": "2017-08-02T13:52:29+02:00",
"status": "SUCCESS",
"communication": [
{
"agent": "RMLService",
"role": "transformer",
"input": {}
}
]
},
"input": {
"harvestedContent": {}
},
"output": {
"transformerData": {
"role": "tempDataset"
}
}
},
"payload": {}
}
replaceDataset-prov - This is missing input and output. Step 3
{
"provenance": {
"context": {
"workflowID": "ingestionwf",
"activityID": 1,
"stepID": "replaceds"
},
"agent": {
"ID": "UV",
"role": "ETL"
},
"activity": {
"title": "Replace content of the existing dataset",
"type": "StepExecution",
"startTime": "2017-08-02T13:52:29+02:00",
"endTime": "2017-08-02T13:52:29+02:00",
"status": "SUCCESS",
"communication": [
{
"agent": "GMAPI" ,
"role": "storage",
"input": {}
}
]
},
"input": {
"transformerData": {
"role": "tempDataset"
}
},
"output": {
"outputDataset": {
"role": "Dataset"
}
}
},
"payload": {
"transformerData": "attx:tempDataset",
"outputDataset": "http://dataset/2"
}
}
Workflow-ended message. Maybe the input dataset should be part of the startWorkflow message and the output part of this one. Currently both are included in both messages. Step 3b
{
"provenance": {
"context": {
"workflowID": "ingestionwf",
"activityID": 1
},
"agent": {
"ID": "UV",
"role": "ETL"
},
"activity": {
"title": "Ingestion workflow",
"type": "WorkflowExecution",
"endTime": "2017-08-02T13:52:29+02:00"
},
"input": {
"inputDataset": {
"role": "Dataset"
}
},
"output": {
"outputDataset": {
"role": "Dataset"
}
}
},
"payload": {
"inputDataset": "http://dataset/1",
"outputDataset": "http://dataset/2"
}
}
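One way to picture the suggested split (input only in the workflow-started message, output only in the workflow-ended message) is the sketch below. The function name `split_workflow_messages` and its parameters are hypothetical; the field names are copied from the example messages.

```python
def split_workflow_messages(context, agent, title, start_time, end_time,
                            input_dataset, output_dataset):
    """Return (started, ended) messages.

    The input dataset is carried only by the started message and the
    output dataset only by the ended message.
    """
    def common(activity):
        # Shared parts of both messages; copies avoid aliasing the caller's dicts.
        return {"context": dict(context), "agent": dict(agent), "activity": activity}

    started = {
        "provenance": {
            **common({"title": title, "type": "WorkflowExecution",
                      "startTime": start_time}),
            "input": {"inputDataset": {"role": "Dataset"}},
        },
        "payload": {"inputDataset": input_dataset},
    }
    ended = {
        "provenance": {
            **common({"title": title, "type": "WorkflowExecution",
                      "endTime": end_time}),
            "output": {"outputDataset": {"role": "Dataset"}},
        },
        "payload": {"outputDataset": output_dataset},
    }
    return started, ended
```

With this split, each dataset is described once per workflow execution, which should also reduce the repeated usage and generation statements in the resulting RDF.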
What is the content of the service replies, or of the PROV messages sent by the services, if they are PROV-aware?
@jkesanie not sure I follow. Also, please use these types of activities (notice the sentence case): "StepExecution", "DescribeStepExecution", "ServiceExecution", "WorkflowExecution" - otherwise it will not work :) - we do not support that feature yet.
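A sender could guard against unsupported activity types before publishing a message. The sketch below only uses the four type names listed in the comment above; the function name `check_activity_type` is hypothetical.

```python
# The four activity types named in this thread; matching is case-sensitive.
SUPPORTED_ACTIVITY_TYPES = {
    "StepExecution",
    "DescribeStepExecution",
    "ServiceExecution",
    "WorkflowExecution",
}


def check_activity_type(message: dict) -> str:
    """Return the activity type of a provenance message, or raise ValueError."""
    activity_type = message["provenance"]["activity"]["type"]
    if activity_type not in SUPPORTED_ACTIVITY_TYPES:
        raise ValueError(
            f"Unsupported activity type {activity_type!r}; "
            f"expected one of {sorted(SUPPORTED_ACTIVITY_TYPES)}"
        )
    return activity_type
```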
@jkesanie here are the responses from the services (I numbered the steps above, please review). Step 3 - REPLY
{
"provenance": {
"context": {
"workflowID": "ingestionwf",
"activityID": 1,
"stepID": "replaceds"
},
"agent": {
"ID": "GMAPI",
"role": "storage"
},
"activity": {
"title": "Store data in the graph",
"type": "ServiceExecution",
"startTime": "2017-08-02T13:52:29+02:00",
"endTime": "2017-08-02T13:52:29+02:00",
"status": "SUCCESS"
},
"input": {
"inputGraphs": {
"role": "inputGraphs"
}
}
},
"payload": {
"inputGraphs": "attx:dataset1"
}
}
Step 2 - REPLY
{
"provenance": {
"context": {
"workflowID": "ingestionwf",
"activityID": 1,
"stepID": "tranformToRDF"
},
"agent": {
"ID": "RMLService",
"role": "transformer"
},
"activity": {
"title": "Transform Data",
"type": "ServiceExecution",
"startTime": "2017-08-02T13:52:29+02:00",
"endTime": "2017-08-02T13:52:29+02:00",
"status": "SUCCESS"
},
"input": {
"inputGraphs": {
"role": "inputGraphs"
}
},
"output": {
"transformerData": {
"role": "tempDataset"
}
}
},
"payload": {
"inputGraphs": "attx:dataset1",
"transformerData": "attx:tempDataset"
}
}
This approach seems to generate blank nodes with the same content, e.g.:
attx:workflowingestionwf_activity1_UV
a prov:Activity , attxonto:WorkflowExecution ;
dcterms:title "Ingestion workflow" ;
prov:endedAtTime "2017-08-02T13:52:29+02:00"^^xsd:dateTime ;
prov:generated attx:workflowingestionwf_activity1_outputDataset ;
prov:qualifiedAssociation [ a prov:Association ;
prov:agent attx:UV ;
prov:hadPlan attx:workflowingestionwf_activity1 ;
prov:hadRole attx:ETL
] ;
prov:qualifiedAssociation [ a prov:Association ;
prov:agent attx:UV ;
prov:hadPlan attx:workflowingestionwf_activity1 ;
prov:hadRole attx:ETL
] ;
prov:qualifiedGeneration [ a prov:Generation ;
prov:entity attx:workflowingestionwf_activity1_outputDataset ;
prov:hadRole attx:Dataset
] ;
prov:qualifiedGeneration [ a prov:Generation ;
prov:entity attx:workflowingestionwf_activity1_outputDataset ;
prov:hadRole attx:Dataset
] ;
prov:qualifiedGeneration [ a prov:Generation ;
prov:entity attx:workflowingestionwf_activity1_outputDataset ;
prov:hadRole attx:Dataset
] ;
prov:qualifiedUsage [ a prov:Usage ;
prov:entity attx:workflowingestionwf_activity1_inputDataset ;
prov:hadRole attx:Dataset
] ;
prov:qualifiedUsage [ a prov:Usage ;
prov:entity attx:workflowingestionwf_activity1_inputDataset ;
prov:hadRole attx:Dataset
] ;
prov:startedAtTime "2017-08-02T13:52:29+02:00"^^xsd:dateTime ;
prov:used attx:workflowingestionwf_activity1_inputDataset ;
prov:wasAssociatedWith attx:UV .
While the used and generated RDF triples can be minimised by reducing the redundancy in the information sent in messages (send input and output only once per activity, either in the request or in the reply), the duplicated qualifiedAssociation seems unavoidable unless we introduce IDs for Associations, which seems an unnecessary overhead apart from the possibility of clustering WorkflowExecution type Activities under specific Associations.
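If IDs for Associations were introduced, one option would be to derive them deterministically from the association's content, so that repeated messages map to the same node instead of fresh blank nodes. The sketch below is an assumption: the hashed fields and the function name `association_id` are hypothetical, and MD5 is chosen only because the `attx:association_...` identifiers earlier in this thread have the same 32-character hexadecimal shape.

```python
import hashlib


def association_id(agent: str, role: str, plan: str = "") -> str:
    """Return a stable local name for an Association.

    Assumption: agent, role and plan together identify the association.
    """
    key = "|".join((agent, role, plan))
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"association_{digest}"
```

Calling `association_id("UV", "ETL", "workflowingestionwf_activity1")` twice returns the same name, so the request and the reply for one activity would describe a single Association resource.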
We also discussed the possibility of an agent having multiple Roles, e.g. GMAPI acting as both storage and manager.
Description
Transform minimum JSON provenance document into ATTX provenance RDF. Requires handling of requests and replies.
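As a rough illustration of the transformation (not the provenance service's actual code), the sketch below maps the core fields of one message to triples written as prefixed-name tuples. The function name `activity_triples` is hypothetical, the predicate choices follow the Turtle examples in this thread, and qualified relations, roles and payload handling are left out.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# JSON activity field -> predicate, as used in the Turtle examples of this thread.
LITERAL_FIELDS = {
    "title": "dcterms:title",
    "description": "dcterms:description",
    "startTime": "prov:startedAtTime",
    "endTime": "prov:endedAtTime",
}


def activity_triples(message: dict) -> List[Triple]:
    """Map one provenance message to a flat list of (subject, predicate, object)."""
    prov = message["provenance"]
    ctx, agent, activity = prov["context"], prov["agent"], prov["activity"]

    base = f"workflow{ctx['workflowID']}_activity{ctx['activityID']}"
    step = ctx.get("stepID")
    local = f"{base}_step{step}" if step is not None else base
    subject = f"attx:{local}_{agent['ID']}"

    triples: List[Triple] = [
        (subject, "rdf:type", "prov:Activity"),
        (subject, "rdf:type", f"attxonto:{activity['type']}"),
        (subject, "prov:wasAssociatedWith", f"attx:{agent['ID']}"),
    ]
    # Only fields present in the message produce triples.
    for field, predicate in LITERAL_FIELDS.items():
        if field in activity:
            triples.append((subject, predicate, activity[field]))
    for key in prov.get("input", {}):
        triples.append((subject, "prov:used", f"attx:{base}_{key}"))
    for key in prov.get("output", {}):
        triples.append((subject, "prov:generated", f"attx:{base}_{key}"))
    return triples
```

Because a request and its reply share the same context, merging their triples into a set would drop exact duplicates of these simple statements; the qualified blank nodes discussed above would still need separate handling.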
DoD
Provenance service is able to output provenance RDF.
Testing