ATTX-project / provenance-service

ATTX Provenance service for exposing provenance-related information.

Implement processing of minimum provenance message #6

Closed. jkesanie closed this issue 7 years ago.

jkesanie commented 7 years ago

Description

Transform the minimum JSON provenance document into ATTX provenance RDF. Requires handling of both requests and replies.
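
A rough sketch of the intended flow, only to frame the discussion below (this is not the service's actual code; handle_prov_message is a hypothetical entry point): receive the JSON message, map it to PROV-O triples with rdflib, and return the serialised Turtle to the caller.

# Hedged sketch of the request/reply flow described above; not the service's API.
import json
from rdflib import Graph, Namespace

ATTX = Namespace("http://data.hulib.helsinki.fi/attx/")
ATTXONTO = Namespace("http://data.hulib.helsinki.fi/attx/onto#")
PROV = Namespace("http://www.w3.org/ns/prov#")

def handle_prov_message(message_body):
    """Consume one provenance message (request or reply) and return Turtle."""
    message = json.loads(message_body)
    graph = Graph()
    graph.bind("attx", ATTX)
    graph.bind("attxonto", ATTXONTO)
    graph.bind("prov", PROV)
    # map message["provenance"] and message["payload"] to PROV-O triples here
    return graph.serialize(format="turtle")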

DoD

Provenance service is able to output provenance RDF.

Testing

blankdots commented 7 years ago

based on:

{
  "provenance" : {
    "context": {
      "workflowID": 1,
      "activityID": 1,
      "stepID": 1
    },
    "agent": {
      "ID": "agentID",
      "role": "agentRole"      
    },
    "activity": {
      "title": "activity title",      
      "type": "Step",
      "description": "description",
      "status": "SUCCESS",
      "startTime": "2017-08-02T13:52:29+02:00",
      "endTime": "2017-08-02T13:52:29+02:00",
      "communication": [
        {
                    "agent": "GMAPI",
          "role": "transformer",
          "input": {
            "frame": {
              "role": "Configuration"
            },
            "inputGraphs": {
              "role": "inputGraphs"
            }
          }
        }
      ]
    },
    "input": {
      "inputGraphs": {
        "role": "inputGraphs"
      }    
    },
    "output": {
      "outputGraphs": {
        "role": "outputGraphs"
      }
    }
  },  
  "payload": {
    "inputGraphs": "attx:dataset1",    
    "outputGraphs": "http://platform/public/rest/documents"

  }
}

we can achieve this:

@prefix attx: <http://data.hulib.helsinki.fi/attx/> .
@prefix attxonto: <http://data.hulib.helsinki.fi/attx/onto#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix pwo: <http://purl.org/spar/pwo/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

attx:workflow1_activity1_step1_agentID a attxonto:Step,
        prov:Activity ;
    dcterms:description "description" ;
    dcterms:title "activity title" ;
    prov:endedAtTime "2017-08-02T13:52:29+02:00"^^xsd:dateTime ;
    prov:generated attx:workflow1_activity1_outputGraphs ;
    prov:qualifiedAssociation attx:association_22449d0925a94eb3748635026af06b00 ;
    prov:qualifiedCommunication [ a prov:Communication ;
            prov:activity attx:workflow1_activity1_step1_GMAPI ;
            prov:hadRole attx:transformer ] ;
    prov:qualifiedGeneration attx:generated_64cbb356efdd59d797591ffb342fbb7e ;
    prov:qualifiedUsage attx:used_ae325c51b1523371e8026f19b86e3e7e ;
    prov:startedAtTime "2017-08-02T13:52:29+02:00"^^xsd:dateTime ;
    prov:used attx:workflow1_activity1_inputGraphs ;
    prov:wasAssociatedWith attx:agentID .

attx:GMAPI a attxonto:Artifact,
        prov:Agent .

attx:agentRole a prov:Role .

attx:association_22449d0925a94eb3748635026af06b00 a prov:Association ;
    prov:agent attx:agentID ;
    prov:hadRole attx:agentRole .

attx:generated_64cbb356efdd59d797591ffb342fbb7e a prov:Generation ;
    prov:entity attx:workflow1_activity1_outputGraphs ;
    prov:hadRole attx:outputGraphs .

attx:inputGraphs a prov:Role .

attx:outputGraphs a prov:Role .

attx:transformer a prov:Role .

attx:used_ae325c51b1523371e8026f19b86e3e7e a prov:Usage ;
    prov:entity attx:workflow1_activity1_inputGraphs ;
    prov:hadRole attx:inputGraphs .

attx:workflow1_activity1_Configuration a prov:Role .

attx:workflow1_activity1_step1_GMAPI a prov:Activity ;
    prov:qualifiedUsage [ a prov:Usage ;
            prov:entity attx:workflow1_activity1_step1_GMAPI_b3fe647f5c3a4032f8fe18c8efe9c9a6 ;
            prov:hadRole attx:workflow1_activity1_inputGraphs ],
        [ a prov:Usage ;
            prov:entity attx:workflow1_activity1_step1_GMAPI_dcf3e36ee8115282aad46485cab6a4be ;
            prov:hadRole attx:workflow1_activity1_Configuration ] ;
    prov:used attx:workflow1_activity1_step1_GMAPI_b3fe647f5c3a4032f8fe18c8efe9c9a6,
        attx:workflow1_activity1_step1_GMAPI_dcf3e36ee8115282aad46485cab6a4be ;
    prov:wasAssociatedWith attx:GMAPI .

attx:agentID a attxonto:Artifact,
        prov:Agent .

attx:workflow1_activity1_outputGraphs a prov:Entity ;
    dcterms:source "http://platform/public/rest/documents" .

attx:workflow1_activity1_inputGraphs a prov:Entity,
        prov:Role ;
    dcterms:source "attx:dataset1" .
jkesanie commented 7 years ago

A new type of step that sends the workflow-level prov:used and prov:generated data.

Step 0

{
    "provenance": {
        "context": {
          "workflowID": "ingestionwf",
          "activityID": 1,
          "step": "describeExternalDS"

        },
        "agent": {
          "ID": "UV",
          "role": "ETL"      
        },  
        "activity": {
            "title": "Ingestion workflow",
            "type": "DescribeStepExecution",
            "startTime": "2017-08-02T13:52:29+02:00",
            "endTime": "2017-08-02T13:52:29+02:00"
        },
        "output": {
              "outputDataset": {
                "role": "Dataset"
              }
        }        
    },
    "payload": {
        "outputDataset": {
            "uri": "attx://ds/1",
            "title": "Harvested dataset",
            "description": "",
            "publisher": "UH",
            "license": "http://cc/0"            
        }        
    }
}
jkesanie commented 7 years ago

Example messages below. NOTE: the Communication entries still need some more information.

workflow-started-prov Step 1a

{
    "provenance": {
        "context": {
          "workflowID": "ingestionwf",
          "activityID": "1"

        },
        "agent": {
          "ID": "UV",
          "role": "ETL"      
        },  
        "activity": {
            "title": "Ingestion workflow",
            "type": "WorkflowExecution",
            "startTime": "2017-08-02T13:52:29+02:00"
        },
        "input": {
              "inputDataset": {
                "role": "Dataset"                
              }
        },              
        "output": {
              "outputDataset": {
                "role": "Dataset"
              }
        }        
    },
    "payload": {
        "inputDataset": "http://dataset/1",
        "outputDataset": "http://dataset/2"
    }
}

harvestData-prov Step 1b

{
    "provenance": {
        "context": {
          "workflowID": "ingestionwf",
          "activityID": "1",
          "stepID": "harvestData"
        },
        "agent": {
          "ID": "UV",
          "role": "ETL"      
        },  
        "activity": {
            "title": "Harvest data",
            "type": "StepExecution",
            "startTime": "2017-08-02T13:52:29+02:00",
            "endTime": "2017-08-02T13:52:29+02:00",
            "status": "SUCCESS"
        },
        "input": {
              "harvestConfiguration": {
                "role": "StepConfiguration"
              }
        },              
        "output": {
              "harvestedContent": {
                "role": "DatasetContent"
              }
        }        
    },
    "payload": {
        "harvestConfiguration": {
            "apiURL": "http://data/api",
            "endpoint": "/stocks",
            "query": "*"            
        }        
    }
}

transformToRDF-prov Step 2

{
    "provenance": {
        "context": {
          "workflowID": "ingestionwf",
          "activityID": 1,
          "stepID": "tranformToRDF"
        },
        "agent": {
          "ID": "UV",
          "role": "ETL"      
        },  
        "activity": {
            "title": "Transform to RDF",
            "type": "StepExecution",
            "startTime": "2017-08-02T13:52:29+02:00",
            "endTime": "2017-08-02T13:52:29+02:00",
            "status": "SUCCESS",
            "communication": [ 
                {
                    "agent": "RMLService",
            "role": "transformer",
            "input": {}
                }
            ]
        },
        "input": {
            "harvestedContent": {}
        },
        "output": {
              "transformerData": {
                "role": "tempDataset"
              }
        }                      
    },
    "payload": {}
}
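
Judging from the first Turtle example in this thread, each entry of the communication array above becomes a prov:qualifiedCommunication node plus a separate activity for the communicating agent. A hedged rdflib sketch of that mapping (function and argument names are illustrative, not the service's API):

# Sketch only: mirrors the qualifiedCommunication block of the earlier Turtle output.
from rdflib import Namespace, BNode, RDF

ATTX = Namespace("http://data.hulib.helsinki.fi/attx/")
PROV = Namespace("http://www.w3.org/ns/prov#")

def add_communication(graph, step_activity, step_prefix, comm):
    # comm = {"agent": "RMLService", "role": "transformer", "input": {...}}
    comm_activity = ATTX[step_prefix + "_" + comm["agent"]]
    node = BNode()
    graph.add((step_activity, PROV.qualifiedCommunication, node))
    graph.add((node, RDF.type, PROV.Communication))
    graph.add((node, PROV.activity, comm_activity))
    graph.add((node, PROV.hadRole, ATTX[comm["role"]]))
    graph.add((comm_activity, RDF.type, PROV.Activity))
    graph.add((comm_activity, PROV.wasAssociatedWith, ATTX[comm["agent"]]))
    graph.add((ATTX[comm["agent"]], RDF.type, PROV.Agent))
    return comm_activity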

replaceDataset-prov - This is missing input and output. Step 3

{
    "provenance": {
        "context": {
          "workflowID": "ingestionwf",
          "activityID": 1,
          "stepID": "replaceds"
        },
        "agent": {
          "ID": "UV",
          "role": "ETL"      
        },  
        "activity": {
            "title": "Replace content of the existing dataset",
            "type": "StepExecution",
            "startTime": "2017-08-02T13:52:29+02:00",
            "endTime": "2017-08-02T13:52:29+02:00",
            "status": "SUCCESS",
            "communication": [
                {
                    "agent": "GMAPI" ,
                    "role": "storage",
            "input": {}
                }
            ]
        },
        "input": {
              "transformerData": {
                "role": "tempDataset"
              }
        },
        "output": {
            "outputDataset": {
                "role": "Dataset"
            }
        }    
    },
   "payload": {
       "transformerData": "attx:tempDataset",
       "outputDataset": "http://dataset/2"
   }
}
jkesanie commented 7 years ago

Workflow ended message. Maybe the input dataset should be part of the startWorkflow message and the output dataset part of this one; right now both are included in both messages. Step 3b

{
    "provenance": {
        "context": {
          "workflowID": "ingestionwf",
          "activityID": 1

        },
        "agent": {
          "ID": "UV",
          "role": "ETL"      
        },  
        "activity": {
            "title": "Ingestion workflow",
            "type": "WorkflowExecution",
            "endTime": "2017-08-02T13:52:29+02:00"
        },
        "input": {
              "inputDataset": {
                "role": "Dataset"
              }
        },              
        "output": {
              "outputDataset": {
                "role": "Dataset"
              }
        }        
    },
    "payload": {
        "inputDataset": "http://dataset/1",
        "outputDataset": "http://dataset/2"
    }
}
jkesanie commented 7 years ago

What is the content of the service replies or of the PROV messages sent by the services if they are prov-aware?

blankdots commented 7 years ago

@jkesanie not sure I follow. Also, please use these activity types (note the exact casing): "StepExecution", "DescribeStepExecution", "ServiceExecution", "WorkflowExecution" - otherwise it will not work :) - we do not support other activity types yet.

blankdots commented 7 years ago

@jkesanie here are the responses from the Services (I have numbered the steps above - please review). Step 3 - REPLY

{
    "provenance": {
        "context": {
            "workflowID": "ingestionwf",
            "activityID": 1,
            "stepID": "replaceds"
        },
        "agent": {
            "ID": "GMAPI",
            "role": "storage"
        },
        "activity": {
            "title": "Store data in the graph",
            "type": "ServiceExecution",
            "startTime": "2017-08-02T13:52:29+02:00",
            "endTime": "2017-08-02T13:52:29+02:00",
            "status": "SUCCESS"
        },
        "input": {
            "inputGraphs": {
                "role": "inputGraphs"
            }
        }
    },
    "payload": {
        "inputGraphs": "attx:dataset1"
    }
}

Step 2 - REPLY

{
    "provenance": {
        "context": {
            "workflowID": "ingestionwf",
            "activityID": 1,
            "stepID": "tranformToRDF"
        },
        "agent": {
            "ID": "RMLService",
            "role": "transformer"
        },
        "activity": {
            "title": "Transform Data",
            "type": "ServiceExecution",
            "startTime": "2017-08-02T13:52:29+02:00",
            "endTime": "2017-08-02T13:52:29+02:00",
            "status": "SUCCESS"
        },
        "input": {
            "inputGraphs": {
                "role": "inputGraphs"
            }
        },
        "output": {
              "transformerData": {
                "role": "tempDataset"
              }
        }
    },
    "payload": {
        "inputGraphs": "attx:dataset1",
        "transformerData": "attx:tempDataset"
    }
}
blankdots commented 7 years ago

This approach seems to generate blank nodes with the same content, e.g.:

attx:workflowingestionwf_activity1_UV
        a                          prov:Activity , attxonto:WorkflowExecution ;
        dcterms:title              "Ingestion workflow" ;
        prov:endedAtTime           "2017-08-02T13:52:29+02:00"^^xsd:dateTime ;
        prov:generated             attx:workflowingestionwf_activity1_outputDataset ;
        prov:qualifiedAssociation  [ a             prov:Association ;
                                     prov:agent    attx:UV ;
                                     prov:hadPlan  attx:workflowingestionwf_activity1 ;
                                     prov:hadRole  attx:ETL
                                   ] ;
        prov:qualifiedAssociation  [ a             prov:Association ;
                                     prov:agent    attx:UV ;
                                     prov:hadPlan  attx:workflowingestionwf_activity1 ;
                                     prov:hadRole  attx:ETL
                                   ] ;
        prov:qualifiedGeneration   [ a             prov:Generation ;
                                     prov:entity   attx:workflowingestionwf_activity1_outputDataset ;
                                     prov:hadRole  attx:Dataset
                                   ] ;
        prov:qualifiedGeneration   [ a             prov:Generation ;
                                     prov:entity   attx:workflowingestionwf_activity1_outputDataset ;
                                     prov:hadRole  attx:Dataset
                                   ] ;
        prov:qualifiedGeneration   [ a             prov:Generation ;
                                     prov:entity   attx:workflowingestionwf_activity1_outputDataset ;
                                     prov:hadRole  attx:Dataset
                                   ] ;
        prov:qualifiedUsage        [ a             prov:Usage ;
                                     prov:entity   attx:workflowingestionwf_activity1_inputDataset ;
                                     prov:hadRole  attx:Dataset
                                   ] ;
        prov:qualifiedUsage        [ a             prov:Usage ;
                                     prov:entity   attx:workflowingestionwf_activity1_inputDataset ;
                                     prov:hadRole  attx:Dataset
                                   ] ;
        prov:startedAtTime         "2017-08-02T13:52:29+02:00"^^xsd:dateTime ;
        prov:used                  attx:workflowingestionwf_activity1_inputDataset ;
        prov:wasAssociatedWith     attx:UV .

While the used and generated RDF triples can be minimised by reducing the redundancy in the information the messages send (send input and output only once per activity, either in the request or in the reply), the duplicated qualifiedAssociation nodes seem unavoidable unless we introduce IDs for Associations. That looks like unnecessary overhead, apart from the possibility of clustering WorkflowExecution-type Activities under specific Associations.
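
One option along those lines, sketched below under the assumption that the Association IRI is derived from a hash of its content, so repeated messages reuse the same node instead of piling up identical blank nodes (the helper name and the exact hash input are illustrative only):

# Sketch only: deterministic Association IRIs instead of blank nodes.
import hashlib
from rdflib import Namespace, RDF

ATTX = Namespace("http://data.hulib.helsinki.fi/attx/")
PROV = Namespace("http://www.w3.org/ns/prov#")

def add_qualified_association(graph, activity, agent, role, plan=None):
    # Hash the (activity, agent, plan, role) tuple so identical associations collapse.
    key = "|".join(str(x) for x in (activity, agent, plan, role))
    assoc = ATTX["association_" + hashlib.md5(key.encode("utf-8")).hexdigest()]
    graph.add((activity, PROV.qualifiedAssociation, assoc))
    graph.add((assoc, RDF.type, PROV.Association))
    graph.add((assoc, PROV.agent, agent))
    graph.add((assoc, PROV.hadRole, role))
    if plan is not None:
        graph.add((assoc, PROV.hadPlan, plan))
    return assoc

Because rdflib graphs are sets of triples, re-adding the same association from a later message is then a no-op, whereas a fresh blank node per message produces the duplicates shown above. The same trick would apply to the duplicated qualifiedGeneration and qualifiedUsage nodes (the first Turtle example in this thread already uses hashed attx:generated_* and attx:used_* IRIs for those).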

We also discussed the possibility of an agent having multiple Roles, e.g. GMAPI acting both as storage and as manager.