Closed fong22e closed 6 years ago
Why not make each library its own discrete node?
"l1" : {
  "name" : "base",
  "version" : "3.4"
},
"l2" : {
  "name" : "devtools",
  "version" : "1.2.3"
}
Above is an example that I came up with for the json output from provR. However, there are still a few issues with it.
Issues aside, the basic structure is as follows:
- prefix is where we put links to definitions, for example what 'prov' and 'rdt' are.
- activity is where procedure nodes go.
- entity is where data nodes, as well as the environment and libraries nodes, go.
- wasInformedBy is where the procedure-to-procedure edges go.
- wasGeneratedBy is where the procedure-to-data edges go.
- used is where the data-to-procedure edges go.

The things I removed from the json from RDataTracker are information related to functions called and ValType information. If you would like me to put them back, please let me know!
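The section layout described above can be sketched as a small Python dictionary. This is only an illustration of the structure, not provR's actual output code; the node contents (`example.R`, `d1`, `e1`) are hypothetical placeholders, and the two prefix URLs are the ones used in the full example later in this thread.

```python
import json

# Section names taken from the structure described above.
SECTIONS = ["prefix", "activity", "entity", "wasInformedBy", "wasGeneratedBy", "used"]


def empty_document():
    """Return a skeleton with one empty object per section, plus the prefixes."""
    doc = {name: {} for name in SECTIONS}
    doc["prefix"] = {
        "prov": "http://www.w3.org/ns/prov#",
        "rdt": "http://rdatatracker.org/",
    }
    return doc


doc = empty_document()
# Hypothetical nodes and edge, only to show where each kind of item lives.
doc["activity"]["p1"] = {"rdt:name": "example.R", "rdt:type": "Start"}
doc["entity"]["d1"] = {"rdt:name": "a", "rdt:value": "1"}
doc["wasGeneratedBy"]["e1"] = {"prov:entity": "d1", "prov:activity": "p1"}

print(json.dumps(doc, indent=2))
```

Keeping the sections in a fixed order makes the output easier to compare between runs, since Python dictionaries preserve insertion order.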
The issues are as follows:
Firstly, the libraries. I was thinking about grouping them all under a libraries node in entity, and there would be 2 different ways this could be represented, both prov-json compliant. @tfjmp, I could do it that way if you would prefer, though would each library be a separate node in entity? If so, should I group them using a collection?
The second issue is the way the edges are represented. Currently, it's not prov-json compliant, which is an issue that I only just found out about. Unfortunately, the values can't simply be, for example:
"e10" : {
  "prov:activity" : "p6",
  "prov:entity" : "d3"
}
The schema dictates that the values p6 and d3 must be in uri format (even though the example just names the node name). What would the uri be?
The things I removed from the json from RDataTracker are information related to functions called and ValType information. If you would like me to put them back, please let me know!
Keep ValType I think. What is the "information related to functions called"?
Firstly, the libraries. I was thinking about grouping them all under a libraries node in entity, and there would be 2 different ways this could be represented, both prov-json compliant. @tfjmp , I could do it that way if you would prefer, though would each library be a separate node in entity? If so, should I group them using a collection?
Yes please separate them. We can think a bit more how to represent libraries dependency, so for now no collection I think.
The second issue is the way the edges are represented. Currently, it's not prov-json compliant, which is an issue that I only just found out about. Unfortunately, the values can't simply be, for example:
"e10" : { "prov:activity" : "p6", "prov:entity" : "d3" }
The schema dictates that the values p6 and d3 must be in uri format (even though the example just names the node name). What would the uri be?
Let it be for now.
Regarding keeping ValType: Are you sure? The current json representation of ValType is not prov-json compliant and it makes the entire thing very, very complicated and very, very ugly if we wish to have it prov-json compliant.
We can think later about how to do it Prov compliant, but keep it in.
About the "information related to functions called": Each procedure lists out which library functions (those not in the base package and not user-defined) it calls, and each library lists out which package it's from. The implementation is slightly complex, unfortunately.
That seems good, but didn't we say we wanted to represent this as edges in the graph, connected to the discrete library nodes previously mentioned?
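As a rough sketch of the edge idea, the per-procedure lists of called functions could be turned into entity-to-activity edges as below. The function name `function_call_edges`, the `fe` edge-id scheme, and the ids `p1`, `p2`, `f1` to `f3` are assumptions made for illustration, not part of RDataTracker.

```python
def function_call_edges(calls, first_edge=1):
    """Build 'used' edges linking function nodes to the procedures calling them.

    calls: dict mapping a procedure id to the list of function node ids it calls.
    """
    edges = {}
    n = first_edge
    for proc_id, function_ids in calls.items():
        for function_id in function_ids:
            # One edge per (function, procedure) pair.
            edges[f"fe{n}"] = {"prov:entity": function_id, "prov:activity": proc_id}
            n += 1
    return edges


# Hypothetical input: procedure p1 calls three library functions, p2 calls one.
calls = {"p1": ["f1", "f2", "f3"], "p2": ["f2"]}
used = function_call_edges(calls)
```

A function called from two procedures (here `f2`) keeps a single node and simply gets two edges, so the node list does not grow with the number of calls.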
About the libraries: Ok! So the entity section will have the following in order: data nodes, environment, library nodes. Does that sound good? I can type up an example if that helps.
That sounds perfect to me :).
With regards to functionsCalled:
{
  "activity" : {
    // procedure nodes
    "p1" : {
      // the usual things for a procedure node
    },
    "p2" : {
      // the usual things for a procedure node
    }
  },
  "entity" : {
    // data nodes
    // environment
    // library nodes
    "l1" : {
      "name" : "stringi",
      "version" : "1.1.5"
    },
    "l2" : {
      "name" : "stringr",
      "version" : "1.2.0"
    },
    // function nodes
    "f1" : {
      "name" : "stri_sort"
    },
    "f2" : {
      "name" : "stri_trim"
    },
    "f3" : {
      "name" : "str_to_lower"
    },
    // collections for functions
    "fc1" : {
      "prov:type" : {
        "$" : "prov:Collection",
        "type": "xsd:QName"
      }
    },
    "fc2" : {
      "prov:type" : {
        "$" : "prov:Collection",
        "type": "xsd:QName"
      }
    }
  },
  // bi-directional edges
  "hadMember" : {
    "m1.0" : {
      "prov:collection" : "fc1",
      "prov:entity" : "l1"
    },
    "m1.1" : {
      "prov:collection" : "fc1",
      "prov:entity" : "f1"
    },
    "m1.2" : {
      "prov:collection" : "fc1",
      "prov:entity" : "f2"
    },
    "m2.0" : {
      "prov:collection" : "fc2",
      "prov:entity" : "l2"
    },
    "m2.1" : {
      "prov:collection" : "fc2",
      "prov:entity" : "f3"
    }
  },
  // data to procedure edges (entity to activity)
  "used" : {
    // other data-to-procedure edges
    // function called edges
    "fe1" : {
      "prov:entity" : "f1",
      "prov:activity" : "p1"
    },
    "fe2" : {
      "prov:entity" : "f2",
      "prov:activity" : "p1"
    },
    "fe3" : {
      "prov:entity" : "f3",
      "prov:activity" : "p1"
    },
    "fe4" : {
      "prov:entity" : "f2",
      "prov:activity" : "p2"
    }
  }
}
Unfortunately, it seems that every reference to any node has to be in uri format, so that's an issue.
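One possible way to handle the reference format, sketched below, is to qualify every bare node id with a prefix declared in the prefix section (for example `rdt:p6`). This assumes a prefixed name is enough to satisfy the schema, which has not been confirmed in this thread; the helper names `qualify` and `qualify_edges` and the choice of the `rdt` prefix are hypothetical.

```python
# Keys whose values are references to nodes, as used in the edges in this thread.
REFERENCE_KEYS = {
    "prov:activity",
    "prov:entity",
    "prov:informant",
    "prov:informed",
    "prov:collection",
}


def qualify(node_id, prefix="rdt"):
    """Prepend the prefix unless the id already carries one."""
    return node_id if ":" in node_id else f"{prefix}:{node_id}"


def qualify_edges(edges, prefix="rdt"):
    """Return a copy of an edge section with every node reference qualified."""
    return {
        edge_id: {
            key: qualify(value, prefix) if key in REFERENCE_KEYS else value
            for key, value in edge.items()
        }
        for edge_id, edge in edges.items()
    }


used = {"e10": {"prov:activity": "p6", "prov:entity": "d3"}}
qualified = qualify_edges(used)
```

If this route were taken, the node keys in activity and entity would need the same prefix so that the references still resolve.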
Is a collection of functions a library?
Umm, the collection groups the functions that are from the same library together with the library. So in the above example, fc1 groups together l1, the library node for the stringi library, as well as the functions f1 and f2, which are the function nodes for stri_sort and stri_trim. It's the only way Barbara and I could think of to link 2 or more entity nodes together and have it prov-json compliant. It's very messy.
Can you do:
"fc2" : {
  "prov:type" : {
    "$" : "prov:Collection",
    "type": "xsd:QName"
  }
}
becomes:
"l2" : {
  "name" : "devtools",
  "version" : "1.2.3",
  "prov:type" : {
    "$" : "prov:Collection",
    "type": "xsd:QName"
  }
}
Yes, that's prov-json compliant! That's genius! Thanks so much, Thomas!
I'll collect all this into 1 example json file and I'll upload that as soon as I'm done!
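The library-as-collection idea suggested above could be generated along these lines. This is a sketch under assumptions: the input format (a list of dictionaries with `name`, `version`, and `functions`) and the function name `library_nodes` are hypothetical, while the id scheme (`l1`, `f1`, `m1.1`) follows the examples in this thread.

```python
def library_nodes(libraries):
    """Build library entities typed as collections, function entities,
    and the hadMember edges linking each function to its library.

    libraries: list of dicts with 'name', 'version', and 'functions' (names).
    """
    entity, had_member = {}, {}
    function_count = 0
    for lib_index, lib in enumerate(libraries, start=1):
        lib_id = f"l{lib_index}"
        # The library node itself carries the collection type.
        entity[lib_id] = {
            "name": lib["name"],
            "version": lib["version"],
            "prov:type": {"$": "prov:Collection", "type": "xsd:QName"},
        }
        for member_index, function_name in enumerate(lib["functions"], start=1):
            function_count += 1
            function_id = f"f{function_count}"
            entity[function_id] = {"name": function_name}
            had_member[f"m{lib_index}.{member_index}"] = {
                "prov:collection": lib_id,
                "prov:entity": function_id,
            }
    return entity, had_member


libraries = [
    {"name": "stringi", "version": "1.1.5", "functions": ["stri_sort", "stri_trim"]},
    {"name": "stringr", "version": "1.2.0", "functions": ["str_to_lower"]},
]
entity, had_member = library_nodes(libraries)
```

Because the library node doubles as the collection, the separate `fc` nodes and the `m1.0`/`m2.0` membership edges from the earlier draft are no longer needed.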
Here it is! The only things that are not prov-json compliant are valType and the fact that all node references have to be in uri format.
{
  "prefix" : {
    "prov" : "http://www.w3.org/ns/prov#",
    "rdt" : "http://rdatatracker.org/"
  },
  "activity" : {
    // procedure nodes
    "p1" : {
      "rdt:name" : "FunctionOriginTest.R",
      "rdt:type" : "Start",
      "rdt:elapsedTime" : "0.450000000000017",
      "rdt:scriptNum" : "NA",
      "rdt:startLine" : "NA",
      "rdt:startCol" : "NA",
      "rdt:endLine" : "NA",
      "rdt:endCol" : "NA"
    },
    "p2" : {
      "rdt:name" : "stopifnot(as.numeric(stri_flatten(stri_sort(stri_trim(stri_r",
      "rdt:type" : "Operation",
      "rdt:elapsedTime" : "0.629999999999995",
      "rdt:scriptNum" : "0",
      "rdt:startLine" : "36",
      "rdt:startCol" : "1",
      "rdt:endLine" : "42",
      "rdt:endCol" : "13"
    }
  },
  "entity" : {
    // data nodes
    "d1" : {
      "rdt:name" : "a",
      "rdt:value" : "1",
      "rdt:valType" : {"container":"vector", "dimension":[1], "type":["numeric"]},
      "rdt:type" : "Data",
      "rdt:scope" : "R_GlobalEnv",
      "rdt:fromEnv" : "FALSE",
      "rdt:MD5hash" : "",
      "rdt:timestamp" : "",
      "rdt:location" : ""
    },
    "d2" : {
      "rdt:name" : "b",
      "rdt:value" : "2",
      "rdt:valType" : {"container":"vector", "dimension":[1], "type":["numeric"]},
      "rdt:type" : "Data",
      "rdt:scope" : "R_GlobalEnv",
      "rdt:fromEnv" : "FALSE",
      "rdt:MD5hash" : "",
      "rdt:timestamp" : "",
      "rdt:location" : ""
    },
    // environment
    "environment" : {
      "rdt:name" : "environment",
      "rdt:architecture" : "x86_64",
      "rdt:operatingSystem" : "windows",
      "rdt:language" : "R",
      "rdt:rVersion" : "R version 3.3.3 (2017-03-06)",
      "rdt:script" : "C:/Users/fong22e/Documents/HarvardForest/RDataTracker_functionOrigin/ForTesting/src/FunctionOriginTest.R",
      "rdt:sourcedScripts" : "",
      "rdt:scriptTimeStamp" : "2017-09-14T23.39.48EDT",
      "rdt:workingDirectory" : "C:/Users/fong22e/Documents/HarvardForest/RDataTracker_functionOrigin/ForTesting/src",
      "rdt:ddgDirectory" : "./FunctionOriginTest_ddg",
      "rdt:ddgTimeStamp" : "2017-09-14T23.39.58EDT",
      "rdt:rdatatrackerVersion" : "2.27.0"
    },
    // library nodes: are collections
    "l1" : {
      "name" : "stringi",
      "version" : "1.1.5",
      "prov:type" : {
        "$" : "prov:Collection",
        "type": "xsd:QName"
      }
    },
    "l2" : {
      "name" : "stringr",
      "version" : "1.2.0",
      "prov:type" : {
        "$" : "prov:Collection",
        "type": "xsd:QName"
      }
    },
    // function nodes
    "f1" : {
      "name" : "stri_sort"
    },
    "f2" : {
      "name" : "stri_trim"
    },
    "f3" : {
      "name" : "str_to_lower"
    }
  },
  "wasInformedBy" : {
    // procedure-to-procedure edges
    "e1" : {
      "prov:informant" : "p1",
      "prov:informed" : "p2"
    },
    "e3" : {
      "prov:informant" : "p2",
      "prov:informed" : "p3"
    }
  },
  "wasGeneratedBy" : {
    // procedure-to-data edges
    "e2" : {
      "prov:entity" : "d1",
      "prov:activity" : "p2"
    },
    "e4" : {
      "prov:entity" : "d2",
      "prov:activity" : "p3"
    }
  },
  "used" : {
    // data-to-procedure edges
    "e10" : {
      "prov:activity" : "p6",
      "prov:entity" : "d3"
    },
    "e13" : {
      "prov:activity" : "p7",
      "prov:entity" : "d3"
    },
    // function-to-procedure edges
    "e15" : {
      "prov:entity" : "f1",
      "prov:activity" : "p1"
    },
    "e16" : {
      "prov:entity" : "f2",
      "prov:activity" : "p1"
    },
    "e17" : {
      "prov:entity" : "f3",
      "prov:activity" : "p1"
    },
    "e18" : {
      "prov:entity" : "f2",
      "prov:activity" : "p2"
    }
  },
  "hadMember" : {
    // group functions from the same library together
    "m1.1" : {
      "prov:collection" : "l1",
      "prov:entity" : "f1"
    },
    "m1.2" : {
      "prov:collection" : "l1",
      "prov:entity" : "f2"
    },
    "m2.1" : {
      "prov:collection" : "l2",
      "prov:entity" : "f3"
    }
  }
}
Good work :)
\o/ Thank you so much!!! And thank you so much for all your help!!
I'll go look into using jsonlite to print that out now.