json example - Githubissues

fong22e commented 6 years ago

{
    "prefix" : {
        "prov" : "http://www.w3.org/ns/prov#",
        "rdt" : "http://rdatatracker.org/"
    },

    "activity" : {

        "p1" : {
            "rdt:name" : "FunctionOriginTest.R",
            "rdt:type" : "Start",
            "rdt:elapsedTime" : "0.450000000000017",
            "rdt:scriptNum" : "NA",
            "rdt:startLine" : "NA",
            "rdt:startCol" : "NA",
            "rdt:endLine" : "NA",
            "rdt:endCol" : "NA"
        },

        "p2" : {
            "rdt:name" : "stopifnot(as.numeric(stri_flatten(stri_sort(stri_trim(stri_r",
            "rdt:type" : "Operation",
            "rdt:elapsedTime" : "0.629999999999995",
            "rdt:scriptNum" : "0",
            "rdt:startLine" : "36",
            "rdt:startCol" : "1",
            "rdt:endLine" : "42",
            "rdt:endCol" : "13"
        }
    },

    "entity" : {

        "d1" : {
            "rdt:name" : "a",
            "rdt:value" : "1",
            "rdt:type" : "Data",
            "rdt:scope" : "R_GlobalEnv",
            "rdt:fromEnv" : "FALSE",
            "rdt:MD5hash" : "",
            "rdt:timestamp" : "",
            "rdt:location" : ""
        },

        "d2" : {
            "rdt:name" : "b",
            "rdt:value" : "2",
            "rdt:type" : "Data",
            "rdt:scope" : "R_GlobalEnv",
            "rdt:fromEnv" : "FALSE",
            "rdt:MD5hash" : "",
            "rdt:timestamp" : "",
            "rdt:location" : ""
        },

        "environment" : {
            "rdt:name" : "environment",
            "rdt:architecture" : "x86_64",
            "rdt:operatingSystem" : "windows",
            "rdt:language" : "R",
            "rdt:rVersion" : "R version 3.3.3 (2017-03-06)",
            "rdt:script" : "C:/Users/fong22e/Documents/HarvardForest/RDataTracker_functionOrigin/ForTesting/src/FunctionOriginTest.R",
            "rdt:sourcedScripts" : "",
            "rdt:scriptTimeStamp" : "2017-09-14T23.39.48EDT",
            "rdt:workingDirectory" : "C:/Users/fong22e/Documents/HarvardForest/RDataTracker_functionOrigin/ForTesting/src",
            "rdt:ddgDirectory" : "./FunctionOriginTest_ddg",
            "rdt:ddgTimeStamp" : "2017-09-14T23.39.58EDT",
            "rdt:rdatatrackerVersion" : "2.27.0"
        },

        "libraries" : {

            "pkg1" : ["base","3.3.3"],
            "pkg2" : ["utils","3.3.3"],

            "pkg3" : [
                {
                    "$" : "base",
                    "type" : "xsd:QName"
                },
                {
                    "$" : "3.3.3",
                    "type" : "rdt:version"
                }
            ],
            "pkg4" : [
                {
                    "$" : "devtools",
                    "type" : "xsd:QName"
                },
                {
                    "$" : "1.13.2",
                    "type" : "rdt:version"
                }
            ]
        }
    },

    "wasInformedBy" : {

        "e1" : {
            "prov:informant" : "p1",
            "prov:informed" : "p2"
        },

        "e3" : {
            "prov:informant" : "p2",
            "prov:informed" : "p3"
        }
    },

    "wasGeneratedBy" : {

        "e2" : {
            "prov:entity" : "d1",
            "prov:activity" : "p2"
        },

        "e4" : {
            "prov:entity" : "d2",
            "prov:activity" : "p3"
        }
    },

    "used" : {

        "e10" : {
            "prov:activity" : "p6",
            "prov:entity" : "d3"
        },

        "e13" : {
            "prov:activity" : "p7",
            "prov:entity" : "d3"
        }
    }
}

tfjmp commented 6 years ago

Why not make each library its own discrete node?

"l1" : {
    "name" : "base",
    "version" : "3.4"
},
"l2" : {
    "name" : "devtools",
    "version" : "1.2.3"
}

fong22e commented 6 years ago

Above is an example that I came up with for the json output from provR. However, there are still a few issues with it.

Issues aside, the basic structure is as follows:

prefix is where we put links to definitions, for example what 'prov' and 'rdt' mean.
activity is where the procedure nodes go.
entity is where the data nodes go, as well as the environment and libraries nodes.
wasInformedBy is where the procedure-to-procedure edges go.
wasGeneratedBy is where the procedure-to-data edges go.
used is where the data-to-procedure edges go.
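
The section layout described above can be sketched as a small Python skeleton. This is only an illustration, not provR's code: the helper name `empty_prov_document` and the `SECTION_ROLES` table are made up for this sketch, and the prefix URIs are copied from the example.

```python
import json

# Hypothetical summary of what each top-level section holds,
# taken from the description in this thread.
SECTION_ROLES = {
    "prefix": "links to definitions (prov, rdt)",
    "activity": "procedure nodes",
    "entity": "data nodes, environment node, library nodes",
    "wasInformedBy": "procedure-to-procedure edges",
    "wasGeneratedBy": "procedure-to-data edges",
    "used": "data-to-procedure edges",
}


def empty_prov_document():
    """Return a document with every section present but empty."""
    doc = {name: {} for name in SECTION_ROLES}
    doc["prefix"] = {
        "prov": "http://www.w3.org/ns/prov#",
        "rdt": "http://rdatatracker.org/",
    }
    return doc


doc = empty_prov_document()
# Add one procedure node, using attribute names from the example.
doc["activity"]["p1"] = {"rdt:name": "FunctionOriginTest.R", "rdt:type": "Start"}
print(json.dumps(doc, indent=4))
```

Keeping the section names in one table makes it easy to check that every section is emitted, even when a script produces no edges of a given kind.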

The things I removed from the json from RDataTracker are information related to functions called and ValType information. If you would like me to put them back, please let me know!

The issues are as follows:

Firstly, the libraries. I was thinking about grouping them all under a libraries node in entity, and there would be 2 different ways this could be represented (pkg1/pkg2 versus pkg3/pkg4 in the example), both prov-json compliant. @tfjmp, I could do it your way if you would prefer, though would each library then be a separate node in entity? If so, should I group them using a collection?

The second issue is the way the edges are represented. Currently, it's not prov-json compliant, which is something I only just found out about. Unfortunately, the values can't simply be, for example:

"e10" : {
    "prov:activity" : "p6",
    "prov:entity" : "d3"
}

The schema dictates that the values p6 and d3 must be in URI format (even though the example just gives the node name). What would the URI be?
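
One possible answer, offered as an assumption rather than something settled in this thread: give every node id a declared prefix, so `p6` becomes `rdt:p6`, and use the same prefixed name inside the edges. The function `qualify` below is hypothetical, it assumes every value inside an edge is a node reference, and whether the result satisfies the schema would still need to be checked with a validator.

```python
def qualify(doc, prefix="rdt"):
    """Return a copy of doc whose node ids and edge references carry a prefix.

    Assumption: every value inside an edge is a reference to a node.
    """
    node_sections = ("activity", "entity")
    out = {}
    for section, body in doc.items():
        if section == "prefix":
            out[section] = dict(body)
        elif section in node_sections:
            # Rename the node ids themselves.
            out[section] = {f"{prefix}:{node_id}": attrs for node_id, attrs in body.items()}
        else:
            # Edge section: keep the edge id, rewrite the node references.
            out[section] = {
                edge_id: {role: f"{prefix}:{ref}" for role, ref in edge.items()}
                for edge_id, edge in body.items()
            }
    return out


example = {
    "prefix": {"rdt": "http://rdatatracker.org/"},
    "activity": {"p6": {}},
    "entity": {"d3": {}},
    "used": {"e10": {"prov:activity": "p6", "prov:entity": "d3"}},
}
qualified = qualify(example)
print(qualified["used"]["e10"])
```

With the `rdt` prefix from the example, `rdt:p6` would expand to `http://rdatatracker.org/p6`.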

tfjmp commented 6 years ago

The things I removed from the json from RDataTracker are information related to functions called and ValType information. If you would like me to put them back, please let me know!

Keep ValType I think. What is the "information related to functions called"?

Firstly, the libraries. I was thinking about grouping them all under a libraries node in entity, and there would be 2 different ways this could be represented, both prov-json compliant. @tfjmp , I could do it that way if you would prefer, though would each library be a separate node in entity? If so, should I group them using a collection?

Yes, please separate them. We can think a bit more about how to represent library dependencies, so no collection for now, I think.
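
A minimal sketch of that separation, assuming the libraries are available as (name, version) pairs. The function name `library_nodes` is hypothetical, and the `l1`, `l2` ids follow the naming suggested earlier in the thread.

```python
def library_nodes(libraries):
    """Turn (name, version) pairs into discrete nodes l1, l2, ..."""
    return {
        f"l{index}": {"name": name, "version": version}
        for index, (name, version) in enumerate(libraries, start=1)
    }


# Package names and versions taken from the first example in this issue.
nodes = library_nodes([("base", "3.3.3"), ("devtools", "1.13.2")])
print(nodes)
```

Each library then sits in the entity section as its own node, next to the data nodes and the environment node.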

The second issue is the way the edges are represented. Currently, it's not prov-json-compliant. It's unfortunately an issue that I just found out about. Unfortunately, the values can't simply be, for example:

"e10" : {
    "prov:activity" : "p6",
    "prov:entity" : "d3"
}

The schema dictates that the values p6 and d3 must be in uri format (even though the example just names the node name). What would the uri be?

Let it be for now.

fong22e commented 6 years ago

Regarding keeping ValType: Are you sure? The current json representation of ValType is not prov-json compliant and it makes the entire thing very, very complicated and very, very ugly if we wish to have it prov-json compliant.

We can think later about how to make it PROV compliant, but keep it in.

About the "information related to functions called": Each procedure lists out which library functions (those not in the base package and not user-defined) it calls, and each library lists out which package it's from. The implementation is slightly complex, unfortunately.

That seems good, but didn't we say we wanted to represent this as edges in the graph, connected to the discrete library nodes mentioned previously?

About the libraries: Ok! So the entity section will have the following in order: data nodes, environment, library nodes. Does that sound good? I can type up an example if that helps.

That sounds perfect to me :).

fong22e commented 6 years ago

With regards to functionsCalled:

{
    "activity" : {
        // procedure nodes
        "p1" : {
            // the usual things for a procedure node
        },
        "p2" : {
            // the usual things for a procedure node
        }
    },

    "entity" : {
        // data nodes

        // environment

        // library nodes
        "l1" : {
            "name" : "stringi",
            "version" : "1.1.5"
        },
        "l2" : {
            "name" : "stringr",
            "version" : "1.2.0"
        },

        // function nodes
        "f1" : {
            "name" : "stri_sort"
        },
        "f2" : {
            "name" : "stri_trim"
        },
        "f3" : {
            "name" : "str_to_lower"
        },

        // collections for functions
        "fc1" : {
            "prov:type" : {
                "$" : "prov:Collection",
                "type": "xsd:QName"
            }
        },
        "fc2" : {
            "prov:type" : {
                "$" : "prov:Collection",
                "type": "xsd:QName"
            }
        }
    },

    // bi-directional edges
    "hadMember" : {
        "m1.0" : {
            "prov:collection" : "fc1",
            "prov:entity" : "l1"
        },
        "m1.1" : {
            "prov:collection" : "fc1",
            "prov:entity" : "f1"
        },
        "m1.2" : {
            "prov:collection" : "fc1",
            "prov:entity" : "f2"
        },
        "m2.0" : {
            "prov:collection" : "fc2",
            "prov:entity" : "l2"
        },
        "m2.1" : {
            "prov:collection" : "fc2",
            "prov:entity" : "f3"
        }
    },

    // data to procedure edges (entity to activity)
    "used" : {
        // other data-to-procedure edges

        // function called edges
        "fe1" : {
            "prov:entity" : "f1",
            "prov:activity" : "p1"
        },
        "fe2" : {
            "prov:entity" : "f2",
            "prov:activity" : "p1"
        },
        "fe3" : {
            "prov:entity" : "f3",
            "prov:activity" : "p1"
        },
        "fe4" : {
            "prov:entity" : "f2",
            "prov:activity" : "p2"
        }
    }
}

Unfortunately, it seems that every reference to any node has to be in uri format, so that's an issue.

tfjmp commented 6 years ago

Is a collection of functions a library?

fong22e commented 6 years ago

Umm, the collection groups the functions that are from the same library together with the library. So in the above example, fc1 groups together l1, the library node for the stringi library, as well as the functions f1 and f2, which are the function nodes for stri_sort and stri_trim. It's the only way Barbara and I could think of to link 2 or more entity nodes together and have it prov-json compliant. It's very messy.
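
To make the grouping concrete, here is a short sketch that reads the hadMember edges from the example above and lists the members of each collection. The helper name `members_by_collection` is hypothetical.

```python
from collections import defaultdict

# hadMember edges copied from the functionsCalled example.
had_member = {
    "m1.0": {"prov:collection": "fc1", "prov:entity": "l1"},
    "m1.1": {"prov:collection": "fc1", "prov:entity": "f1"},
    "m1.2": {"prov:collection": "fc1", "prov:entity": "f2"},
    "m2.0": {"prov:collection": "fc2", "prov:entity": "l2"},
    "m2.1": {"prov:collection": "fc2", "prov:entity": "f3"},
}


def members_by_collection(edges):
    """Map each collection id to the ids of its members, in edge order."""
    groups = defaultdict(list)
    for edge in edges.values():
        groups[edge["prov:collection"]].append(edge["prov:entity"])
    return dict(groups)


groups = members_by_collection(had_member)
print(groups)
```

Here fc1 holds the stringi library node l1 together with f1 and f2, and fc2 holds the stringr library node l2 together with f3.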

tfjmp commented 6 years ago

Can you do:

"fc2" : {
    "prov:type" : {
        "$" : "prov:Collection",
        "type": "xsd:QName"
    }
}

becomes:

"l2" : {
    "name" : "devtools",
    "version" : "1.2.3",
    "prov:type" : {
        "$" : "prov:Collection",
        "type": "xsd:QName"
    }
}
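
A rough sketch of that rewrite in Python, under two assumptions that are not stated in this thread: each collection contains exactly one library node, and library ids can be recognised by a leading "l". The function `fold_collections` is hypothetical.

```python
import copy


def fold_collections(entity, had_member, is_library=lambda node_id: node_id.startswith("l")):
    """Make each library node the collection itself and drop the fc nodes.

    Assumptions: every collection has exactly one library member, and
    is_library can tell library ids apart from function ids.
    """
    entity = copy.deepcopy(entity)

    # Find the library that lives inside each collection.
    library_of = {
        edge["prov:collection"]: edge["prov:entity"]
        for edge in had_member.values()
        if is_library(edge["prov:entity"])
    }

    # Re-point the remaining membership edges at the library node.
    new_edges = {}
    for edge_id, edge in had_member.items():
        library = library_of[edge["prov:collection"]]
        if edge["prov:entity"] == library:
            continue  # the library no longer needs to be a member
        new_edges[edge_id] = {"prov:collection": library, "prov:entity": edge["prov:entity"]}

    # Move the prov:type attribute onto the library and remove the fc node.
    for collection, library in library_of.items():
        entity[library]["prov:type"] = entity.pop(collection)["prov:type"]

    return entity, new_edges


entity = {
    "l1": {"name": "stringi", "version": "1.1.5"},
    "f1": {"name": "stri_sort"},
    "fc1": {"prov:type": {"$": "prov:Collection", "type": "xsd:QName"}},
}
had_member = {
    "m1.0": {"prov:collection": "fc1", "prov:entity": "l1"},
    "m1.1": {"prov:collection": "fc1", "prov:entity": "f1"},
}
folded_entity, folded_edges = fold_collections(entity, had_member)
print(folded_entity["l1"])
print(folded_edges)
```

The result has one node fewer per library and one hadMember edge fewer, because the library-to-collection edge disappears.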

fong22e commented 6 years ago

Yes, that's prov-json compliant! That's genius! Thanks so much, Thomas!

I'll collect all this into 1 example json file and I'll upload that as soon as I'm done!

fong22e commented 6 years ago

Here it is! The only things that are not prov-json compliant are valType and the fact that all node references have to be in uri format.

{
    "prefix" : {
        "prov" : "http://www.w3.org/ns/prov#",
        "rdt" : "http://rdatatracker.org/"
    },

    "activity" : {
        // procedure nodes
        "p1" : {
            "rdt:name" : "FunctionOriginTest.R",
            "rdt:type" : "Start",
            "rdt:elapsedTime" : "0.450000000000017",
            "rdt:scriptNum" : "NA",
            "rdt:startLine" : "NA",
            "rdt:startCol" : "NA",
            "rdt:endLine" : "NA",
            "rdt:endCol" : "NA"
        },
        "p2" : {
            "rdt:name" : "stopifnot(as.numeric(stri_flatten(stri_sort(stri_trim(stri_r",
            "rdt:type" : "Operation",
            "rdt:elapsedTime" : "0.629999999999995",
            "rdt:scriptNum" : "0",
            "rdt:startLine" : "36",
            "rdt:startCol" : "1",
            "rdt:endLine" : "42",
            "rdt:endCol" : "13"
        }
    },

    "entity" : {
        // data nodes
        "d1" : {
            "rdt:name" : "a",
            "rdt:value" : "1",
            "rdt:valType" : {"container":"vector", "dimension":[1], "type":["numeric"]},
            "rdt:type" : "Data",
            "rdt:scope" : "R_GlobalEnv",
            "rdt:fromEnv" : "FALSE",
            "rdt:MD5hash" : "",
            "rdt:timestamp" : "",
            "rdt:location" : ""
        },
        "d2" : {
            "rdt:name" : "b",
            "rdt:value" : "2",
            "rdt:valType" : {"container":"vector", "dimension":[1], "type":["numeric"]},
            "rdt:type" : "Data",
            "rdt:scope" : "R_GlobalEnv",
            "rdt:fromEnv" : "FALSE",
            "rdt:MD5hash" : "",
            "rdt:timestamp" : "",
            "rdt:location" : ""
        },

        // environment
        "environment" : {
            "rdt:name" : "environment",
            "rdt:architecture" : "x86_64",
            "rdt:operatingSystem" : "windows",
            "rdt:language" : "R",
            "rdt:rVersion" : "R version 3.3.3 (2017-03-06)",
            "rdt:script" : "C:/Users/fong22e/Documents/HarvardForest/RDataTracker_functionOrigin/ForTesting/src/FunctionOriginTest.R",
            "rdt:sourcedScripts" : "",
            "rdt:scriptTimeStamp" : "2017-09-14T23.39.48EDT",
            "rdt:workingDirectory" : "C:/Users/fong22e/Documents/HarvardForest/RDataTracker_functionOrigin/ForTesting/src",
            "rdt:ddgDirectory" : "./FunctionOriginTest_ddg",
            "rdt:ddgTimeStamp" : "2017-09-14T23.39.58EDT",
            "rdt:rdatatrackerVersion" : "2.27.0"
        },

        // library nodes: are collections
        "l1" : {
            "name" : "stringi",
            "version" : "1.1.5",
            "prov:type" : {
                "$" : "prov:Collection",
                "type": "xsd:QName"
            }
        },
        "l2" : {
            "name" : "stringr",
            "version" : "1.2.0",
            "prov:type" : {
                "$" : "prov:Collection",
                "type": "xsd:QName"
            }
        },

        // function nodes
        "f1" : {
            "name" : "stri_sort"
        },
        "f2" : {
            "name" : "stri_trim"
        },
        "f3" : {
            "name" : "str_to_lower"
        }
    },

    "wasInformedBy" : {
        // procedure-to-procedure edges
        "e1" : {
            "prov:informant" : "p1",
            "prov:informed" : "p2"
        },
        "e3" : {
            "prov:informant" : "p2",
            "prov:informed" : "p3"
        }
    },

    "wasGeneratedBy" : {
        // procedure-to-data edges
        "e2" : {
            "prov:entity" : "d1",
            "prov:activity" : "p2"
        },
        "e4" : {
            "prov:entity" : "d2",
            "prov:activity" : "p3"
        }
    },

    "used" : {
        // data-to-procedure edges
        "e10" : {
            "prov:activity" : "p6",
            "prov:entity" : "d3"
        },
        "e13" : {
            "prov:activity" : "p7",
            "prov:entity" : "d3"
        },

        // function-to-procedure edges
        "e15" : {
            "prov:entity" : "f1",
            "prov:activity" : "p1"
        },
        "e16" : {
            "prov:entity" : "f2",
            "prov:activity" : "p1"
        },
        "e17" : {
            "prov:entity" : "f3",
            "prov:activity" : "p1"
        },
        "e18" : {
            "prov:entity" : "f2",
            "prov:activity" : "p2"
        }
    },

    "hadMember" : {
        // group functions from the same library together
        "m1.1" : {
            "prov:collection" : "l1",
            "prov:entity" : "f1"
        },
        "m1.2" : {
            "prov:collection" : "l1",
            "prov:entity" : "f2"
        },

        "m2.1" : {
            "prov:collection" : "l2",
            "prov:entity" : "f3"
        }
    }
}
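
Since the example above is an excerpt, some edges point at nodes that are not listed (p3, p6, p7 and d3). A small sketch for spotting such references before printing the real output is given below. The function name `dangling_references` is hypothetical. The example as posted also contains // comments, which Python's json module rejects, so the sketch works on an already-built dict rather than on the text.

```python
EDGE_SECTIONS = ("wasInformedBy", "wasGeneratedBy", "used", "hadMember")


def dangling_references(doc):
    """Return the sorted ids that edges refer to but no node defines.

    Assumption: every value inside an edge is a node reference.
    """
    defined = set(doc.get("activity", {})) | set(doc.get("entity", {}))
    missing = set()
    for section in EDGE_SECTIONS:
        for edge in doc.get(section, {}).values():
            missing.update(ref for ref in edge.values() if ref not in defined)
    return sorted(missing)


# A cut-down version of the example above.
doc = {
    "activity": {"p1": {}, "p2": {}},
    "entity": {"d1": {}},
    "wasInformedBy": {
        "e1": {"prov:informant": "p1", "prov:informed": "p2"},
        "e3": {"prov:informant": "p2", "prov:informed": "p3"},
    },
    "used": {"e10": {"prov:activity": "p6", "prov:entity": "d3"}},
}
missing = dangling_references(doc)
print(missing)
```

An empty result would mean every edge endpoint is defined in activity or entity.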

tfjmp commented 6 years ago

Good work :)

fong22e commented 6 years ago

\o/ Thank you so much!!! And thank you so much for all your help!!

I'll go look into using jsonlite to print that out now.

ProvTools / provR

json example #70