fair-workflows / nanopub

Python client for searching, publishing and modifying nanopublications.
https://fair-workflows.github.io/nanopub
Apache License 2.0
22 stars 7 forks source link

Adding / changing provenance_rdf information doesn't do any changes #139

Closed evaevertovna closed 3 years ago

evaevertovna commented 3 years ago

I ran the example-snippet of code under sub-header "Specifying custom provenance triples" found in this link: https://nanopub.readthedocs.io/en/latest/publishing/setting-subgraphs.html and displayed below:

import rdflib
from nanopub import namespaces, Publication

my_assertion = rdflib.Graph()
my_assertion.add((rdflib.term.BNode('timbernserslee'), rdflib.RDF.type, rdflib.FOAF.Person))

provenance_rdf = rdflib.Graph()
provenance_rdf = provenance_rdf.add((rdflib.term.BNode('timbernserslee'),
                                     namespaces.PROV.actedOnBehalfOf,
                                     rdflib.term.BNode('markzuckerberg')))
publication = Publication.from_assertion(assertion_rdf=my_assertion,
                                         provenance_rdf=provenance_rdf)
print(publication)

I assume the provenance_rdf is supposed to add the actedOnBehalfOf PROV namespace to the provenance triple but when I run that snippet it remains the same... Is this an issue or am I missing something ?

The output I get:

Original source URI = None
@prefix : <http://purl.org/nanopub/temp/mynanopub#> .
@prefix np: <http://www.nanopub.org/nschema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:pubInfo {
    : prov:generatedAtTime "2021-07-12T13:55:09.997303"^^xsd:dateTime ;
        prov:wasAttributedTo <https://orcid.org/0000-0002-8953-0490> .
}

:assertion {
    :timbernserslee a <http://xmlns.com/foaf/0.1/Person> .
}

:Head {
    : a np:Nanopublication ;
        np:hasAssertion :assertion ;
        np:hasProvenance :provenance ;
        np:hasPublicationInfo :pubInfo .
}

:provenance {
    :assertion prov:generatedAtTime "2021-07-12T13:55:09.997303"^^xsd:dateTime .
}

As you can tell, the provenance has only information about when the assertion was generated, but not about the 'acted on behalf' namespace. Or is this represented in the :pubInfo {? But, shouldn't this be under :provenance { ?

svenvanderburg commented 3 years ago

@evaevertovna thanks for reporting this! Indeed the triple you added should end up in the provenance section of the nanopublication. Me or @raar1 can look into it as soon as we have the time (but our time is currently very limited). If you feel comfortable you can try to fix it yourself. See the test on this line: https://github.com/fair-workflows/nanopub/blob/450ff6642a7e86ba8315aa396a5c06cc4a3edd55/tests/test_publication.py#L65 that then probably needs an update!

evaevertovna commented 3 years ago

Thanks for your response. I have been having a look at the code but it is a bit tricky to debug since this type of 'bug' does not show an error message. Could you please elaborate when saying that the function in line 65 "needs an update"? I assume the test_publication.py tests whether the Publication class works properly? Do you maybe have an idea at which line in the Publication class script should I be looking at to explore why the provenance does not update ? Thanks in advance!

svenvanderburg commented 3 years ago

@evaevertovna So this test: https://github.com/fair-workflows/nanopub/blob/450ff6642a7e86ba8315aa396a5c06cc4a3edd55/tests/test_publication.py#L65 presumably tests whether if we pass an RDF graph tot the provenance_rdf argument that the triples in that graph end up in the publication.provenance graph. This is very similar to what you test in your code snippet above, but apparently that fails. So the test doesn't actually test what it should test (because apparently the functionality is broken).

You should checkout this method: https://github.com/fair-workflows/nanopub/blob/450ff6642a7e86ba8315aa396a5c06cc4a3edd55/nanopub/publication.py#L133 In line 201-215 the provenance rdf is added (but presumably incorrectly).

Let us know if you feel comfortable picking this up, we can also assist you in that (we could have a call to look at it together).

svenvanderburg commented 3 years ago

@evaevertovna Let us know if you will work on it, otherwise we will pick it up ourselves this week, in our small maintenance sprint.

evaevertovna commented 3 years ago

@svenvanderburg I am not sure if I will be able to provide a solution soon enough, so your contribution is also appreciated. I see that the if-statement in line 212 of publication.py does not seem to identify that the provenance is "not None", the following line:

https://github.com/fair-workflows/nanopub/blob/450ff6642a7e86ba8315aa396a5c06cc4a3edd55/nanopub/publication.py#L212

So it might have to do with the keyword argument or the provenance graph is failing to bind with the main graph? I'm just thinking along... cheers!

svenvanderburg commented 3 years ago

@evaevertovna it turns out the problem is in the way you build up the provenance RDF:

provenance_rdf = rdflib.Graph()
provenance_rdf = provenance_rdf.add((rdflib.term.BNode('timbernserslee'),
                                     namespaces.PROV.actedOnBehalfOf,
                                     rdflib.term.BNode('markzuckerberg')))

You assign the output of provenance_rdf.add() to a new variable provenance_rdf. provenance_rdf.add() returns None, so your variable provenance_rdf will be None.

The proper syntax for adding to an rdflif graph is like this:

provenance_rdf = rdflib.Graph()
provenance_rdf.add((rdflib.term.BNode('timbernserslee'),
                                     namespaces.PROV.actedOnBehalfOf,
                                     rdflib.term.BNode('markzuckerberg')))

I also missed it initially ;) I guess you just need to get used to the rdflib API (which unfortunately is not super-well documented IMO).

Anyway, please keep reporting issues if you encounter any broken functionality!

This closes the issue as the functionality is working as expected.