DataConservancy / dcs-package-ingest

Ingests the contents of Data Conservancy Packages into a Fedora 4 repository.

Error ingesting objects with blank nodes containing `bag://` uris #5

Open emetsger opened 8 years ago

emetsger commented 8 years ago

The package ingest service is unable to deposit objects that have blank nodes containing bag:// URIs. For example, consider this object from a package (elided for brevity):

<>      a       <osf:NodeBase> , <osf:Registration> , <osf:OSFBusinessObject> , <osf:DataEntity> , <osf:Node> , <rdfs:Resource> ;
        <osf:hasContributor>
                [ a       <osf:OSFBusinessObject> , <osf:Contributor> , <rdfs:Resource> ;
                  <osf:hasPermission>
                          "ADMIN" ;
                  <osf:hasUser>
                          <bag://MyPackage/data/obj/root/bwgcm.ttl> ;
                  <osf:isBibliographic>
                          true
                ] ;
        <osf:hasContributor>
                [ a       <osf:OSFBusinessObject> , <osf:Contributor> , <rdfs:Resource> ;
                  <osf:hasPermission>
                          "ADMIN" ;
                  <osf:hasUser>
                          <bag://MyPackage/data/obj/root/qmdz6.ttl> ;
                  <osf:isBibliographic>
                          false
                ] ;

For example, the object of the triple [] <osf:hasUser> <bag://MyPackage/data/obj/root/qmdz6.ttl> cannot be updated from the bag:// URI to the Fedora repository URI. This is caused by the interaction between Fedora's behavior and the algorithm the package ingest service uses to rewrite bag:// URIs to the final resource URIs of deposited objects.

Initial deposit of the object succeeds. Upon deposit, Fedora skolemizes the blank nodes. The secondary operation to update the bag:// URIs then fails because of Fedora's single-subject restriction and the structure of the update, which mixes triples whose subjects are the skolemized nodes with triples whose subject is the object itself:

INSERT {
<http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/.well-known/genid/27/21/5c/f3/27215cf3-48c7-46a4-a9f5-670b06b2a8ef> <http://www.dataconservancy.org/osf-business-object-model#hasUser> <http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/bwgcm> .
<http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/.well-known/genid/aa/1a/f4/d1/aa1af4d1-cc6e-4a76-b971-f91b8b0bfc63> <http://www.dataconservancy.org/osf-business-object-model#hasUser> <http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/qmdz6> .
<http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/eq7a4> <http://www.dataconservancy.org/osf-business-object-model#registeredBy> <http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/qmdz6> .
<http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/eq7a4> <http://www.dataconservancy.org/osf-business-object-model#hasChild> <http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/vae86> .
<http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/eq7a4> <http://www.dataconservancy.org/osf-business-object-model#registeredFrom> <http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/3e7rd> .
<http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/eq7a4> <http://www.dataconservancy.org/osf-business-object-model#hasProvider> <http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/eq7a4_osfstorage> .
}
WHERE{};
DELETE {
<http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/.well-known/genid/27/21/5c/f3/27215cf3-48c7-46a4-a9f5-670b06b2a8ef> <http://www.dataconservancy.org/osf-business-object-model#hasUser> <bag://MyPackage/data/obj/root/bwgcm.ttl> .
<http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/.well-known/genid/aa/1a/f4/d1/aa1af4d1-cc6e-4a76-b971-f91b8b0bfc63> <http://www.dataconservancy.org/osf-business-object-model#hasUser> <bag://MyPackage/data/obj/root/qmdz6.ttl> .
<http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/eq7a4> <http://www.dataconservancy.org/osf-business-object-model#hasChild> <bag://MyPackage/data/obj/root/vae86.ttl> .
<http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/eq7a4> <http://www.dataconservancy.org/osf-business-object-model#hasProvider> <bag://MyPackage/data/obj/root/eq7a4_osfstorage.ttl> .
<http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/eq7a4> <http://www.dataconservancy.org/osf-business-object-model#registeredBy> <bag://MyPackage/data/obj/root/qmdz6.ttl> .
<http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/eq7a4> <http://www.dataconservancy.org/osf-business-object-model#registeredFrom> <bag://MyPackage/data/obj/root/3e7rd.ttl> .
}
WHERE{}

This manifests as a 403 being returned by Fedora when the update is attempted:

403 Forbidden
http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/.well-known/genid/27/21/5c/f3/27215cf3-48c7-46a4-a9f5-670b06b2a8ef is not in the topic of this RDF, which is http://localhost:8080/fcrepo/rest/tx:b95d9774-739e-4f35-a587-a7b4640a2f8e/ManualWiredDepositIT.depositFlatPackageTest/eq7a4

emetsger commented 8 years ago

It isn't fully clear to me if the package ingest service could be modified to properly update skolemized blank nodes, or if Fedora even allows modification of blank nodes.

A work-around is to model the data using hash URIs instead of anonymous nodes.

birkland commented 8 years ago

I think Fedora does allow updates; the client just needs to know to seek out the skolemized nodes individually and update each one. Since we know that the only updates we need are bag URI mappings, one way around this would be to update the algorithm used for re-mapping objects.
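To illustrate what updating resources individually could look like, here is a minimal sketch that splits a list of pending replacements into one SPARQL update per subject, so that no single update mixes subjects. The function name `split_by_subject` and the tuple layout are hypothetical and are not taken from the package ingest code. Whether Fedora accepts such an update when it is sent to a skolemized node's URI is still an assumption to verify.

```python
from collections import defaultdict


def split_by_subject(replacements):
    """Build one SPARQL update string per subject URI.

    `replacements` is an iterable of (subject, predicate, old_object,
    new_object) tuples, all given as plain URI strings (hypothetical layout).
    """
    grouped = defaultdict(list)
    for subject, predicate, old_obj, new_obj in replacements:
        grouped[subject].append((predicate, old_obj, new_obj))

    updates = {}
    for subject, rows in grouped.items():
        # Every triple in this update shares the same subject.
        deletes = "\n".join(f"<{subject}> <{p}> <{old}> ." for p, old, _ in rows)
        inserts = "\n".join(f"<{subject}> <{p}> <{new}> ." for p, _, new in rows)
        updates[subject] = (
            f"DELETE {{\n{deletes}\n}}\nINSERT {{\n{inserts}\n}}\nWHERE {{}}"
        )
    return updates
```

Each value in the returned dictionary would then be sent as its own update request to the resource named by the corresponding key, instead of sending everything to the parent object.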

Currently, after all objects have been deposited, we visit every ingested object, scan it for bag URIs, and then use the map of bag URIs to resource URIs to replace them.
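A rough sketch of that replacement step, assuming triples are held as simple (subject, predicate, object) string tuples and the mapping is a plain dictionary. The name `remap_bag_uris` is hypothetical; the real service works against Fedora rather than in-memory tuples.

```python
def remap_bag_uris(triples, bag_to_resource):
    """Return a new list of triples with mapped bag:// objects replaced.

    Objects that are not bag:// URIs, or that have no entry in the map,
    are left unchanged.
    """
    remapped = []
    for subject, predicate, obj in triples:
        if obj.startswith("bag://") and obj in bag_to_resource:
            obj = bag_to_resource[obj]
        remapped.append((subject, predicate, obj))
    return remapped
```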

With blank nodes, for each ingested object, we'd need to determine whether the object of any triple is a skolem resource. Then we'd need to iterate through each skolem resource and apply the same mapping. If the skolem resources have any triples whose object is another skolem resource, we'd need to visit those, etc.
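The traversal described above could be sketched roughly as follows. It assumes skolem resources can be recognized by the `/.well-known/genid/` path segment seen in the URIs earlier in this issue, and that `fetch_triples` is a caller-supplied function (hypothetical here) returning the (subject, predicate, object) tuples of a resource.

```python
SKOLEM_MARKER = "/.well-known/genid/"


def is_skolem(uri):
    """Guess whether a URI names a skolemized blank node (assumption)."""
    return SKOLEM_MARKER in uri


def collect_skolem_resources(root_uri, fetch_triples):
    """Find every skolem resource reachable from root_uri.

    Follows skolem objects transitively and tracks visited URIs so that
    cycles between skolem resources do not cause an endless loop.
    """
    seen = set()
    found = []
    stack = [root_uri]
    while stack:
        current = stack.pop()
        for _subject, _predicate, obj in fetch_triples(current):
            if is_skolem(obj) and obj not in seen:
                seen.add(obj)
                found.append(obj)
                stack.append(obj)
    return found
```

Each URI returned would then get the same bag URI mapping applied as the ingested object itself.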