duraspace / lambdora

Fedora Repository API implemented with AWS Lambda, API Gateway and DynamoDB
Apache License 2.0

Creating Hyku/Hyrax AdminSets fails from the UI only #59

Open tdonohue opened 6 years ago

tdonohue commented 6 years ago

When using Lambdora with a Hyku front-end, creating the default AdminSet (via its rake task, hyrax:default_admin_set:create) succeeds, and results in the following object in Lambdora:

https://627cys77vl.execute-api.us-east-1.amazonaws.com/dev/dev/ad/mi/n_/se/admin_set/default

However, when we manually create a new AdminSet from the Hyku UI, we encounter a "Model mismatch. Expected AdminSet. Got: ActiveFedora::Base" error. On the backend the AdminSet is created, but its path structure is quite different from that of the Default AdminSet, e.g.:

https://627cys77vl.execute-api.us-east-1.amazonaws.com/dev/dev/abab7f63-5a3e-44f0-8a65-01133ecc7057

After the new AdminSet is created, Hyku makes later GET calls to retrieve that AdminSet via a "pairtree"-like path: https://627cys77vl.execute-api.us-east-1.amazonaws.com/dev/dev/ab/ab/7f/63/abab7f63-5a3e-44f0-8a65-01133ecc7057

Strangely, both of these paths are created in Lambdora, but only the first path (/dev/abab7f63-5a3e-44f0-8a65-01133ecc7057) includes all necessary AdminSet triples, while the second path (/dev/ab/ab/7f/63/abab7f63-5a3e-44f0-8a65-01133ecc7057) is an "empty", unnamed container.

It seems that the Model mismatch error results from a GET request to the second path, since that error appears to occur only when the hasModel triple is missing from the response: https://github.com/samvera/active_fedora/blob/11-4-stable/lib/active_fedora/relation/finder_methods.rb#L210
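To make the suspected failure mode concrete, here is a minimal Java sketch of the check as described above. It is a hypothetical model, not ActiveFedora's actual (Ruby) code: triples are simplified to a predicate-to-object map, and the class and method names (`ModelCheck`, `resolveModel`, `checkModel`) are illustrative. A response without a hasModel triple falls back to the generic base class, which then fails the comparison against the expected model.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the hasModel check described above; not ActiveFedora's real code.
final class ModelCheck {
    static final String HAS_MODEL = "info:fedora/fedora-system:def/model#hasModel";

    // Returns the model named by the hasModel triple, or the generic base class when it is absent.
    static String resolveModel(Map<String, String> triples) {
        return triples.getOrDefault(HAS_MODEL, "ActiveFedora::Base");
    }

    // Throws when the resolved model differs from the model the caller expected.
    static void checkModel(Map<String, String> triples, String expected) {
        String actual = resolveModel(triples);
        if (!actual.equals(expected)) {
            throw new IllegalStateException("Model mismatch. Expected " + expected + ". Got: " + actual);
        }
    }

    public static void main(String[] args) {
        // The flat path carries all triples, including hasModel, so the check passes.
        Map<String, String> full = new HashMap<>();
        full.put(HAS_MODEL, "AdminSet");
        checkModel(full, "AdminSet");

        // The tree-ified path is an empty container, so the check fails.
        try {
            checkModel(new HashMap<>(), "AdminSet");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```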

We suspect that active_fedora is generating these pairtree-like paths from IDs (including the "admin_set/default" ID of the Default AdminSet), but have not yet determined where that occurs and whether this is a bug in active_fedora, in Lambdora, or in both.

tdonohue commented 6 years ago

NOTE: The AdminSet created from the UI is also not fully indexed in Solr. Compared to the Default AdminSet, its title seems to be missing. That suggests the Solr indexing logic may also issue a GET against the second path (the empty, unnamed container).

tdonohue commented 6 years ago

After consulting with @awoods on Slack, it seems this pairtree path behavior mimics the Fedora ModeShape implementation (which uses it for performance reasons, though it is not required of all Fedora implementations). That explains why the Samvera codebase expects and generates such pairtree paths. But we still need to debug Samvera to determine why two resources are created (one without the pairtree path that includes all triples, and one with the pairtree path that is a mostly empty resource).

tdonohue commented 6 years ago

Here's the exact line of code where Hyku (or Hyrax) turns IDs into pairtree paths: https://github.com/samvera/active_fedora-noid/blob/2.x-stable/lib/active_fedora/noid.rb#L19

For example, running this method with the default AdminSet ID returns the path we are seeing:

    ActiveFedora::Noid.treeify("admin_set/default")
    # => "ad/mi/n_/se/admin_set/default"
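The output above is consistent with a simple rule: take up to four two-character chunks from the start of the ID, then append the full ID. The Java sketch below is a hypothetical port of that observed behavior (the class name `Treeify` is illustrative, and it is not the gem's source); it reproduces both the Default AdminSet path and the tree-ified UUID path seen in this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java port of the behavior observed from ActiveFedora::Noid.treeify.
final class Treeify {
    // Up to four 2-character chunks from the start of the ID, followed by the full ID.
    static String treeify(String id) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < id.length() && parts.size() < 4; i += 2) {
            parts.add(id.substring(i, Math.min(i + 2, id.length())));
        }
        parts.add(id);
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        System.out.println(treeify("admin_set/default"));
        // prints ad/mi/n_/se/admin_set/default
        System.out.println(treeify("abab7f63-5a3e-44f0-8a65-01133ecc7057"));
        // prints ab/ab/7f/63/abab7f63-5a3e-44f0-8a65-01133ecc7057
    }
}
```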

This ActiveFedora::Noid gem is enabled by default in Hyrax (and therefore also Hyku).

The call stack here looks like this:

tdonohue commented 6 years ago

I've figured out what seems to be going on:

  1. Samvera sends an initial POST of a new resource without any ID provided. It expects Fedora to provide the ID. This initial POST looks like this:
    HTTP Method=POST
    Path=/dev
    Resource=/{thepath+}
    Path Parameters={thepath=dev}
    Headers={Accept=*/*, Accept-Encoding=gzip;q=1.0,deflate;q=0.6,identity;q=0.3, Authorization=Basic ZmVkb3JhQWRtaW46ZmVkb3JhQWRtaW4=, CloudFront-Forwarded-Proto=https, CloudFront-Is-Desktop-Viewer=true, CloudFront-Is-Mobile-Viewer=false, CloudFront-Is-SmartTV-Viewer=false, CloudFront-Is-Tablet-Viewer=false, CloudFront-Viewer-Country=US, Content-Type=text/turtle, Host=627cys77vl.execute-api.us-east-1.amazonaws.com, User-Agent=Faraday v0.12.2, Via=1.1 fd885dc16612d4e9d70f328fd0542052.cloudfront.net (CloudFront), X-Amz-Cf-Id=LrpLMx8uF-q3wPh8wEmqck7SzlB62Y9XPJkvHHA9VETCSh1p_0FBWA==, X-Amzn-Trace-Id=Root=1-59fca11e-10ae3121686455176fda9121, X-Forwarded-For=34.230.187.85, 54.182.230.68, X-Forwarded-Port=443, X-Forwarded-Proto=https}
    Body=
    <> <http://purl.org/dc/terms/title> "Tim AdminSet";
    <http://purl.org/dc/elements/1.1/creator> "sysadmin@duraspace.org";
    <http://www.w3.org/ns/auth/acl#accessControl> <https://627cys77vl.execute-api.us-east-1.amazonaws.com/dev/dev/84/0f/66/8f/840f668f-b2dc-4d8f-b25f-8ff1d94fa113>;
    <info:fedora/fedora-system:def/model#hasModel> "AdminSet" .
  2. Lambdora generates a UUID for the resource, creates the resource under that UUID (without the "tree-ified" path, but with all the necessary Triples), and returns the UUID to Samvera.
  3. Samvera stores this UUID, but then (likely assuming it is talking to Fedora 4) sends all future requests to the "tree-ified" UUID path.
  4. The first of these requests is a POST to assign ACLs to the "tree-ified" path. This POST seems to trigger Lambdora to "findOrCreate()" the "tree-ified" path, resulting in a second, essentially empty resource under that path. This second POST looks like this (notice the accessTo triple in the body, which references the resource created above under its "tree-ified" path):
    HTTP Method=POST
    Path=/dev/84/0f/66/8f/840f668f-b2dc-4d8f-b25f-8ff1d94fa113
    Resource=/{thepath+}
    Path Parameters={thepath=dev/84/0f/66/8f/840f668f-b2dc-4d8f-b25f-8ff1d94fa113}
    Headers={Accept=*/*, Accept-Encoding=gzip;q=1.0,deflate;q=0.6,identity;q=0.3, Authorization=Basic ZmVkb3JhQWRtaW46ZmVkb3JhQWRtaW4=, CloudFront-Forwarded-Proto=https, CloudFront-Is-Desktop-Viewer=true, CloudFront-Is-Mobile-Viewer=false, CloudFront-Is-SmartTV-Viewer=false, CloudFront-Is-Tablet-Viewer=false, CloudFront-Viewer-Country=US, Content-Type=text/turtle, Host=627cys77vl.execute-api.us-east-1.amazonaws.com, User-Agent=Faraday v0.12.2, Via=1.1 3fd5c92e1c5215f08f0dbd6059f21be4.cloudfront.net (CloudFront), X-Amz-Cf-Id=NGaM5xESm1SvCZbpPQ5z3dpUkuS6W4yXf0dSvmhDTB3FHsFVLsdQeQ==, X-Amzn-Trace-Id=Root=1-59fca11f-7aa994eb1cbbfecb4d330d75, X-Forwarded-For=34.230.187.85, 54.182.230.61, X-Forwarded-Port=443, X-Forwarded-Proto=https}
    Body=
    <> <http://www.w3.org/ns/auth/acl#accessTo> <https://627cys77vl.execute-api.us-east-1.amazonaws.com/dev/dev/ab/ab/7f/63/abab7f63-5a3e-44f0-8a65-01133ecc7057>;
    <http://www.w3.org/ns/auth/acl#agent> <http://projecthydra.org/ns/auth/group#admin>;
    <http://www.w3.org/ns/auth/acl#mode> <http://www.w3.org/ns/auth/acl#Write>;
    <info:fedora/fedora-system:def/model#hasModel> "Hydra::AccessControls::Permission" .
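The four steps above can be reduced to a small in-memory simulation. This is a hypothetical sketch, not Lambdora's code: the store is a plain map from path to triples, `postWithoutId` stands in for step 2 (the server mints a flat UUID), and `findOrCreate` stands in for the assumed implicit-creation behavior in step 4. Running it shows two resources for one logical AdminSet, with the tree-ified one lacking hasModel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory simulation of the request sequence above; not Lambdora's real code.
final class DuplicateResourceDemo {
    static final String HAS_MODEL = "info:fedora/fedora-system:def/model#hasModel";

    // Path -> triples (predicate -> object), standing in for the repository's storage.
    final Map<String, Map<String, String>> store = new LinkedHashMap<>();

    // Step 2: a POST without an ID; the server mints a flat UUID and stores all triples there.
    String postWithoutId(String parent, Map<String, String> triples) {
        String id = UUID.randomUUID().toString();
        store.put(parent + "/" + id, new HashMap<>(triples));
        return id;
    }

    // Step 4: a request to an unknown path implicitly creates an empty container (assumed behavior).
    Map<String, String> findOrCreate(String path) {
        return store.computeIfAbsent(path, p -> new HashMap<>());
    }

    // Same chunking as observed from ActiveFedora::Noid.treeify: four 2-char chunks, then the ID.
    static String treeify(String id) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < id.length() && parts.size() < 4; i += 2) {
            parts.add(id.substring(i, Math.min(i + 2, id.length())));
        }
        parts.add(id);
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        DuplicateResourceDemo repo = new DuplicateResourceDemo();
        Map<String, String> adminSet = new HashMap<>();
        adminSet.put(HAS_MODEL, "AdminSet");

        // Steps 1-2: the client POSTs without an ID and receives a flat UUID.
        String id = repo.postWithoutId("/dev", adminSet);

        // Steps 3-4: the client switches to the tree-ified path for the same ID.
        Map<String, String> treeified = repo.findOrCreate("/dev/" + treeify(id));

        System.out.println(repo.store.size());                // two resources now exist
        System.out.println(treeified.containsKey(HAS_MODEL)); // the tree-ified one lacks hasModel
    }
}
```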

In summary, it seems that Hyku "expects" Lambdora to act like Fedora 4: when Lambdora generates a UUID for a resource, Hyku expects that resource to be created under a pairtree path derived from that UUID.

In the situation of the Default AdminSet, this works because Hyku provides the ID to Lambdora. This bug seems specific to the scenario where Lambdora generates the ID and passes it back to Hyku.

tdonohue commented 6 years ago

The "quick fix" for Lambdora would likely be to tweak the default newResourceName so that it is configurable: https://github.com/duraspace/lambdora/blob/master/lambdora-http-api/src/main/java/org/fcrepo/lambdora/ldp/LambdoraLdp.java#L282

For Hyku/Hyrax applications, it would need to be a tree-ified UUID (e.g. ab/ab/7f/63/abab7f63-5a3e-44f0-8a65-01133ecc7057), while for other applications it could be just a UUID (e.g. abab7f63-5a3e-44f0-8a65-01133ecc7057).
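One way such a setting might look is sketched below. Everything here is hypothetical: the `ResourceNaming` class, the `Style` enum, and the `LAMBDORA_NAME_STYLE` environment variable are illustrative names, not part of Lambdora. The idea is simply that the minted name comes from a `Supplier<String>` chosen once from configuration, defaulting to a flat UUID.

```java
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical sketch of a configurable new-resource-name strategy; names are illustrative.
final class ResourceNaming {
    enum Style { FLAT, TREEIFIED }

    // Builds a name supplier for the configured style.
    static Supplier<String> forStyle(Style style) {
        return () -> {
            String uuid = UUID.randomUUID().toString();
            return style == Style.TREEIFIED ? treeify(uuid) : uuid;
        };
    }

    // Four 2-character prefix segments followed by the full UUID, e.g. ab/ab/7f/63/abab7f63-...
    // The first 8 characters of a UUID string are always hex digits, so no dash ends up in a segment.
    static String treeify(String uuid) {
        return uuid.substring(0, 2) + "/" + uuid.substring(2, 4) + "/"
                + uuid.substring(4, 6) + "/" + uuid.substring(6, 8) + "/" + uuid;
    }

    public static void main(String[] args) {
        // Hypothetical environment variable selecting the style; defaults to flat UUIDs.
        String configured = System.getenv().getOrDefault("LAMBDORA_NAME_STYLE", "FLAT");
        Supplier<String> newResourceName = forStyle(Style.valueOf(configured));
        System.out.println(newResourceName.get());
    }
}
```

Choosing the supplier once at startup keeps the per-request code path identical for both styles; only the minted name differs.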

For the tree-ified UUID, we might be able to borrow some code/concepts from Fedora 4. This appears to be how it achieves it: https://github.com/fcrepo4/fcrepo4/blob/4.7.4-RC/fcrepo-kernel-api/src/main/java/org/fcrepo/kernel/api/services/functions/HierarchicalIdentifierSupplier.java
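As a rough idea of what a borrowed approach could look like, here is a from-scratch sketch of a parameterized hierarchical identifier supplier: prefix a random UUID with `count` segments of `length` characters each. It is written independently here, so the class name `HierarchicalIds` and its validation rule are assumptions rather than fcrepo4's actual code. With length 2 and count 4 it reproduces the pattern seen in this issue.

```java
import java.util.StringJoiner;
import java.util.UUID;
import java.util.function.Supplier;

// From-scratch sketch of a hierarchical identifier supplier; not copied from fcrepo4.
final class HierarchicalIds implements Supplier<String> {
    private final int length;
    private final int count;

    HierarchicalIds(int length, int count) {
        // The first 8 characters of a UUID string are hex digits, so keep the prefix within them.
        if (length < 0 || count < 0 || length * count > 8) {
            throw new IllegalArgumentException("length * count must be between 0 and 8");
        }
        this.length = length;
        this.count = count;
    }

    @Override
    public String get() {
        return withPrefix(UUID.randomUUID().toString());
    }

    // Joins `count` leading segments of `length` characters, then appends the full ID.
    String withPrefix(String id) {
        if (length == 0 || count == 0) {
            return id;
        }
        StringJoiner joiner = new StringJoiner("/", "", "/" + id);
        for (int i = 0; i < count; i++) {
            joiner.add(id.substring(i * length, (i + 1) * length));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        HierarchicalIds ids = new HierarchicalIds(2, 4);
        System.out.println(ids.withPrefix("abab7f63-5a3e-44f0-8a65-01133ecc7057"));
        // prints ab/ab/7f/63/abab7f63-5a3e-44f0-8a65-01133ecc7057
        System.out.println(ids.get());
    }
}
```

Setting either parameter to 0 yields a plain UUID, so a single supplier could serve both Hyku/Hyrax and other applications.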