Islandora / documentation

Contains islandora's documentation and main issue queue.
MIT License

'Meaningful REST API' #349

Closed dannylamb closed 7 years ago

dannylamb commented 8 years ago

This is an offshoot from https://github.com/Islandora-CLAW/CLAW/issues/341#issuecomment-242100676. Tagging it here so we know where the conversation broke.

It's time we had this talk. How will we best implement ORE? What type of resources will we expose? How ugly are we going to let Drupal 8 make it? :smile:

There's a lot going on.

dannylamb commented 8 years ago

Assuming that in Drupal land the resource map's HTML representation is a view? And it's on the aggregates and resources that we allow for site building?

And are we ok with the ?_format=json param? Will linked data clients get that instead of expecting to see Accept headers?

And are we approaching this API from the angle of 'here's a bunch of pre-made types' and that's it or is it possible to expose lower level constructs (aggregations, file types, etc...)?

ruebot commented 8 years ago

How will we best implement ORE?

I gotta ask, where is PCDM in this?

DiegoPino commented 8 years ago

@dannylamb ReM could be just a Node entity with entity reference fields, but a view if you want one too. The aggregation itself, the top-level resource, is a fedora_resource type entity. The ?_format=format param can be maintained for Drupal 8 compatibility (the idea of CLAW we proposed to our community was to use Drupal, not to convert Drupal), but we can also enable content negotiation; it's not complex at all.

I did not get this last part:

And are we approaching this API from the angle of 'here's a bunch of pre-made types' and that's it or is it possible to expose lower level constructs (aggregations, file types, etc...)?

acoburn commented 8 years ago

@ruebot: as I understand the PCDM question (and you all know my thoughts on this), I think it comes down to this:

If PCDM is a modeling construct for interoperability (e.g. b/t Islandora and Hydra), what is the intended layer for that interoperability? Is Fedora the integration layer? Or is "public linked data" the integration layer? I.e. is the model that a single institution wants to run both Islandora and Hydra on the same Fedora and have them play nicely? or is the model that institution A runs Islandora and publishes linked data and that data is meaningful to institution B which runs Hydra?

I would note that the level of application integration that would have to happen to make the first option a reality is extremely high, while the level of application integration for the second option is almost trivial.

DiegoPino commented 8 years ago

@whikloj @dannylamb @ruebot @bryjbrown @acoburn worth reading. I really can't follow all those specs, but there is some reasoning behind the Drupal choices on negotiation: https://www.drupal.org/node/2364011

@dannylamb all negotiation happens here: https://api.drupal.org/api/drupal/core!lib!Drupal!Core!StackMiddleware!NegotiationMiddleware.php/class/NegotiationMiddleware/8.2.x

So it's a matter of creating an alternative service that works as we wish. By the way, content negotiation and caching seem to be problematic here? I don't know what I'm talking about!
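The fallback being discussed (honor the Drupal-style `?_format` query parameter for compatibility, otherwise fall back to the Accept header) could be sketched like this. This is not Drupal's actual NegotiationMiddleware API; the function name and the format table are hypothetical:

```python
# Sketch of the negotiation fallback discussed above: an explicit
# ?_format query parameter wins; otherwise the Accept header is
# inspected. Names and the ACCEPT_MAP table are assumptions.
ACCEPT_MAP = {
    "application/ld+json": "jsonld",
    "text/html": "html",
    "application/json": "json",
}

def negotiate_format(query_params, accept_header, default="html"):
    """Return the response format name for a request."""
    # Drupal 8 compatibility: an explicit ?_format always wins.
    if "_format" in query_params:
        return query_params["_format"]
    # Otherwise scan the Accept header; first known media type wins.
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in ACCEPT_MAP:
            return ACCEPT_MAP[media_type]
    return default
```

A linked data client sending `Accept: application/ld+json` and a Drupal client sending `?_format=jsonld` would both end up with the same serialization, which is the compatibility goal raised earlier in the thread.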

whikloj commented 8 years ago

This is low-level stuff. This is not stopping PCDM. This is allowing other use cases that are not only PCDM. We are building on ORE because members of our community have expressed a use case for it. But this is MVP stuff, not the finished product.

dannylamb commented 8 years ago

@DiegoPino Last I read, it came down to symfony-isms and cache behavior with the weird conneg. If we can adapt the middleware to make it work, all the better.

Saw this where they tried to make the stock Stack middleware for conneg work instead and it failed.

DiegoPino commented 8 years ago

@dannylamb @whikloj any one thoughts on this? Seems this symfony bundle could help here https://github.com/FriendsOfSymfony/FOSRestBundle

dannylamb commented 8 years ago

@DiegoPino I guess my main concern was whether it would mess with something like Marmotta. I guess there's only one way to find out if it'll work with conneg as a query param. Gotta do it instead of being 'concerned'.

dannylamb commented 8 years ago

@DiegoPino I dunno... I guess we have to fall on our Drupal sword here. One way or another, it's getting implemented in Drupal. We could use something else as a facade to smooth things over, if we feel that strongly about it. But that doesn't change the fact that we're still using Drupal. So let's see how it would work that way and decide from there if we can live with it.

In other words, let's see just how ugly it will be with straight D8. No PUTs and all.

DiegoPino commented 8 years ago

@dannylamb yeah, my idea was more Drupalish: include that bundle in the composer.json of our module and implement a new class based on NegotiationMiddleware that uses some of those goodies. But yes, it could be an overhaul. @whikloj is taking the content negotiation bullet by doing some tests, and I thank him for that.

dannylamb commented 8 years ago

So I assume we'll have to expose RDF mappings? We get those from D8 core. I need to investigate their format in JSON.

Being able to generate context files from the mappings would be useful.
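Generating a context from the mappings could look something like the sketch below. The mapping shape (field name to a `properties` list of CURIEs) loosely follows D8's rdf mapping config, but treat both the shape and the function name as assumptions:

```python
# Sketch: derive a JSON-LD @context from a Drupal-8-style RDF mapping.
# The input shape (field_name -> {'properties': [...]}) mimics D8 rdf
# mapping config and is an assumption, not the real core format.
def mapping_to_context(rdf_mapping, namespaces):
    context = dict(namespaces)  # prefix -> namespace IRI declarations
    for field_name, info in rdf_mapping.items():
        props = info.get("properties", [])
        if props:
            # Expose the Drupal field name as a term for its first
            # mapped RDF property.
            context[field_name] = props[0]
    return {"@context": context}
```

For example, a mapping of `title` to `dc:title` plus a `dc` namespace declaration would yield `{"@context": {"dc": "http://purl.org/dc/terms/", "title": "dc:title"}}`.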

DiegoPino commented 8 years ago

@dannylamb I already did that. Will publish when ready... slow today.

dannylamb commented 8 years ago

@DiegoPino the context generation? rly?

dannylamb commented 8 years ago

Here's what I'm working under: Each Resource Map MUST be available from a different URI (ReM-1 and ReM-2) but either SHOULD be accessible via the Aggregation URI (A-1).

So I'm thinking something like this should satisfy it:

http://base_url/islandora/{id}

http://base_url/islandora/{id}?_format=jsonld

http://base_url/islandora/{id}#resourcemap

http://base_url/islandora/{id}#resourcemap?_format=jsonld

This means that resources, aggregates, and resource maps can be their own entities, where the resource map is just a node with every entity in the graph embedded.

dannylamb commented 8 years ago

Also, pathauto anyone? I think it would make the RDF more meaningful if the URIs were descriptive.

acoburn commented 8 years ago

@dannylamb w/r/t Marmotta, I wouldn't worry too much -- if the default ldpclient LinkedDataProvider can't interpret the Islandora endpoint, it's really easy to write code for a custom endpoint.

acoburn commented 8 years ago

@DiegoPino there are some issues with the above suggestion, but the main one is that anything following a # sign in the URL is part of the URL fragment, which means {path}#resourcemap?_format=jsonld refers to the resource with the hash fragment #resourcemap?_format=jsonld, and not the fragment #resourcemap with query param _format=jsonld.

dannylamb commented 8 years ago

@acoburn I screwed that up. Looked it up and the query parameters are supposed to come before the fragment. If we were to swap their orders, would that still be acceptable?

Also we don't have to do fragments, just trying to satisfy the SHOULD condition of the spec.
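The swapped ordering does parse as intended; any RFC 3986 parser, such as Python's standard `urllib.parse`, shows the difference (example.org stands in for the thread's placeholder base URL):

```python
from urllib.parse import urlsplit

# Fragment-first: everything after '#' is the fragment, so the
# intended query parameter is swallowed.
bad = urlsplit("http://example.org/islandora/1#resourcemap?_format=jsonld")
# bad.query is "" and bad.fragment is "resourcemap?_format=jsonld"

# Query before fragment: both components parse as intended.
good = urlsplit("http://example.org/islandora/1?_format=jsonld#resourcemap")
# good.query is "_format=jsonld" and good.fragment is "resourcemap"
```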

acoburn commented 8 years ago

@dannylamb oh yeah, swapping those would be great and would address my concern 100%

DiegoPino commented 8 years ago

@acoburn still reading @dannylamb's idea. I have to digest this, but I understand your concern.

bryjbrown commented 8 years ago

http://base_url/islandora/{id}#resourcemap

  • aggregates will be able to redirect to the html representation of their resource map through this fragment

This seems to be the inverse of what I saw the ORE Primer recommend, which is that you add a fragment to the end of a ReM URI to point to an Aggregation, not the other way around.

Aggregation: A-1 = http://example.org/foo.rdf#aggregation

Resource Map: ReM-1 = http://example.org/foo.rdf

If I'm not misunderstanding the spec or this thread, then we could use http://base_url/islandora/{id} to resolve to an HTML representation of a ReM (since there may be many ReMs for a single Aggregation), but add #aggregation to the end of any given ReM URI to redirect to the Aggregation the ReM is describing.

acoburn commented 8 years ago

This is definitely getting into implementation details here, but one reason why you might not want the aggregation to be a hash URI is that if the aggregation is being stored in Fedora, you'd want a clean mapping from the public (Drupal-based) URL to the internal Fedora URL. And if the Fedora identifier for the aggregation is a hashURI, then you won't get all of the LDP membership / containment goodness that Fedora provides -- since that applies only to ldp:Containers and hash URIs are not ldp:Containers (i.e. they are more limited).

My reading of the spec suggests that as long as the URLs are different for ReMs and Aggregations (and all of the suggestions so far include that), then we're OK, but as I said, we should avoid using a hashURI for the aggregation -- unless you want to introduce some more complex logic for mapping that URL to the Fedora resource.

DiegoPino commented 8 years ago

@dannylamb about @context in JSON-LD serialization in Drupal 8:

I'm working on a little problem I found in the JSON-LD @context creation: basically I need a consistent naming strategy for our keys, since I would like to cache this (meaning canonical, always the same order?), avoid duplicates, and avoid redefining @context when keys are duplicated in nested scenarios.

To sum up:

A) ensure a semi-constant @context for the same type of entities, but mostly B) avoid the problem of duplicated keys, like for example "value", which is a common key everywhere.

So the question is: what would be a good naming strategy for the JSON-LD context, in your opinion? With that I can create a "shortName" method or something similar (based on the machine name?).

I also have to extract the xsd: datatypes, which are already in the API by the way; that is not ready yet...
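One possible "shortName" strategy for the collision problem described above: keep the bare field name when it's free, and prefix with the bundle machine name only when it would collide with a key that's already taken. Everything here, including the function name, is a hypothetical sketch:

```python
# Sketch of a collision-avoiding key strategy for @context terms.
# Bare names are kept when unique; colliding names get prefixed with
# the bundle machine name. All names here are hypothetical.
def short_names(bundle, field_names, taken=None):
    """Map each field name to a unique @context key."""
    taken = set(taken or ())
    result = {}
    for name in field_names:
        key = name if name not in taken else f"{bundle}_{name}"
        taken.add(key)
        result[name] = key
    return result
```

So a common key like "value" on an image bundle would become "image_value" when "value" is already in use, while unique keys stay short; since the result depends only on the bundle and field list, it stays canonical and cacheable.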

DiegoPino commented 8 years ago

@acoburn @bryjbrown 👍 on "avoid using hashURI"

dannylamb commented 8 years ago

sold. no hash uris.

dannylamb commented 8 years ago

@DiegoPino Will there not just be a context per mapping? Do mappings have names? Is there a scenario where we would need more than one context per rdf mapping?

dannylamb commented 8 years ago

So.... basically what @acoburn outlined roughly here then? :smile:

dannylamb commented 8 years ago

Ok, so we need to take a stab at outlining this API.

In Drupal, we know we'll need Rdf and NonRdf resources. From @DiegoPino's foray into media entities, it looks like we may want to have an fcr:metadata entity as well, to work with the Media entity/NonRdfResource.

If sync goes full 1:1, that also means we'll need to consider proxies and in/direct containers, too.

Then there's the turnkey object types we have to crank out for images, pages, and books.

Is there anything else we should be exposing?

dannylamb commented 8 years ago

And I think, given the way we'll be working in Drupal, we'll get the resource map for free. So I'm for adding read-only support for that. I'm not thrilled, since I have no real use case for it, but we are making ore:Aggregates (or things that refine them). So I see no harm in rolling with it.

DiegoPino commented 8 years ago

@dannylamb on the Drupal side, fcr:metadata is just the fields mapped to RDF in the fedora_resource type, non_rdf_source bundle, and/or those attached to the media entity (tech metadata).

Are we using the included Drupal REST API to create resources coming from Fedora 4? If so, we mostly need to handle:

And in the API design we should take into account how WebAC maps to Drupal authz. Since users are also entities and do have UUIDs, should we be mapping those also?

dannylamb commented 8 years ago

Good points @DiegoPino. Authz/n needs to be included and worked out (so much better than D7...).

As well as the rdf -> entity conversion. That would include CRUD for the RDF mappings, I suppose.

Have we considered maybe using JSON-LD contexts to generate the RDF mappings? I know we've discussed the other way around. It would be nice to formalize the 'Drupal context' that way.
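That inverse direction (a JSON-LD @context driving RDF mapping creation) could be sketched roughly as below. The heuristic for telling prefix declarations apart from field terms, and the output shape, are assumptions for illustration only:

```python
# Sketch: derive a Drupal-8-style RDF mapping from a JSON-LD @context.
# Prefix declarations ("dc" -> a namespace IRI) are skipped; CURIE
# terms ("title" -> "dc:title") become field mappings. The output
# shape mimics D8 rdf mapping config and is an assumption.
def context_to_mapping(context):
    mapping = {}
    for key, value in context.get("@context", {}).items():
        # Keep CURIE-valued terms; skip prefix -> IRI declarations.
        if isinstance(value, str) and ":" in value and not value.startswith("http"):
            mapping[key] = {"properties": [value]}
    return mapping
```

A real implementation would need to handle expanded term definitions (`{"@id": ..., "@type": ...}`) as well, but the round trip with context generation is the point: one canonical description, two derived artifacts.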

UUIDs? The pragmatist in me wants them. The idealist in me thinks we should be mapping URIs and not using identifiers.

DiegoPino commented 8 years ago

@dannylamb I probably just need to invert the process. This one, https://github.com/DiegoPino/claw-jsonld/blob/master/src/Normalizer/ContentEntityNormalizer.php, takes entities, fields, and references and passes them to JSON-LD. The inverse, https://github.com/DiegoPino/claw-jsonld/blob/master/src/Normalizer/ContentEntityNormalizer.php#L160, still not "done", needs to use easy_rdf to fetch a graph with resources and properties and pass that to the creation logic. My only question is: what am I getting from Fedora 4? Does Alpaca/Salmon take care of splitting that JSON-LD into multiple resources, and of making sure metadata is there before passing binaries, or do I get a full graph?

dannylamb commented 8 years ago

From Fedora? JSON-LD representations of individual resources. We won't be passing around the resource map or multiple resources.

Binaries will challenge that, since there's both tech md and binary content. I'm not sure if we'll send both every time. It would probably be best to only send one or the other unless there's a good reason.

dannylamb commented 8 years ago

Which leads to the question: in any use case, is there a time when you'd attach techmd to a file as RDF directly, instead of using what's generated automatically when you update the binary?

DiegoPino commented 8 years ago

@dannylamb from Fedora, if we're passing individual resources, what complicates things for me (another way of saying I don't know how!), and the same goes the other way, is ordering. Since LDP is tree-based, and we do have hierarchies, how do I make sure things are created in the right order if I can't make sure event messages are generated in that order?

techmd, it depends really: the media entity image type (a plugin I tested on Friday) does extract EXIF and maps that info to bundle fields, which in turn can be mapped to RDF...

ruebot commented 8 years ago

@dannylamb Yes. We will/should set up the TECHMD profile on binaries, as well as store the output of FITS.

DiegoPino commented 8 years ago

@dannylamb I had this crazy idea a few weeks ago: we could have a general "whole rdf" field in Drupal that would allow us to sync all the metadata (RDF) that comes from Fedora into one field. That one could then be further processed and split into individual fields if needed, but that way we could ensure that if a (bundled) field does not exist when sync happens and is created later, there is no need to re-sync to fill it. Does that make sense? Same goes for complex stuff like TECHMD profiles.

dannylamb commented 8 years ago

@DiegoPino Once you go async, there is no guarantee on order, since messages can get delayed, retried, etc... It's up to us to write functions that know when conditions are right to execute and don't run otherwise, and that can be run multiple times.
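The "execute only when conditions are right, safe to retry" pattern described above can be sketched like this; the dict-backed store and the handler name are hypothetical stand-ins for whatever persistence and messaging layers are actually used:

```python
# Sketch of an idempotent, order-tolerant handler for a hierarchical
# create message. The store is a plain dict standing in for the real
# persistence layer; all names here are hypothetical.
def handle_create_child(store, parent_id, child_id):
    """Process a 'create child under parent' message.

    Returns True when the message is fully handled (including the
    already-done case), False when it should be requeued/retried.
    """
    if child_id in store:
        # Already processed: retries are a harmless no-op.
        return True
    if parent_id not in store:
        # Parent hasn't been synced yet: conditions aren't right,
        # so signal for a retry instead of creating an orphan.
        return False
    store[child_id] = {"parent": parent_id}
    return True
```

A child message that arrives before its parent simply reports False and gets retried later, so no ordering guarantee is needed from the message broker, which matches the point about LDP hierarchies and out-of-order events.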

@ruebot Thanks, that clears things up.

dannylamb commented 8 years ago

@DiegoPino Seems extreme, but we should keep that in mind as a strategy if re-indexing when rdf_mappings change becomes problematic.