Closed dannylamb closed 8 years ago
Count me in for this, and for writing up the MVP document in markdown when it's ready to move over to GitHub.
This is just for starters. Definitely not authoritative. Please critique.
Support for Collections, Images, Books, and Pages
Would it be worthwhile to consider having something analogous to the binary cmodel from 1.x in the MVP? Something that stores a non-RDF resource without making any assumptions about it? Maybe this isn't needed for a v1.0, just a thought.
Looking for refinement on that first bullet point.
Does 'publishing linked data' mean:
"Can control metadata mappings from Drupal to RDF through a user interface" <- this is our equivalent/replacement for Form Builder?
@manez Yes, this would basically turn drupal into an RDF editor, where admins control the forms.
The ability to publish linked data
This definitely needs to be fleshed out more. It's a little vague. Maybe some use cases?
Synchronization with Fedora 4
We should be more specific about what the synchronization is. Is it both ways, one way?
Meaningful REST API
What is "meaningful"?
Support for Collections, Images, Books, and Pages
:+1:
Can control metadata mappings from Drupal to RDF through a user interface
:+1:
The ability to export/import JSON-LD
:+1:
The ability to restrict access to collections and/or individual resources
:+1:
In the weeds; Is this WebAC or Drupal restriction?
The ability to index and search resources with Apache Solr
:+1:
@ruebot and @dannylamb, yes we need to refine language, desired functionality and by this, also expectations. I feel some concepts are mixed up. But we have this sprint to solve this right?
Related to @bryjbrown's comment: https://github.com/Islandora-CLAW/CLAW/issues/334#issuecomment-241519478 and publishing linked data, I think it would be a good idea to use a single URL for these resources. That is, don't produce HTML at one location and have the same resource as JSON-LD somewhere else. Rather, have your endpoint produce HTML (with RDFa markup) for browsers and JSON-LD for clients that request application/ld+json or application/json. Personally, I wouldn't prioritize a public SPARQL endpoint.
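To make the single-URL idea concrete, here is a minimal sketch of Accept-header negotiation. It is not Drupal or Fedora code: the function name negotiate, the supported list, and the fallback to HTML are all assumptions made for illustration.

```python
def negotiate(accept_header, default="text/html"):
    """Pick one serialization for a single URL based on the Accept header.

    Hypothetical helper: the supported types and the HTML fallback are
    assumptions for this sketch, not anything Drupal 8 does by itself.
    """
    supported = ("text/html", "application/ld+json", "application/json")
    candidates = []
    for position, part in enumerate(accept_header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        quality = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort by highest q first, then by order of appearance.
        candidates.append((-quality, position, media_type))
    for neg_quality, _, media_type in sorted(candidates):
        if neg_quality == 0:
            continue  # q=0 means "not acceptable"
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return default
    # A stricter service might answer 406 here instead of falling back.
    return default
```

A browser sending text/html,application/xhtml+xml,*/*;q=0.8 would get HTML (with RDFa), while a client sending application/ld+json would get JSON-LD from the same URL.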
@acoburn I agree. The Drupal 8 REST API (which works as middleware in Drupal's routing system) uses content negotiation plus (sadly) a _format param to expose different serialisations on the same URI. The question will be which URI is the canonical one: the UUID-based one (which does not exist by default in Drupal 8; I published a working implementation for our case that can be extended if needed) or the sequentially numbered one, which is based on a '{entity_type}/{id}' routing pattern, with id a sequential number unique to each entity_type.
All works pretty similar (in terms of workflow and params) to http://symfony.com/doc/current/routing.html#routing-format-param
Different presentations but all with congruent canonical URI for local linked resources which means translate fedora4 paths to published resources URI's
RDFa (core drupal) in html, JSON-LD (Accept: application/ld+json) plus what any other contributed modules want to provide, with resources (like <> ldp:contains <some/resource>, etc) pointing to also publicly available resources in Drupal 8, <some/resource> becomes a drupal 8 canonical URL (following same convention as the referrer resource).
Follow your nose would be fine for html resources, but as i see this, a Drupal block solves this and can be even a contrib module.
Also good to remember: Drupal 8 allows for multiple view modes, so this can be user configured and adapted/expanded.
One question to consider is: when publishing linked data for an aggregate resource (e.g. a Book), will Drupal publish the aggregate graph? Since the HTML display is (I assume) a sort of aggregation of resources (pages, files), I'd expect the JSON-LD repr would also contain the aggregate graph, but that would be good to spell out explicitly.
The ability to publish linked data
I think @acoburn's content-neg description should cover our "publish linked data" needs (we can always expand if we get a persuasive use case).
The ability to restrict access to collections and/or individual resources
I think this could be Drupal restrictions, so long as they are translated to WebAC for Fedora...no?
Meaningful REST API
This is wide open to interpretation, but... would this be Drupal services to allow creation of resources in Drupal (which would push to Fedora) and/or would this be Silex services to allow creation of resources in Fedora (which would sync back to Drupal).
@DiegoPino I know you got your routing working, but I found this ticket for Drupal 8 core which appears very similar to what you have. Would it cause a conflict in future?
@acoburn. I would expect (or code aiming for that) Drupal publishing the aggregate graph. If a main simple Drupal node aggregates multiple custom fedora resource entities, then its serialization is an aggregation graph, which is what I (as of today) would like to model.
@DiegoPino cool, that's what I was hoping.
@whikloj, no problem there with https://www.drupal.org/node/2353611. Since we can't aim right now for manual applied patches, i added my own Resolver just for fedora_resources, which is basically the same idea that they apply general in that ticket. Since i'm pretty sure they are not right now in a state where custom entities will inherit UUID routes, both things can live side by side. Also my routing still does not solve the linking, which involves messing with URL class.
"my own Resolver" means code borrowed from here and there. I did not invent the wheel, but I made it spin here.
To be able to map and enforce WebAC in Drupal 8 we need to investigate these services for fedora_resources type derived entities:
authentication_provider service tag.
For 'publishing linked data'....
Question about publishing the entire graph:
Based on how D8 works with the format parameter, if the html representation is disambiguated with other representations by a query param like ?_format=json, then does that qualify as a distinctly different uri? I know that fragments aren't considered by http, so are query params?
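As far as I understand RFC 3986, the query string is part of what identifies the resource, while the fragment is stripped by the client before the request is sent. A small sketch with Python's standard urllib.parse shows the difference; the #page-1 fragment is a made-up example, and request_target is a hypothetical helper name.

```python
from urllib.parse import urldefrag, urlsplit

html_url = "http://example.org/obj/foo"
json_url = "http://example.org/obj/foo?_format=json"
frag_url = "http://example.org/obj/foo#page-1"  # hypothetical fragment


def request_target(url):
    """Return the URI a client would dereference, i.e. without the fragment."""
    return urldefrag(url).url


# Same path, but the query string makes the second one a different URI.
same_path = urlsplit(html_url).path == urlsplit(json_url).path
distinct_uris = request_target(html_url) != request_target(json_url)

# The fragment never reaches the server, so this collapses to html_url.
fragment_dropped = request_target(frag_url) == html_url
```

So under that reading, ?_format=json does give a distinct URI, which is why a single canonical URI with conneg and a _format URL are not quite the same thing.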
Yeah... and sync.
So I've got big ideas, and want to go for the gold on it, but both Fedora and Drupal are going to need help enforcing conditional updates for it to work. Granted, no type of sync'ing whatsoever is going to work well without conditional updates, so it's not like I can do another approach. It's just that if time gets spent making both Fedora and Drupal respect conditional updates, then we may not have as much time to do what I want with sync.
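For anyone following along, this is roughly what I mean by conditional updates: a write only succeeds if the caller proves it saw the current state, in the style of HTTP ETag / If-Match. The sketch below is an in-memory stand-in; Store and PreconditionFailed are hypothetical names, and nothing here is Fedora or Drupal API.

```python
import hashlib


class PreconditionFailed(Exception):
    """Would map to HTTP 412 Precondition Failed in a real service."""


class Store:
    """Hypothetical in-memory resource store that enforces If-Match."""

    def __init__(self):
        self._bodies = {}

    @staticmethod
    def etag(body):
        # Strong validator derived from the content (an assumption for
        # this sketch; a real server may compute ETags differently).
        return '"' + hashlib.sha1(body.encode("utf-8")).hexdigest() + '"'

    def get(self, path):
        body = self._bodies[path]
        return body, self.etag(body)

    def put(self, path, body, if_match=None):
        current = self._bodies.get(path)
        # Creating a new resource needs no precondition; updating one does.
        if current is not None and if_match != self.etag(current):
            raise PreconditionFailed(path)
        self._bodies[path] = body
        return self.etag(body)
```

A writer holding a stale ETag gets rejected instead of silently overwriting the other side's change, which is the property any sync approach needs.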
That said, I'm shooting for full bidirectional sync. If we bake it into the lowest levels of D8 entities (like RDF and NonRDF Resources), then aside from having a few post-save events, there will be no mention of Fedora whatsoever in Drupal code. It's definitely the best way to decouple the two.
As far as implementations go, I'm looking at sticking Interval Tree Clocks both as a field in Drupal and in the RDF in Fedora. There has to be some middleware to intercept write requests to Fedora and make sure they update the Interval Tree Clock for the resource, and we can write the updates ourselves into the Drupal side of things. Then all replication can be handled async between two listeners, one for Drupal and one for Fedora.
FYI: Java and C implementation for Interval Tree Clocks (which are a generalization of both vector clocks and version vectors) here.
For PHP I was thinking about making an extension around the C code.
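To illustrate the kind of comparison the listeners would make, here is a sketch using plain vector clocks rather than Interval Tree Clocks (ITCs generalize these, but are more involved). The node names "drupal" and "fedora" and the function names are assumptions for illustration only.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter bumped by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum: the clock after both histories are seen."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def compare(a, b):
    """Classify a relative to b: equal, before, after, or concurrent."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither side has seen the other's write
```

A "before" result means the update can be replayed safely; "concurrent" is the conflict case the two listeners would have to resolve.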
no mention of Fedora whatsoever in Drupal code.
:+1:
For PHP I was thinking about making an extension around the C code.
That seems reasonable.
@dannylamb some ideas that obviously need more discussion, and can also be a no-no, on how to publish the whole graph and also comply with a disambiguated URI for ORE:
node entity derived content types with fields that link to our fedora_resource entities (custom ones, provided by our module): this way we are emulating a ReM (Resource Map in ORE) and we can add new contents using all the UI goodies Drupal 8 provides etc. So, these Nodes have a different 'canonical' URL than the one assigned to their aggregated (linked as field values) fedora_resources. I said canonical because in Drupal 8 you can basically make as many pattern-based aliases as you want.
Question here: discussing pros and cons, this ReM would really not exist in Fedora4, or at least right now we haven't defined a structure, place, whatever for a ReM.
OR
AND/OR (from IRC by @acoburn)
OR
Assuming this JSON-LD serialisation of each fedora_resource would be just the resource itself. (More a question than an affirmation.)
Does anyone in @Islandora-CLAW/sprinters want to discuss this idea on IRC?
It might be worth noting that someone might be able to write a little Java code for Fedora in order to generate vector-clock headers. There are currently hooks in the Fedora code for being able to do this. That way, the Drupal code can just work on header values w/r/t the vector-clocks.
That said, I'm shooting for full bidirectional sync. If we bake it into the lowest levels of D8 entities (like RDF and NonRDF Resources), then aside from having a few post-save events, there will be no mention of Fedora whatsoever in Drupal code. It's definitely the best way to decouple the two.
That is the way i'm approaching stuff, still one direction (from Drupal to Fedora) but would like to discuss some approaches
The terminology is going to get a little weird here, but I'd highly recommend using Hydra for this. And by Hydra, I mean the vocabulary for describing hypermedia-driven web APIs.
@acoburn++
Does using HydraCG imply using a completely different ontology, or can they be mixed? I see the @context is peculiarly HydraCG-centric. I still don't get Hydra-CG completely, so I'm pasting this here in case it's of use.
http://stackoverflow.com/questions/25297719/get-a-collection-of-sub-resources-at-once-with-json-ld-and-hydra
Hydra and Swagger.io? Or just one?
@ruebot: I don't know enough about either to make a good decision. I would be happy to investigate.
@acoburn I believe @dannylamb and @whikloj have done a fair bit of investigation on the swagger.io side of things: https://github.com/Islandora-CLAW/CLAW/issues/205
@ruebot: I'm advocating for some mechanism to describe the API. If there's already momentum behind swagger.io, that's great
@acoburn cool. I'll leave it to @dannylamb and @whikloj for thoughts/decisions there.
@acoburn i will write anything to get at vector clock headers. anything that will get me conditional updates. i don't need byte for byte comparison.
Back to this. I'm thinking we should provide json-ld for every resource/entity in addition to the resource map, which yes, would make sense to have its own entity/node.
And I'm thinking we just generate the resource map RDF from the triple store. It can be dynamic at first, but we'll probably want to consider caching with invalidation based on a transitive SPARQL query. And if we can't make the assertions on other resources in Fedora, then I guess we have no choice but to preserve them as NonRDFResources (the irony is killing me).
Looks like Drupal is going to thwart us if we want normal looking conneg. No PUTs kinda stinks too. I'm tempted to try and smooth this stuff over with middlewares. Looks like you can even make a silex application act as a filter.
About swagger: Server side stubs don't seem to be worth generating. And the little tester page has a hard time with conneg because it overrides accept headers you set even if they're a parameter you're providing as per the schema. You have to manually list all types of consumed and produced messages in the schema, so something like "any Content-Type you can provide" is awfully hard to describe. I'm saying this because i spent some time trying to swaggerize the Fedora API and ran into that gem.
The client code generation of swagger is still nifty, though.
But anything that describes the API in a machine readable format is a good thing. If people think using RDF to describe the API is better, then we can go for that. No love lost with Swagger.
I had some time this morning to think more about ResourceMaps / Aggregations and the goal of "Publishing Linked Data", all in the context of some recent threads of discussion. Here are some thoughts (please critique):
That is, don't store the ResourceMaps in Fedora but do store the aggregations in Fedora with descriptive metadata attached to these Aggregations.
That resource map would have an HTML serialization and a JSON-LD serialization (i.e. each at different URLs, which, as I understand, is how Drupal does it). E.g. you might have http://example.org/obj/foo for the HTML serialization and http://example.org/obj/foo?_format=json for the JSON-LD version. Both serializations would include the complete aggregated graph. Each would also use a link header Link: <http://example.org/linkeddata/foo>; rel="describes" to point to the particular Aggregation, which can be dereferenced by any linked data client. The metadata attached directly to the ResourceMap would be very minimal: Islandora-CLAW would be the dcterms:creator, plus any additional necessary metadata -- as mentioned above, the primary descriptive metadata would be attached at the Aggregation level.
Aggregations would be available separately (as per the ORE spec): e.g. http://example.org/linkeddata/foo, available in HTML and JSON-LD formats (or others, if necessary). This endpoint could live entirely separately from Drupal and/or be based on a simple template service. Personally, I wouldn't include ldp:contains triples for these resources (i.e. I'd rely mostly on ldp-member triples), but I wouldn't draw a line in the sand on that point. In contrast to the ResourceMap serializations, the resources serialized at the /linkeddata/... endpoint would not include child and/or aggregated resources -- they would basically obey the "single-subject" restriction we see in Fedora (so they could include hash URIs).
To me, this seems like it has the advantage of following the ORE spec (as I understand it) and fitting into the models that both Fedora and Drupal provide, while also retaining the semantics of ORE and linked data.
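A rough sketch of the URL and header layout described above, using the example.org URLs from this comment. The function names and the choice of Content-Type values are assumptions for illustration; this is not an existing Drupal or Fedora API.

```python
BASE = "http://example.org"  # example host used in the comment above


def resource_map_url(slug, json_ld=False):
    """URL of the ResourceMap serialization (HTML, or JSON-LD via _format)."""
    url = "{}/obj/{}".format(BASE, slug)
    return url + "?_format=json" if json_ld else url


def aggregation_url(slug):
    """URL of the Aggregation itself, served separately from the ResourceMap."""
    return "{}/linkeddata/{}".format(BASE, slug)


def resource_map_headers(slug, json_ld=False):
    """Response headers for either ResourceMap serialization.

    Both serializations carry the same Link header pointing at the
    Aggregation they describe.
    """
    content_type = "application/ld+json" if json_ld else "text/html"
    return {
        "Content-Type": content_type,
        "Link": '<{}>; rel="describes"'.format(aggregation_url(slug)),
    }
```

Calling resource_map_headers("foo") and resource_map_headers("foo", json_ld=True) yields different Content-Type values but an identical Link header, so a linked data client can get from either serialization to the Aggregation.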
Closing since sprint is over. We can open another ticket to 'revisit' this concept later if required.
Create a FINITE list of features that is as small as possible that will still give users the functionality they need and the foundation for the addition of new features.