Finalize feature set for MVP

dannylamb commented 8 years ago

Create a FINITE list of features that is as small as possible that will still give users the functionality they need and the foundation for the addition of new features.

manez commented 8 years ago

Count me in for this, and for writing up the MVP document in markdown when it's ready to move over to GitHub.

dannylamb commented 8 years ago

This is just for starters. Definitely not authoritative. Please critique.

The ability to publish linked data
Synchronization with Fedora 4
Meaningful REST API
Support for Collections, Images, Books, and Pages
Can control metadata mappings from Drupal to RDF through a user interface
The ability to export/import JSON-LD
The ability to restrict access to collections and/or individual resources
The ability to index and search resources with Apache Solr

bryjbrown commented 8 years ago

Support for Collections, Images, Books, and Pages

Would it be worthwhile to consider having something analogous to the binary cmodel from 1.x in the MVP? Something that stores a non-RDF resource without making any assumptions about it? Maybe this isn't needed for a v1.0, just a thought.

dannylamb commented 8 years ago

Looking for refinement on that first bullet point.

Does 'publishing linked data' mean:

A public triplestore to query?
RDFa output?
some 'pure RDF' serialization format (like json-ld) available through content negotiation?

manez commented 8 years ago

"Can control metadata mappings from Drupal to RDF through a user interface" <- this is our equivalent/replacement for Form Builder?

dannylamb commented 8 years ago

@manez Yes, this would basically turn drupal into an RDF editor, where admins control the forms.

ruebot commented 8 years ago

The ability to publish linked data

This definitely needs to be fleshed out more. It's a little vague. Maybe some use cases?

Synchronization with Fedora 4

We should be more specific about what the synchronization is. Is it both ways, one way?

Meaningful REST API

What is "meaningful"?

Support for Collections, Images, Books, and Pages

:+1:

Can control metadata mappings from Drupal to RDF through a user interface

:+1:

The ability to export/import JSON-LD

:+1:

The ability to restrict access to collections and/or individual resources

:+1:

In the weeds; Is this WebAC or Drupal restriction?

The ability to index and search resources with Apache Solr

:+1:

DiegoPino commented 8 years ago

@ruebot and @dannylamb, yes we need to refine language, desired functionality and by this, also expectations. I feel some concepts are mixed up. But we have this sprint to solve this right?

acoburn commented 8 years ago

Related to @bryjbrown's comment (https://github.com/Islandora-CLAW/CLAW/issues/334#issuecomment-241519478) and to publishing linked data, I think it would be a good idea to use a single URL for these resources. That is, don't produce HTML at one location and have the same resource as JSON-LD somewhere else. Rather, have your endpoint produce HTML (with RDFa markup) for browsers and JSON-LD for clients that request application/ld+json or application/json. Personally, I wouldn't prioritize a public SPARQL endpoint.
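A minimal sketch of the single-URL idea, assuming the endpoint inspects the Accept header itself. The function names and the renderer labels ("html+rdfa", "json-ld") are hypothetical, and this is plain Python rather than Drupal code; it only illustrates picking one representation per request at the same URL.

```python
def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # A q-value of 0 means "not acceptable", so drop those entries.
        if quality > 0:
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


# Media types served at the one resource URL (renderer labels are hypothetical).
RENDERERS = {
    "text/html": "html+rdfa",
    "application/ld+json": "json-ld",
    "application/json": "json-ld",
}


def choose_representation(accept_header, default="text/html"):
    """Pick the media type and renderer for a request; fall back to HTML."""
    for media_type in parse_accept(accept_header or ""):
        if media_type in RENDERERS:
            return media_type, RENDERERS[media_type]
        if media_type == "*/*":
            break
    return default, RENDERERS[default]
```

A browser-style header such as `text/html,application/xhtml+xml,*/*;q=0.8` selects the HTML renderer, while a client sending `application/ld+json` gets JSON-LD from the same URL.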

DiegoPino commented 8 years ago

@acoburn I agree. The Drupal 8 REST API (which works as middleware in Drupal's routing system) uses content negotiation plus (sadly) a _format param to expose different serialisations on the same URI. The question is which URI will be the canonical one: the UUID-based one (which does not exist by default in Drupal 8; I published a working implementation for our case that can be extended if needed) or the sequentially numbered one, which is based on an '{entity_type}/{id}' routing pattern, with id a sequential number unique to each entity_type. It all works much like (in terms of workflow and params) http://symfony.com/doc/current/routing.html#routing-format-param

DiegoPino commented 8 years ago

The ability to publish linked data:

Different presentations, but all with a congruent canonical URI for local linked resources, which means translating Fedora 4 paths to published resource URIs: RDFa (core Drupal) in HTML, JSON-LD (Accept: application/ld+json), plus whatever any other contributed modules want to provide. Resources referenced in the data (like <> ldp:contains <some/resource>, etc.) should also point to publicly available resources in Drupal 8, so <some/resource> becomes a Drupal 8 canonical URL (following the same convention as the referring resource).

Follow your nose would be fine for HTML resources, but as I see it, a Drupal block solves this and could even be a contrib module.

Also good to remember: Drupal 8 allows for multiple view modes, so this can be user configured and adapted/expanded.

acoburn commented 8 years ago

One question to consider is: when publishing linked data for an aggregate resource (e.g. a Book), will Drupal publish the aggregate graph? Since the HTML display is (I assume) a sort of aggregation of resources (pages, files), I'd expect the JSON-LD repr would also contain the aggregate graph, but that would be good to spell out explicitly.

whikloj commented 8 years ago

The ability to publish linked data

I think @acoburn's content-neg description should cover our "publish linked data" needs (we can always expand if we get a persuasive use case).

The ability to restrict access to collections and/or individual resources

I think this could be Drupal restrictions, so long as they are translated to WebAC for Fedora...no?
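One way to picture that translation, as a rough sketch only: map Drupal's entity access operations (view, update, delete) onto WebAC modes and emit a Turtle authorization. The operation-to-mode mapping, the function name, and the example URIs are assumptions made for illustration; only the acl: vocabulary terms come from the WebAC ontology.

```python
# Assumed mapping from Drupal entity operations to WebAC modes.
# Treating "delete" as acl:Write is a guess, not something decided in this thread.
ACL_MODES = {
    "view": "acl:Read",
    "update": "acl:Write",
    "delete": "acl:Write",
}


def to_webac_turtle(resource_uri, agent_uri, operations):
    """Build a Turtle acl:Authorization granting the given operations."""
    modes = sorted({ACL_MODES[operation] for operation in operations})
    if not modes:
        raise ValueError("at least one operation is required")
    lines = [
        "@prefix acl: <http://www.w3.org/ns/auth/acl#> .",
        "",
        "<#authorization> a acl:Authorization ;",
        f"    acl:accessTo <{resource_uri}> ;",
        f"    acl:agent <{agent_uri}> ;",
        f"    acl:mode {', '.join(modes)} .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Both URIs are hypothetical placeholders.
    print(to_webac_turtle(
        "http://example.org/fedora/collection1",
        "http://example.org/user/editor",
        ["view", "update"],
    ))
```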

Meaningful REST API

This is wide open to interpretation, but... would this be Drupal services to allow creation of resources in Drupal (which would push to Fedora) and/or would this be Silex services to allow creation of resources in Fedora (which would sync back to Drupal).

@DiegoPino I know you got your routing working, but I found this ticket for Drupal 8 core which appears very similar to what you have. Would it cause a conflict in future?

DiegoPino commented 8 years ago

@acoburn I would expect Drupal to publish the aggregate graph (or at least I am coding towards that). If a simple main Drupal node aggregates multiple custom Fedora resource entities, then its serialization is an aggregation graph, which is what I would (as of today) like to model.
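To make "aggregate graph" concrete, here is a small sketch that assembles one JSON-LD document containing a book node plus every page it aggregates. The function name, the example URIs, and the choice of ore:aggregates as the linking predicate are assumptions for illustration, not the modelling agreed in this thread.

```python
import json

ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"


def aggregate_graph(book, pages):
    """Return one JSON-LD document holding the book and all aggregated pages."""
    book_node = dict(book)
    # Link the book to each page by reference; full page nodes follow in @graph.
    book_node["ore:aggregates"] = [{"@id": page["@id"]} for page in pages]
    return {
        "@context": {"ore": ORE, "dcterms": DCTERMS},
        "@graph": [book_node] + [dict(page) for page in pages],
    }


if __name__ == "__main__":
    # All URIs and titles below are hypothetical.
    document = aggregate_graph(
        {"@id": "http://example.org/obj/book1", "dcterms:title": "A Book"},
        [
            {"@id": "http://example.org/obj/page1", "dcterms:title": "Page 1"},
            {"@id": "http://example.org/obj/page2", "dcterms:title": "Page 2"},
        ],
    )
    print(json.dumps(document, indent=2))
```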

acoburn commented 8 years ago

@DiegoPino cool, that's what I was hoping.

DiegoPino commented 8 years ago

@whikloj, no problem there with https://www.drupal.org/node/2353611. Since we can't rely on manually applied patches right now, I added my own Resolver just for fedora_resources, which is basically the same idea they apply more generally in that ticket. Since I'm pretty sure they are not yet at a state where custom entities will inherit UUID routes, both things can live side by side. Also, my routing still does not solve the linking, which involves messing with the URL class.

By "my own Resolver" I mean code borrowed from here and there. I did not invent the wheel, but I made it spin here.

DiegoPino commented 8 years ago

The ability to restrict access to collections and/or individual resources

To be able to map and enforce WebAC in Drupal 8 we need to investigate these services for fedora_resources type derived entities:

See Drupal::accessManager (https://api.drupal.org/api/drupal/core!lib!Drupal.php/function/Drupal%3A%3AaccessManager/8.2.x), which returns an object implementing \Drupal\Core\Access\AccessManagerInterface (https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Access%21AccessManagerInterface.php/interface/AccessManagerInterface/8.2.x).
If we need to use a different authentication system than Drupal's own, develop a service that implements \Drupal\Core\Authentication\AuthenticationProviderInterface and bind it to our service container using the authentication_provider service tag.

dannylamb commented 8 years ago

For 'publishing linked data'....

Question about publishing the entire graph:

Based on how D8 works with the format parameter, if the HTML representation is distinguished from other representations by a query param like ?_format=json, does that qualify as a distinctly different URI? I know that fragments aren't considered by HTTP, so are query params?
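For reference on this question: under the generic URI syntax (RFC 3986) the query component is part of the URI and is sent to the server, whereas the fragment is stripped by the client before the request is made. So a URL with ?_format=json is, strictly speaking, a different URI from the one without it. A small sketch using Python's standard library; the example.org URLs and the helper name are illustrative only.

```python
from urllib.parse import urldefrag, urlsplit


def request_target(url):
    """Return the path-plus-query a client would send; the fragment is dropped."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


html_url = "http://example.org/obj/foo"
json_url = "http://example.org/obj/foo?_format=json"
fragment_url = "http://example.org/obj/foo#page-1"

if __name__ == "__main__":
    # The query string survives into the request target, so the server sees
    # two different targets for the HTML and JSON URLs.
    print(request_target(html_url))
    print(request_target(json_url))
    # The fragment never reaches the server; removing it yields the HTML URL.
    print(request_target(fragment_url))
    print(urldefrag(fragment_url).url == html_url)
```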

dannylamb commented 8 years ago

Yeah... and sync.

So I've got big ideas, and want to go for the gold on it, but both Fedora and Drupal are going to need help enforcing conditional updates for it to work. Granted, no type of sync'ing whatsoever is going to work well without conditional updates, so it's not like I can do another approach. It's just that if time gets spent making both Fedora and Drupal respect conditional updates, then we may not have as much time to do what I want with sync.

That said, I'm shooting for full bidirectional sync. If we bake it into the lowest levels of D8 entities (like RDF and NonRDF Resources), then aside from having a few post-save events, there will be no mention of Fedora whatsoever in Drupal code. It's definitely the best way to decouple the two.

As far as implementations go, I'm looking at sticking Interval Tree Clocks both as a field in Drupal and in the RDF in Fedora. There has to be some middleware to intercept write requests to Fedora and make sure they update the Interval Tree Clock for the resource, and we can write the updates ourselves into the Drupal side of things. Then all replication can be handled async between two listeners, one for Drupal and one for Fedora.
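To illustrate why a logical clock helps with conditional updates, here is a sketch that uses plain version vectors (a simpler relative of Interval Tree Clocks, not ITCs themselves). The node names "drupal" and "fedora" and all function names are hypothetical; the point is only that a listener can tell a safe update apart from a conflicting one.

```python
def compare(a, b):
    """Compare two version vectors (dicts mapping node id to counter)."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(node, 0) > b.get(node, 0) for node in nodes)
    b_ahead = any(b.get(node, 0) > a.get(node, 0) for node in nodes)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def record_write(clock, node):
    """Return a copy of the clock with the writing node's counter incremented."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Combine two clocks after a successful sync (element-wise maximum)."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


def should_apply(incoming, stored):
    """Apply a replicated update only if it strictly supersedes what is stored."""
    return compare(incoming, stored) == "after"
```

If Drupal and Fedora each write to the same resource before either side has synced, the two clocks compare as "concurrent" and `should_apply` returns False, which is the case a conflict-resolution step would have to handle.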

dannylamb commented 8 years ago

FYI: Java and C implementation for Interval Tree Clocks (which are a generalization of both vector clocks and version vectors) here.

For PHP I was thinking about making an extension around the C code.

ruebot commented 8 years ago

no mention of Fedora whatsoever in Drupal code.

:+1:

For PHP I was thinking about making an extension around the C code.

That seems reasonable.

DiegoPino commented 8 years ago

@dannylamb here are some ideas, which obviously need more discussion and may turn out to be a no-go, on how to publish the whole graph and also comply with a disambiguated URI for ORE:

Use normal, node-entity-derived content types with fields that link to our fedora_resource entities (custom ones, provided by our module): this way we are emulating a ReM (Resource Map in ORE) and we can add new content using all the UI goodies Drupal 8 provides. So these nodes have a different 'canonical' URL than the one assigned to their aggregated (linked as field values) fedora_resources. I say canonical because in Drupal 8 you can basically make as many pattern-based aliases as you want. Question here, discussing pros and cons: this ReM would not really exist in Fedora 4, or at least right now we haven't defined a structure or place for a ReM.

OR

Make a permanent route act as a JSON-LD graph serialisation under a different Path

AND/OR (from IRC by @acoburn )

use link headers to point to the JSON-LD graph

OR

Create a Resource Map custom entity, with its own controllers, serialisation specifics and of course a "route" (I'm starting to like this idea)

Assuming the JSON-LD serialisation of each fedora_resource would be just the resource itself (more a question than an affirmation).

Does anyone in @Islandora-CLAW/sprinters want to discuss this idea on IRC?

acoburn commented 8 years ago

It might be worth noting that someone might be able to write a little Java code for Fedora in order to generate vector-clock headers. There are currently hooks in the Fedora code for being able to do this. That way, the Drupal code can just work on header values w/r/t the vector clocks.

DiegoPino commented 8 years ago

That said, I'm shooting for full bidirectional sync. If we bake it into the lowest levels of D8 entities (like RDF and NonRDF Resources), then aside from having a few post-save events, there will be no mention of Fedora whatsoever in Drupal code. It's definitely the best way to decouple the two.

That is the way I'm approaching things, still in one direction (from Drupal to Fedora), but I would like to discuss some approaches.

acoburn commented 8 years ago

Meaningful REST API

The terminology is going to get a little weird here, but I'd highly recommend using Hydra for this. And by Hydra, I mean the vocabulary for describing hypermedia-driven web APIs.

DiegoPino commented 8 years ago

@acoburn++ Does using HydraCG imply using a completely different ontology, or can they be mixed? I see the @context is peculiarly HydraCG-centric. I still don't get Hydra-CG completely, so I'll paste this here in case it's of use: http://stackoverflow.com/questions/25297719/get-a-collection-of-sub-resources-at-once-with-json-ld-and-hydra

ruebot commented 8 years ago

Hydra and Swagger.io? Or just one?

acoburn commented 8 years ago

@ruebot: I don't know enough about either to make a good decision. I would be happy to investigate.

ruebot commented 8 years ago

@acoburn I believe @dannylamb and @whikloj have done a fair bit of investigation on the swagger.io side of things: https://github.com/Islandora-CLAW/CLAW/issues/205

acoburn commented 8 years ago

@ruebot: I'm advocating for some mechanism to describe the API. If there's already momentum behind swagger.io, that's great

ruebot commented 8 years ago

@acoburn cool. I'll leave it to @dannylamb and @whikloj for thoughts/decisions there.

dannylamb commented 8 years ago

@acoburn i will write anything to get at vector clock headers. anything that will get me conditional updates. i don't need byte for byte comparison.

dannylamb commented 8 years ago

Publish Linked Data

Back to this. I'm thinking we should provide json-ld for every resource/entity in addition to the resource map, which yes, would make sense to have its own entity/node.

And I'm thinking we just generate the resource map RDF from the triple store. It can be dynamic at first, but we'll probably want to consider caching with invalidation based on a transitive SPARQL query. And if we can't make the assertions on other resources in Fedora, then I guess we have no choice but to preserve them as NonRDFResources (the irony is killing me).

Meaningful API

Looks like Drupal is going to thwart us if we want normal looking conneg. No PUTs kinda stinks too. I'm tempted to try and smooth this stuff over with middlewares. Looks like you can even make a silex application act as a filter.

dannylamb commented 8 years ago

About swagger: Server side stubs don't seem to be worth generating. And the little tester page has a hard time with conneg because it overrides accept headers you set even if they're a parameter you're providing as per the schema. You have to manually list all types of consumed and produced messages in the schema, so something like "any Content-Type you can provide" is awfully hard to describe. I'm saying this because i spent some time trying to swaggerize the Fedora API and ran into that gem.

The client code generation of swagger is still nifty, though.

But anything that describes the API in a machine readable format is a good thing. If people think using RDF to describe the API is better, then we can go for that. No love lost with Swagger.

acoburn commented 8 years ago

I had some time this morning to think more about ResourceMaps / Aggregations and the goal of "Publishing Linked Data", all in the context of some recent threads of discussion. Here are some thoughts (please critique):

The Drupal representation of the aggregated resource is the ResourceMap.

That is, don't store the ResourceMaps in Fedora but do store the aggregations in Fedora with descriptive metadata attached to these Aggregations. That resource map would have an HTML serialization and a JSON-LD serialization (i.e. each at different URLs, which, as I understand, is how Drupal does it). E.g. you might have http://example.org/obj/foo for the HTML serialization and http://example.org/obj/foo?_format=json for the JSON-LD version. Both serializations would include the complete aggregated graph. Each would also use a link header Link: <http://example.org/linkeddata/foo>; rel="describes" to point to the particular Aggregation, which can be dereferenced by any linked data client. The metadata attached directly to the ResourceMap would be very minimal: Islandora-CLAW would be the dcterms:creator, plus any additional necessary metadata -- as mentioned above, the primary descriptive metadata would be attached at the Aggregation level.

The Aggregations would be available separately (as per the ORE spec): e.g. http://example.org/linkeddata/foo, available in HTML and JSON-LD formats (or others, if necessary)

This endpoint could live entirely separately from Drupal and/or be based on a simple template service. Personally, I wouldn't include ldp:contains triples for these resources (i.e. I'd rely mostly on ldp-member triples), but I wouldn't draw a line in the sand on that point. In contrast to the ResourceMap serializations, the resources serialized at the /linkeddata/... endpoint would not include child and/or aggregated resources -- they would basically obey the "single-subject" restriction we see in Fedora (so they could include hash URIs).

To me, this seems like it has the advantage of following the ORE spec (as I understand it) and fitting into the models that both Fedora and Drupal provide, while also retaining the semantics of ORE and linked data.
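The Link header arrangement described above could be sketched roughly as follows. The function names are hypothetical, the example.org paths simply mirror the ones in this comment, and the regular expression only handles the simple one-rel form shown here, not the full Link header grammar.

```python
import re

# Matches the simple form: <target>; rel="relation"
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def resource_map_headers(name, as_json=False, base="http://example.org"):
    """Headers for a ResourceMap response, pointing at its Aggregation."""
    content_type = "application/ld+json" if as_json else "text/html"
    return {
        "Content-Type": content_type,
        "Link": f'<{base}/linkeddata/{name}>; rel="describes"',
    }


def find_link(header_value, rel):
    """Return the first link target carrying the given relation, or None."""
    for target, relations in LINK_PATTERN.findall(header_value):
        if rel in relations.split():
            return target
    return None


if __name__ == "__main__":
    headers = resource_map_headers("foo", as_json=True)
    print(headers["Link"])
    # A linked data client can follow the header to the Aggregation.
    print(find_link(headers["Link"], "describes"))
```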

dannylamb commented 8 years ago

Closing since sprint is over. We can open another ticket to 'revisit' this concept later if required.
