HydraCG / Specifications

Specifications created by the Hydra W3C Community Group

Simplify discovery of collections that contain resources of a certain type #126

Closed elf-pavlik closed 5 years ago

elf-pavlik commented 7 years ago

I don't want to wait for #125 to be merged and would like to propose, ASAP, a way of addressing one of the issues we started discussing there. In #125, for the client to discover the collection of all events, we rely on a custom property that relates the entry point to that collection: examplevocab:events.

As an alternative, I propose using void:classPartition, so the API EntryPoint would change to the following (given that the void prefix is defined in the JSON-LD context):

{
    "@context": "/api/context.jsonld",
    "@id": "/api",
    "@type": "hydra:EntryPoint",
    "void:classPartition": {
        "@id": "/api/events",
        "title": "List of events",
        "@type": "hydra:Collection",
        "void:class": "schema:Event",
        "operation": [
            {
                "@type": ["hydra:Operation", "schema:CreateAction"],
                "title": "Create new event",
                "method": "POST",
                "expects": "schema:Event"
            }
        ]
    }
}

And the client pseudocode for creating a new event would look something like:

var event = { ... };
var client = new HydraClient();
var operation = client.get("http://example.com")
    .getApiDocumentation()
    .getEntryPoint()
    .getClassPartitionFor('http://schema.org/Event')
    .getOperationOfType('http://schema.org/CreateAction');
client.invoke(operation, event);

asbjornu commented 7 years ago

Excuse my severe lack of RDF knowledge, but what is void:classPartition? Its description makes me none the wiser and your example leaves me both dizzy and puzzled. 😄

tpluscode commented 7 years ago

I'm not convinced either. Hardcore RDF people will immediately notice that this implies that /api and /api/events/ are void:Dataset. Is that your intention? I think they aren't because the notion of dataset is somewhat disconnected from an API (set of resources)

elf-pavlik commented 7 years ago

Excuse my severe lack of RDF knowledge, but what is void:classPartition? Its description makes me none the wiser and your example leaves me both dizzy and puzzled.

https://www.w3.org/TR/void/#class-property-partitions

I'm not convinced either. Hardcore RDF people will immediately notice that this implies that /api and /api/events/ are void:Dataset. Is that your intention? I think they aren't because the notion of dataset is somewhat disconnected from an API (set of resources)

An API provides an interface to a dataset, and I see no problem with considering a collection a subset of that dataset (so also a dataset).

Maybe @RubenVerborgh could chime in since he uses void in Linked Data Fragments specs.

tpluscode commented 7 years ago

Ah, Linked Data Fragments. void may be a good fit there. After all LDF is used exactly with coherent RDF datasets.

But not every API will be considered a dataset in that sense IMO.

Not to mention that Hydra would be used to describe non-RDF APIs too, right? I wouldn't want to describe RDF datasets to people who only want to describe rich HTTP interactions...

-- Tomasz Pluskiewicz

alien-mcl commented 7 years ago

This touches on the wider topic of data model description and the still-open question of 'OWA or not to OWA'. I think SHACL is close to OWA's opposite, CWA. There is also sh:datatype/sh:class; these do not give any further implications about the carrier of those properties. I'm not familiar with that vocab, but I'd definitely take a longer look at it.

lanthaler commented 7 years ago

@elf-pavlik, as promised in today's telecon, I renamed this issue so that it doesn't suggest a specific solution. Let's see if we can find an elegant pattern to natively support this in Hydra.

dunglas commented 7 years ago

In my opinion, it would be a big win if Hydra could describe the most common API use cases without requiring other vocabularies. Using VoID, OWL, and even RDFS increases the perceived complexity for newcomers and is bad for the adoption of the vocabulary. If implementing or consuming a Hydra API requires a lot of knowledge of Linked Data / Semantic Web technologies, it will discourage a lot of people.

On the other hand both Swagger and GraphQL are self-contained formats, you just have to read (and understand) one spec to be ready to create or consume an API using them.

elf-pavlik commented 7 years ago

Besides the suggested void:classPartition, I previously posted on the mailing list asking to evaluate solid:TypeRegistration.

@RubenVerborgh in case you had a chance to take a closer look at Solid, maybe you have some insights on those two options.

In general I think we should consider different audiences and different requirements.

  1. Developers who will always use some hydra client and rely on interface provided by it - for them it shouldn't matter what we decide to use
  2. Developers who will work with data as JSON(-LD), possibly always relying on compacting it with hydra JSON-LD context (which may hide possible fact of using terms from different namespaces)
  3. Developers who will work with data as RDF triples/quads and don't care about any particular framing of JSON but care how things work in terms of RDF(S)/OWL

I propose to focus on the first case and take advantage of the ongoing effort to implement a reference client, Heracles. This way we can consider separately the interface provided by the client and the complexity of implementing various approaches in the client itself. Later we can look at the different requirements of developers who don't want to rely on Hydra clients and work on a lower level with data, both formatted & framed JSON-LD and raw RDF triples/quads.

asbjornu commented 7 years ago

While I agree that a strong focus on a client is essential for Hydra's success, I'm afraid that paying too little attention to which bytes go over the wire, and to how easy those bytes are for a developer to understand, will be detrimental to Hydra's success, the same way SOAP's complexity and incomprehensibility, due to its reliance on auto-generated clients, led to its downfall.

There were many more problems with SOAP, but I believe very strongly that without clarity and simplicity in the protocol, Hydra will be met with so much skepticism that it won't take off and without enough traction it will eventually fail.

RubenVerborgh commented 7 years ago

In my opinion, it would be a big win if Hydra could describe the most common API use cases without requiring other vocabularies.

This doesn't really matter with the right JSON-LD context: the properties manifest as simple strings, regardless of their origin (as hinted at by @elf-pavlik). It would be bad practice if every vocabulary reinvented existing concepts over and over.

Using VoID, OWL, and even RDFS increases the perceived complexity for newcomers

Newcomers will not even see that for the reason above.

@RubenVerborgh in case you had a chance to take a closer look at Solid, maybe you have some insights on those two options.

I'm indeed working with the Solid team at MIT this summer, but haven't looked at collections yet. I'll try to figure this out!

dunglas commented 7 years ago

It would be bad practice if every vocabulary reinvented existing concepts over and over.

Schema.org imported types and properties from a lot of pre-existing vocabularies, and it's actually a success. It's an easy to understand, standalone vocabulary to describe basic data sets. Doing the same thing for Hydra (having a standalone vocab to describe or consume a basic API) would ease the adoption.

My point is not to reinvent everything - having the ability to mix Hydra with other vocabularies is one of the strengths of using JSON-LD - but to have to refer to only one spec to describe or consume basic APIs (I mean something like feature parity with Swagger).

A well-crafted JSON-LD context along with good documentation may definitely help, but it doesn't address the full concern or the perceived-complexity problem.

alien-mcl commented 7 years ago

I'd like to support @dunglas in his statement. There is no need to reinvent everything from scratch, but I understand the calls to have some built-in features that cover most (if not all) common use cases. I feel that the current spec leaves way too much room for interpretation, making implementation cumbersome. In general, that's why we're working on the use cases and the reference client implementation: to pinpoint those situations and address them correctly. Personally, I think it won't be a shame to reference other vocabularies in the Hydra spec to cover what's missing.

lanthaler commented 6 years ago

As discussed in today's telecon:

As a guiding principle, we will start designing/defining new concepts in Hydra instead of trying to reuse bits and pieces of various vocabularies from the get-go. If we later discover that there's a considerable overlap with an existing vocabulary we may decide to use it instead of our own solution.

elf-pavlik commented 6 years ago

During the last telecon I took an action to extend the current use cases in a way that provides a clear reason for discovering collections that contain resources of a certain type. Taking another look at the EntryPoint from our use case, it seems to me that having examplevocab:events, rather than using the collection of all events itself as the entry point, already suggests the need for such discovery. Currently the client in step 5, Creating a new event, also uses .getLink('http://example.com/vocab#events') on the EntryPoint to discover that collection of events. If we just follow the guiding principle from our last telecon (captured in the previous comment) and start by defining preferred terms in the hydra: namespace, would the changes below make sense?

{
    "@context": "/api/context.jsonld",
    "@id": "/api",
    "@type": "hydra:EntryPoint",
    "partition": {
        "@id": "/api/events",
        "title": "List of events",
        "@type": "hydra:Collection",
        "memberType": "schema:Event",
        "operation": [
            {
                "@type": ["hydra:Operation", "schema:CreateAction"],
                "title": "Create new event",
                "method": "POST",
                "expects": "schema:Event"
            }
        ]
    }
}

var event = { ... };
var client = new HydraClient();
var operation = client.get("http://example.com")
    .getApiDocumentation()
    .getEntryPoint()
    .getPartitionByMemberType('http://schema.org/Event')
    .getOperationOfType('http://schema.org/CreateAction');
client.invoke(operation, event);

We should keep in mind that the Collection Design proposed on a wiki page already partially addresses the requirement to discover a collection of entities related to some given entity by a particular predicate (property). Here we have a different case, where we need to relate the EntryPoint to a Collection which has members of a particular type. If we use some property like hydra:partition for that (or whatever we may want to call it), we should possibly define that it SHOULD NOT reference more than one collection with the same hydra:memberType (or whatever we may want to call it). I can't really think of a use case where one would want to reference more than one collection with members of the same type and still expect the client to choose between those collections in some meaningful way.
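The SHOULD NOT rule above could be checked mechanically. What follows is only a minimal sketch, assuming a compacted JSON-LD entry point that uses the proposed (not standardized) `partition` and `memberType` terms; the helper names and the `/api/events/archive` collection are hypothetical and exist purely for illustration.

```javascript
// Normalize a JSON-LD value that may be absent, a single node, or an array.
function asArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Hypothetical helper: return every member type that is referenced by more
// than one collection linked from the entry point via `linkProperty`.
function duplicateMemberTypes(entryPoint, linkProperty = "partition") {
  const counts = new Map();
  for (const collection of asArray(entryPoint[linkProperty])) {
    for (const type of asArray(collection.memberType)) {
      counts.set(type, (counts.get(type) || 0) + 1);
    }
  }
  return [...counts].filter(([, n]) => n > 1).map(([type]) => type);
}

const entryPoint = {
  "@id": "/api",
  "@type": "hydra:EntryPoint",
  "partition": [
    { "@id": "/api/events", "memberType": "schema:Event" },
    { "@id": "/api/venues", "memberType": "schema:Place" },
    // Hypothetical second collection of events, added to trigger the check.
    { "@id": "/api/events/archive", "memberType": "schema:Event" }
  ]
};

console.log(duplicateMemberTypes(entryPoint)); // [ 'schema:Event' ]
```

Comparing compact IRIs as plain strings only works while every document shares one context; a real client would presumably expand the JSON-LD first.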

To further clarify our need for discovery of such 'member type based partitions', we could add venues (schema:Place) to our API, which would give us the following EntryPoint:

{
    "@context": "/api/context.jsonld",
    "@id": "/api",
    "@type": "hydra:EntryPoint",
    "partition": [{
        "@id": "/api/events",
        "title": "List of events",
        "@type": "hydra:Collection",
        "memberType": "schema:Event",
        "operation": [
            {
                "@type": ["hydra:Operation", "schema:CreateAction"],
                "title": "Create new event",
                "method": "POST",
                "expects": "schema:Event"
            }
        ]
    },{
        "@id": "/api/venues",
        "title": "List of venues",
        "@type": "hydra:Collection",
        "memberType": "schema:Place",
        "operation": [
            {
                "@type": ["hydra:Operation", "schema:CreateAction"],
                "title": "Create new venue",
                "method": "POST",
                "expects": "schema:Place"
            }
        ]
    }]
}
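To make the lookup concrete, here is a minimal sketch of how a client might resolve a collection and its create operation from an entry point like the one above. It assumes the compacted shape shown in this comment, with the proposed `partition` and `memberType` terms; the two functions mirror the names used in the pseudocode but are illustrative only and are not an existing client API.

```javascript
// Normalize a JSON-LD value that may be absent, a single node, or an array.
function asArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Find the first collection linked from the entry point whose memberType matches.
function getPartitionByMemberType(entryPoint, memberType) {
  const match = asArray(entryPoint.partition).find((collection) =>
    asArray(collection.memberType).includes(memberType)
  );
  return match || null;
}

// Find the first operation on a resource that carries the given @type.
function getOperationOfType(resource, operationType) {
  if (!resource) return null;
  const match = asArray(resource.operation).find((operation) =>
    asArray(operation["@type"]).includes(operationType)
  );
  return match || null;
}

const entryPoint = {
  "@id": "/api",
  "@type": "hydra:EntryPoint",
  "partition": [
    {
      "@id": "/api/events",
      "@type": "hydra:Collection",
      "memberType": "schema:Event",
      "operation": [{
        "@type": ["hydra:Operation", "schema:CreateAction"],
        "title": "Create new event",
        "method": "POST",
        "expects": "schema:Event"
      }]
    },
    {
      "@id": "/api/venues",
      "@type": "hydra:Collection",
      "memberType": "schema:Place",
      "operation": [{
        "@type": ["hydra:Operation", "schema:CreateAction"],
        "title": "Create new venue",
        "method": "POST",
        "expects": "schema:Place"
      }]
    }
  ]
};

const venues = getPartitionByMemberType(entryPoint, "schema:Place");
const createVenue = getOperationOfType(venues, "schema:CreateAction");
console.log(venues["@id"], createVenue.method, createVenue.title);
// /api/venues POST Create new venue
```

Because both helpers accept a single node or an array, the same code also works against the single-collection entry point shown earlier.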

Please note that in the following steps of the use case, the client currently doesn't use any discovery and somehow starts directly from /api/events.

lanthaler commented 6 years ago

Would changes below make sense? [first example with hydra:partition]

Yes, the example makes a lot of sense. Why did you decide to introduce hydra:partition instead of reusing hydra:collection? I don't see a reason why we couldn't reuse it here.

If we use for that some property like hydra:partition (or whatever we may want to call it) we should possibly define that it SHOULD NOT reference more than one collection with the same hydra:memberType (or whatever we may want to call it). I can't really think of a use case where one would want to reference more than one with members of the same type, and still expect client to choose between those collections in some meaningful way.

There may be use cases that require entities of the same type to be split based on another property. Think, for instance, of stores split by country. So maybe instead of going the easy route and using memberType, which optimizes for this main use case, we should explore whether a construct describing constraints would make sense. I'm thinking of something like:

{
    "@context": "/api/context.jsonld",
    "@id": "/api",
    "@type": "hydra:EntryPoint",
    "collection": {
        "@id": "/api/events",
        "title": "List of events in Italy",
        "@type": "hydra:Collection",
        "constraint": {
            "rdf:type": "Event",
            "country": "Italy"
        },
        "operation": [
            {
                "@type": ["hydra:Operation", "schema:CreateAction"],
                "title": "Create new event",
                "method": "POST",
                "expects": "schema:Event"
            }
        ]
    }
}

alien-mcl commented 6 years ago

I agree with @lanthaler about partition - feels foreign.

As for the example:

"constraint": { "rdf:type": "Event", "country": "Italy" },

I believe we're entering a dangerous minefield here. Correct me if I'm wrong, but this is what an OWL restriction does. I don't think it's a good approach, as it opens a huge can of worms. The whole topic also touches on the idea of views and projections.

Generally, I'd leave logical partitioning to the developers. In @lanthaler's case, all collections of that kind would still have members of the same type. Having strongly typed collections would also allow us to create shortened operation definitions (i.e. an operation for creating new members could expect that type unless specified otherwise, assuming we have that mechanism in the spec).

elf-pavlik commented 6 years ago

Why did you decide to introduce hydra:partition instead of reusing hydra:collection? I don't see a reason why we couldn't reuse it here.

:+1: let's just use hydra:collection

There may be use cases that require entities of the same type to be split based on another property. Think, for instance, of stores split by country. So maybe instead of going the easy route and using memberType, which optimizes for this main use case, we should explore whether a construct describing constraints would make sense.

Doesn't the manages block already handle such a 'based on another property' case?

{
    "@id": "/api/events/france",
    "@type": "Collection",
    "manages": {
        "property": "schema:location",
        "object": "https://www.wikidata.org/wiki/Q142"
    }
}

Actually I haven't thought before about just relying on rdf:type

{
    "@id": "api/events/france",
    "@type": "Collection",
    "manages": {
        "property": "rdf:type",
        "object": "schema:Event"
    }
}

I don't remember whether we discussed manages with multiple values, which your constraint suggestion seems to aim at. How would it handle using properties in the @reverse direction?

elf-pavlik commented 6 years ago

There may be use cases that require entities of the same type to be split based on another property.

8. Advanced filtering of events also seems to try addressing this issue.

GET /api/events?schema:location=https%3A%2F%2Fwww.wikidata.org%2Fwiki%2FQ142
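As a small sketch of how a client could assemble such a request target: the function below percent-encodes only the filter value and leaves the property name untouched, which reproduces the query string above. The function name is hypothetical, and how a real client learns the parameter name (for example from an IRI template) is outside this sketch.

```javascript
// Hypothetical helper: build a filtered collection URL from a property/value pair.
// Only the value is percent-encoded; the key is kept as written in the example.
function buildFilterUrl(collectionPath, property, value) {
  return `${collectionPath}?${property}=${encodeURIComponent(value)}`;
}

const url = buildFilterUrl(
  "/api/events",
  "schema:location",
  "https://www.wikidata.org/wiki/Q142"
);
console.log(url);
// /api/events?schema:location=https%3A%2F%2Fwww.wikidata.org%2Fwiki%2FQ142
```

Running the property name through `encodeURIComponent` as well would turn its colon into `%3A`, so the key would no longer match the example literally.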

Triple Pattern Fragments Hypermedia controls also seem capable of handling that, especially if we can consider each collection as Dataset (subset of all the data exposed by API)

GET /api/events?p=http%3A%2F%2Fschema.org%2Flocation&o=https%3A%2F%2Fwww.wikidata.org%2Fwiki%2FQ142

alien-mcl commented 6 years ago

Triple Pattern Fragments Hypermedia controls also seem capable of handling that, especially if we can consider each collection as Dataset (subset of all the data exposed by API)

Looks like reification - I think it'd scare off non-RDF developers.

elf-pavlik commented 6 years ago

I think we could make something like hydra:memberType a special case (shortcut). Later when we have something more generic like

        "hydra:constraint": {
            "rdf:type": "schema:Event",
            "foo:country": "bar:Italy"
        }

we could add for reasoning purposes

hydra:memberType owl:propertyChainAxiom (hydra:constraint rdf:type) .

Once again, a more generic solution like the proposed hydra:constraint should IMO be considered together with hydra:manages. For example, I don't see how hydra:constraint can support the use of a predicate in the @reverse direction, while hydra:manages wouldn't allow owl:propertyChainAxiom 'shortcuts' like the one above (and, so as not to rely on reasoning, we could recommend to Materialize Inferences).
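Materializing that inference on the server side could look roughly like the sketch below. It assumes the proposed `hydra:constraint` and `hydra:memberType` terms (neither is part of Hydra) and a hypothetical helper name; it simply copies the constraint's `rdf:type` value onto the collection, which is what the property chain axiom above would let a reasoner derive.

```javascript
// Normalize a JSON-LD value that may be absent, a single node, or an array.
function asArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Hypothetical helper: return a copy of the collection with hydra:memberType
// stated explicitly, so clients need no reasoning to find it.
function materializeMemberType(collection) {
  const constraint = collection["hydra:constraint"];
  if (!constraint || constraint["rdf:type"] === undefined) return collection;
  const types = [...new Set([
    ...asArray(collection["hydra:memberType"]),
    ...asArray(constraint["rdf:type"])
  ])];
  return {
    ...collection,
    "hydra:memberType": types.length === 1 ? types[0] : types
  };
}

const collection = {
  "@id": "/api/events",
  "@type": "hydra:Collection",
  "hydra:constraint": {
    "rdf:type": "schema:Event",
    "foo:country": "bar:Italy"
  }
};

console.log(materializeMemberType(collection)["hydra:memberType"]); // schema:Event
```

The input object is left unchanged, so the helper can be applied just before serialization without affecting stored data.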

If we work in small increments, introducing hydra:memberType in use case seems like an improvement over current examplevocab:events. We can always update it to something more generic once we agree on it.


{
    "@context": "/api/context.jsonld",
    "@id": "/api",
    "@type": "hydra:EntryPoint",
    "collection": {
        "@id": "/api/events",
        "title": "List of events in Italy",
        "@type": "hydra:Collection",
        "memberType": "schema:Event",
        "operation": [
            {
                "@type": ["hydra:Operation", "schema:CreateAction"],
                "title": "Create new event",
                "method": "POST",
                "expects": "schema:Event"
            }
        ]
    }
}

elf-pavlik commented 6 years ago

In #132 I propose to simply reuse the already agreed-on Collection Design.

elf-pavlik commented 6 years ago

I propose to move the conversation from the review of #132 to here.

@asbjornu: What I mean is that replacing the API-specific term loses the link type between one resource (the Entrypoint) and the other (List of events). I think that the link relation trumps the manages block, so to speak.

As for the client though, a way to discover collection of a given type may be useful. How would you like it to be based on "@type": "hydra:Collection" and the manages-block and not the predicate type? This way you can keep a more informative link predicate

@lanthaler: If such a relation exists, then yes, it should be used. This is for cases where no such relations exists yet and would only be created for the sake of being able to connect some entity with a collection.

@asbjornu: Hm, wouldn't it be a good rule of thumb to mint an API-specific property for any such case? I think that is precisely what you did in events example...

@lanthaler: Yeah, it's exactly what I did in the events example... but only because there was nothing else available. The goal is to create clients that "understand" these APIs to some degree so that they can figure out how to use them. If everyone mints properties like myvocab:events, only clients that have been written specifically for that API will understand them. If, on the other hand, we have a generic mechanism like the one Pavlik proposes here, we can implement client libraries that can find the collection that contains all events. [emphasis added by @elf-pavlik]

A custom link relation is only useful if it has some well-defined semantics that you can't convey otherwise.

I would add that I've seen link relations / RDF predicates defined with a collection in their range in only very few cases. One example is the ActivityPub draft, where following and followers attempt to address exactly the same use case that led us to the manages block in Collection Design.

angelo-v commented 5 years ago

I use hydra:manages with rdf:type to discover collections that contain specific types and it works fine.

{
    "@type": "Collection",
    "manages": {
      "property": "rdf:type",
      "object": "schema:Event"
    }
}
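A minimal sketch of that discovery step, assuming compacted JSON-LD in which each collection carries a `manages` block as shown above. The function names and the sample resource list are hypothetical; a real client would obtain the collections from the entry point or API documentation.

```javascript
// Normalize a JSON-LD value that may be absent, a single node, or an array.
function asArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// True if any manages block states that members have the given rdf:type.
function managesMembersOfType(collection, type) {
  return asArray(collection.manages).some(
    (block) => block.property === "rdf:type" && block.object === type
  );
}

// Hypothetical helper: select the collections that manage members of `type`.
function findCollectionsManagingType(resources, type) {
  return resources.filter(
    (resource) =>
      asArray(resource["@type"]).includes("Collection") &&
      managesMembersOfType(resource, type)
  );
}

const resources = [
  {
    "@id": "/api/events",
    "@type": "Collection",
    "manages": { "property": "rdf:type", "object": "schema:Event" }
  },
  {
    "@id": "/api/events/france",
    "@type": "Collection",
    "manages": {
      "property": "schema:location",
      "object": "https://www.wikidata.org/wiki/Q142"
    }
  }
];

const found = findCollectionsManagingType(resources, "schema:Event");
console.log(found.map((collection) => collection["@id"])); // [ '/api/events' ]
```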

Can we close this issue? If not, what is left to do?

tpluscode commented 5 years ago

Sadly, this area remains largely undocumented, among others. I'd keep this open until we get round to getting the details written down in the book and spec