HydraCG / Specifications

Specifications created by the Hydra W3C Community Group

Another take on non-RDF payloads (aka file upload) #199

Open tpluscode opened 4 years ago

tpluscode commented 4 years ago

Describe the requirement

We've approached the problem of non-RDF media types a few times already. Unfortunately it seems that each time it was not focused enough. Either mixed with collections (#187) or lacking broader context (#186).

Looking back at both, I think they are on track, but need a little more refinement.

For this issue, I'd like to focus on expects only and not on returned representations.

Hydra-agnostic example

I would distinguish 3 kinds of requests coming to a Hydra API:

  1. RDF payloads - those currently described by expects and Class.

  2. Non-RDF payloads

    Directly uploading an image instead of RDF:

    POST /movie/123/poster-image HTTP/2
    Content-Type: image/png
    
    ...File bytes...

  3. multipart/form-data

    Submitting multiple images and RDF data:

    PUT /movie/123/image-gallery HTTP/2
    Content-Type: multipart/form-data; boundary=--hydra-content
    
    ----hydra-content
    Content-Disposition: form-data; filename="poster.png"
    Content-Type: image/png
    
    ...Poster image...
    ----hydra-content
    Content-Disposition: form-data; filename="cast.jpeg"
    Content-Type: image/jpeg
    
    ...Image of actors...
    ----hydra-content
    Content-Type: application/ld+json
    
    {
     "@type": "mov:Gallery",
     "description": {
       "@value": "Pictures for movie /movie/123"
     }
    }
    ----hydra-content--

Hydra should allow describing operations which expect both kinds of file uploads.
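To make the wire format of the third case concrete, here is a minimal Python sketch of how a client might serialize such a body. The function name `build_multipart`, the boundary value, and the `name` parameters are assumptions for illustration and not part of the proposal; the example requests above omit `name`, although RFC 7578 requires it on every form-data part.

```python
def build_multipart(parts, boundary):
    """Serialize parts into a multipart/form-data body.

    Each part is a (headers, payload) pair: headers is a dict of header
    names to values, payload is bytes.
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for headers, payload in parts:
        # Every part starts with "--" followed by the boundary value.
        lines.append(delimiter)
        for name, value in headers.items():
            lines.append(f"{name}: {value}".encode("ascii"))
        lines.append(b"")  # blank line separates part headers from payload
        lines.append(payload)
    # The closing delimiter carries a trailing "--".
    lines.append(delimiter + b"--")
    lines.append(b"")
    return b"\r\n".join(lines)


# Hypothetical parts; the payload bytes are placeholders.
parts = [
    ({"Content-Disposition": 'form-data; name="poster"; filename="poster.png"',
      "Content-Type": "image/png"}, b"<png bytes>"),
    ({"Content-Disposition": 'form-data; name="gallery"',
      "Content-Type": "application/ld+json"}, b'{"@type": "mov:Gallery"}'),
]
body = build_multipart(parts, "hydra-content")
```

The request would then be sent with `Content-Type: multipart/form-data; boundary=hydra-content`.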

Proposed solutions

It is important to keep support for the current expects semantics.

I propose that we extend the existing structure with a media-type description. Unfortunately it is not possible to have it both ways without revolutionising the structure, so the vocab will have to remove rdfs:range from expects and use schema:rangeIncludes instead.

{
  "@id": "hydra:expects",
-  "range": "hydra:Class",
+  "schema:rangeIncludes": [
+    "hydra:Class",
+    "hydra:RequestSpecification"
+  ]
},

Example of hydra:Class usage

{
  "@type": "Operation",
  "expects": {
    "@type": "RequestSpecification",
    "content": {
      "@type": "SupportedClassContent",
      "class": "mov:Movie"
    }
  }
}

This would be equivalent to "expects": "mov:Movie" and both should be supported at least for a while.
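One way a client could support both forms during a transition period is to normalize the shorthand into the longer structure before processing. The sketch below assumes compacted JSON-LD already parsed into Python values; `normalize_expects` is a hypothetical helper, not an existing Hydra client API.

```python
def normalize_expects(expects):
    """Return a RequestSpecification-shaped dict for either form of `expects`."""
    if isinstance(expects, str):
        # Legacy shorthand: a bare class identifier such as "mov:Movie".
        return {
            "@type": "RequestSpecification",
            "content": {"@type": "SupportedClassContent", "class": expects},
        }
    if isinstance(expects, dict) and expects.get("@type") == "RequestSpecification":
        # Already in the proposed long form.
        return expects
    raise ValueError(f"Unsupported expects value: {expects!r}")


legacy = normalize_expects("mov:Movie")
explicit = normalize_expects({
    "@type": "RequestSpecification",
    "content": {"@type": "SupportedClassContent", "class": "mov:Movie"},
})
```

Both calls yield the same dictionary, so downstream code only has to handle one shape.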

Example of non-RDF payload

{
  "@type": "Operation",
  "expects": {
    "@type": "RequestSpecification",
    "content": {
      "@type": "RawContent",
      "supportedContentType": [ "image/png", "image/jpeg" ]
    }
  }
}
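A server or client could use such a description to check a request's Content-Type header before sending or accepting the body. This is only a sketch: `accepts_content_type` is a hypothetical name, and exact, case-insensitive media type matching with parameters ignored is an assumption, since the proposal does not define matching rules.

```python
def accepts_content_type(content, content_type_header):
    """Check a request's Content-Type against a RawContent description."""
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    supported = content.get("supportedContentType", [])
    if isinstance(supported, str):
        # JSON-LD compaction may yield a single string instead of a list.
        supported = [supported]
    return media_type in {s.lower() for s in supported}


raw_content = {
    "@type": "RawContent",
    "supportedContentType": ["image/png", "image/jpeg"],
}
png_ok = accepts_content_type(raw_content, "image/png")
text_ok = accepts_content_type(raw_content, "text/plain")
```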

Example of multipart

{
  "@type": "Operation",
  "expects": {
    "@type": "RequestSpecification",
    "content": {
      "@type": "MultipartContent",
      "allowedParts": [
        {
          "supportedContentType": [ "image/png", "image/jpeg" ],
          "maxCount": 2
        },
        {
          "@type": "SupportedClassContent",
          "class": "mov:Movie",
          "minCount": 1,
          "maxCount": 1
        }
      ]
    }
  }
}

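The above would be interpreted as: at most two image parts (PNG or JPEG) and exactly one part containing an RDF representation of mov:Movie.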

MultipartContent would have to become part of the core vocabulary.
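The cardinality constraints in the multipart example could be enforced along these lines. Everything here is an assumption made for illustration: the function names, the representation of incoming parts as dicts with `contentType` and `class` keys (the RDF class would have to be determined by parsing the part first), a default `minCount` of 0, and an unbounded `maxCount` when absent.

```python
def part_matches(allowed, part):
    """Decide whether an incoming part fits one allowedParts entry."""
    if "class" in allowed:
        return part.get("class") == allowed["class"]
    return part.get("contentType") in allowed.get("supportedContentType", [])


def validate_parts(allowed_parts, parts):
    """Return a list of human-readable violations (empty when valid)."""
    counts = [0] * len(allowed_parts)
    errors = []
    for part in parts:
        # Assign each part to the first description that accepts it.
        for i, allowed in enumerate(allowed_parts):
            if part_matches(allowed, part):
                counts[i] += 1
                break
        else:
            errors.append(f"unexpected part: {part}")
    for allowed, count in zip(allowed_parts, counts):
        low = allowed.get("minCount", 0)   # assumed default
        high = allowed.get("maxCount")     # None means unbounded (assumed)
        if count < low:
            errors.append(f"expected at least {low} part(s), got {count}")
        if high is not None and count > high:
            errors.append(f"expected at most {high} part(s), got {count}")
    return errors


allowed_parts = [
    {"supportedContentType": ["image/png", "image/jpeg"], "maxCount": 2},
    {"@type": "SupportedClassContent", "class": "mov:Movie",
     "minCount": 1, "maxCount": 1},
]
request_parts = [
    {"contentType": "image/png"},
    {"contentType": "image/jpeg"},
    {"contentType": "application/ld+json", "class": "mov:Movie"},
]
errors = validate_parts(allowed_parts, request_parts)
```

A request with two images and one mov:Movie part passes, while dropping the RDF part or adding a third image produces a violation.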

Implications

The consequences of such design are far reaching:

  1. By introducing RequestSpecification we can directly describe HTTP requests (such as by using expectHeader)

  2. The content predicate can be the extension point we've talked about, allowing 3rd party vocabularies to describe bodies using SHACL. Something like

    ShaclContentSpecification subclassOf ContentSpecification

  3. It will even be possible to define operations which expect markdown, plain text or any other textual format
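On the client side, the extension point from items 2 and 3 might look like a registry keyed by the content description's @type. This is a rough sketch under assumed names (`CONTENT_VALIDATORS`, `content_validator`, `validate_request`); the SHACL handler is a placeholder and does not call any real SHACL library.

```python
CONTENT_VALIDATORS = {}


def content_validator(content_type):
    """Register a validator function for one content @type."""
    def register(func):
        CONTENT_VALIDATORS[content_type] = func
        return func
    return register


@content_validator("RawContent")
def validate_raw(content, request):
    return request["contentType"] in content["supportedContentType"]


# A third-party vocabulary could register its own type the same way.
@content_validator("ShaclContentSpecification")
def validate_shacl(content, request):
    # Placeholder: a real extension would run a SHACL engine here.
    raise NotImplementedError("SHACL validation is out of scope for this sketch")


def validate_request(spec, request):
    """Dispatch to the validator registered for the spec's content type."""
    content = spec["content"]
    validator = CONTENT_VALIDATORS.get(content["@type"])
    if validator is None:
        raise LookupError(f"no validator for content type {content['@type']}")
    return validator(content, request)


# An operation expecting markdown, as in item 3.
markdown_spec = {
    "@type": "RequestSpecification",
    "content": {"@type": "RawContent", "supportedContentType": ["text/markdown"]},
}
ok = validate_request(markdown_spec, {"contentType": "text/markdown"})
```

The core only needs to know how to dispatch; each content type brings its own validation logic.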

Alternative solutions

Here's how OpenAPI does that for file uploads and multipart requests. For example:

requestBody:
  content: 
    multipart/form-data: # Media type
      schema:            # Request payload
        type: object
        properties:      # Request parts
          id:            # Part 1 (string value)
            type: string
            format: uuid
          address:       # Part 2 (object)
            type: object
            properties:
              street:
                type: string
              city:
                type: string
          profileImage:  # Part 3 (an image)
            type: string
            format: binary

Note that id, address and profileImage will be separate request parts.

asbjornu commented 4 years ago

I think this looks like a great proposal. Flexible without being too complex. 👍

alien-mcl commented 4 years ago

Just as a quick note - I think the approach taken in #186 is less revolutionary, but I can see both approaches have some similarities (RequestSpecification vs MediaTypedResource used to provide custom description of the payload).

As for multiple files upload - bear in mind that there are several possibilities:

I'll provide more feedback later.

tpluscode commented 4 years ago

I think the approach taken in #186 is less revolutionary

Indeed, but I think we need revolutionary

but I can see both approaches have some similarities

Definitely inspired by the former proposals, but I intend a flexible solution

server may provide multiple expected classes

would the min/max cardinalities cover that? Having multiple 0-1 parts, each for a different class...

it is both possible and doable to provide all resources in RDF

Short answer: 🤮 Long answer: you'd need to invent/reuse even more terms to describe objects of those properties. With multipart/form-data we're using the same approach everyone else on the web uses.

And I cannot agree with the serious processing.

cardinality should be provided in some more unified way

We could use those terms for property cardinalities.

On the other hand the multipart support could be its own auxiliary spec, with its own specific terms. Much like SHACL will definitely be an independent extension and shapes have their own cardinality lingo.

angelo-v commented 4 years ago

Haven't thought this through yet, but looks good at first impression :+1:

alien-mcl commented 4 years ago

Indeed, but I think we need revolutionary

Not really - there are a couple of other specs built on top of Hydra that have some more implementations. We shall keep as much of the backward compatibility as possible. I'd still like to downgrade the mentioned rdfs:range from hydra:Class to hydra:Resource so either RequestSpecification or MediaTypedResource from #186 (or whatever its name would be) fits by being a hydra:Resource.

Definitely inspired by the former proposals, but I intend a flexible solution

Well - those approaches also claimed to be flexible.

would the min/max cardinalities cover that? Having multiple 0-1 parts, each for a different class...

I meant we need to think it over carefully. There are several places that would benefit from cardinality specifications. There are also other vocabs that already provide these semantics.

Long answer: you'd need to invent/reuse even more terms to describe objects of those properties. With multipart/form-data we're using the same approach everyone else on the web uses.

Quite the opposite - hydra:property already exists. I'm not claiming that pushing base-64 files through RDF payloads is a nice and clean approach. I'm just saying that handcrafting a sculpture with multipart content, which is also not that common (older web API frameworks may not provide support out of the box), is not a pretty one either.

And cannot agree with the serious processing.

I remember I tried to send a multipart request in a browser and it ended up with not so nice code. Maybe something has changed since that time, but it is not something a browser can provide out of the box. Maybe there are already some JS libraries to make it easier, but it still requires some heavy code that uses the File API, buffers and other quite fresh JS elements available in modern browsers.

In general - it feels like the 'RequestSpecification'/'supportedContentType' related part is somehow similar to terms presented in #186 and both should meet the same criticism and alternate ideas, i.e. @angelo-v 's approach with more generic constraint-like specifications (experiment provided with #187).

As for the multipart content - it looks like it was created solely to meet some particular requirement and feels like it was not well considered. It seems to be heavily coupled with HTTP and it does not tackle various scenarios (e.g. pre-uploading like in web mail clients where attachments can be uploaded before sending an email).

tpluscode commented 4 years ago

I'd still like to downgrade the mentioned rdfs:range from hydra:Class to hydra:Resource so either RequestSpecification or MediaTypedResource from #186 (or whatever its name would be) fits by being a hydra:Resource

I concur. The only issue I have with just the "downgrade" is that we'd completely lose any semantics. Replacing that with rangeIncludes gives back some hint of what kinds of descriptions are expected.

We shall keep as much of the backward compatibility as possible.

Yes, I definitely wish to keep [] hydra:expects some:Class a valid construct.

Well - those approaches also claimed to be flexible.

Like I said, #187 is confusing in how it brings collections into the mix. And #186 is just a tad too narrow in scope. I opened this to offer a more open solution which can potentially include SHACL and possibly unexpected extensions.

Let's find middle ground.

I meant we need to think it over carefully. There are several places that would benefit from cardinality specifications. There are also other vocabs that already provide these semantics.

Maybe let's ignore multipart for now. If we can get the basic structure extensible "enough", then such an extension can be developed on the side without invading the core.

In general - it feels like the 'RequestSpecification'/'supportedContentType' related part is somehow similar to terms presented in #186 and both should meet the same criticism and alternate ideas, i.e. @angelo-v 's approach with more generic constraint-like specifications (experiment provided with #187).

It is similar. I hoped to gather the best of both ideas.

alien-mcl commented 4 years ago

I concur. The only issue I have with just the "downgrade" is that we'd completely lose any semantics. Replacing that with rangeIncludes gives back some hint of what kinds of descriptions are expected.

But it breaks existing clients and specs.

And #186 is just a tad too narrow in scope.

Well, baby steps. I feel this issue is too wide.

Maybe let's ignore multipart for now. If we can get the basic structure extensible "enough", then such an extension can be developed on the side without invading the core.

Yep - sounds reasonable.

It is similar. I hoped to gather the best of both ideas.

I'm open. I'll invite the community on the mailing list to the discussion tomorrow (I've reached my limit today).