cs3org / OCM-API

OpenCloudMesh API
38 stars 11 forks source link

New multi spec in accordance with v1.0 #73

Closed smesterheide closed 1 year ago

smesterheide commented 1 year ago

I think we have been trough so many iterations, that we might have missed this simple solution. Please tell me what is wrong with the following: I would argue we should keep the original v1.0 structure with name and options as required and use it as intended.

That means single-protocol shares keep their respective name and we improve the options part. For multi-protocol shares multi is used as name and options for webdav, webapp and datatx are placed in the options object.

smesterheide commented 1 year ago

Please disregard the descriptions for the moment

glpatcern commented 1 year ago

@smesterheide could you please provide here an example of payload for a multi-protocol share? With the misleading descriptions and no examples I fail to understand how it would look like.

smesterheide commented 1 year ago

The examples given should reflect the current the proposal:

single protocol legacy

protocol:
  name: "webdav"
  options:
    sharedSecret: "hfiuhworzwnur98d3wjiwhr"
    permissions: "some permissions scheme"

single protocol new

should be identical with the old legacy format just with options formalized

protocol:
  name: "webdav"
  options:
    permissions: ["read"]
    uri: "https://open-cloud-mesh.org/remote.php/webdav/share-hash/path/to/spec.yaml"

multiple protocols

protocol:
  name: "multi"
  options:
    webdav:
      sharedSecret: "hfiuhworzwnur98d3wjiwhr"
      permissions: ["read"]
      uri: "https://open-cloud-mesh.org/remote.php/webdav/path/to/spec.yaml"
    webapp:
      uriTemplate: "https://open-cloud-mesh.org/s/share-hash/{relative-path-to-shared-resource}"
      viewMode: "read"
    datatx:
      sharedSecret: "hfiuhworzwnur98d3wjiwhr"
      srcUri: "https://open-cloud-mesh.org/remote.php/webdav/path/to/spec.yaml"
      size: 100000
glpatcern commented 1 year ago

The examples given should reflect the current the proposal:

I'm not sure this is really an improvement: the parsing of such payloads seems significantly more complex.

1) To tell apart single protocol legacy vs new, one has to look at the presence or absence of protocol.options.uri, the rest being the same. However, options in v1.0 is totally unspecified and its usage is just out of examples and recommendations, not out of specifications. Theoretically, an implementation could exist with protocol.options.uri used with a different meaning, yet being v1.0 compliant, but all of a sudden it becomes non-compliant with v1.1: arguably that is a protocol's breaking change. Therefore, I find it weak (and weaker than the current definition in our prospect v1.1) that a discrimination be built on top of a non-specified property by partially specifying it. The damage had happened...

2) To tell apart single vs multi protocol in the new format, one has to look at protocol.options.webdav.<parameters> vs protocol.options.<parameters>. This is even more of a pain in terms of code paths and parsing!

Also, a final observation: as I mentioned in #71 (and in other occasions), a good reason to deprecate the options field is to "fix the damage" caused by the incompleteness of a required unspecified field. By having other fully specified fields, in your own words "we can then keep the new stuff and get rid of the clutter once we go to v2.0", or something like that. If we keep the options it's harder to "get rid of the clutter".

smesterheide commented 1 year ago

I actually think it is less complex.

glpatcern commented 1 year ago

@michielbdejong and @yasharpm, what do you think? More complex or less complex? And more breaking change or less breaking change?

glpatcern commented 1 year ago

I actually think it is less complex.

From a pure implementation point of view, having a variable schema is significantly more complex. E.g. Golang allows to parse a payload against a given schema, and then decisions are based on values. In a case of multiple different schemas, the code would become much more complex.

This alone is a reason to reject this proposal.

smesterheide commented 1 year ago

2. To tell apart single vs multi protocol in the new format, one has to look at protocol.options.webdav.<parameters> vs protocol.options.<parameters>. This is even more of a pain in terms of code paths and parsing!

Can you please explain why protocol.name is insufficient to distinguish single from multi protocol shares? As I stated in the beginning I might just as well miss a point, especially regarding the discussion in #71.

glpatcern commented 1 year ago

Can you please explain why protocol.name is insufficient to distinguish single from multi protocol shares? As I stated in the beginning I might just as well miss a point, especially regarding the discussion in #71.

Right, i overlooked that on single protocol we'd have name = webdav. Which suggests that a datatx-only share would look like the following, for similarity?

protocol:
  name: "datatx"
  options:
    srcUri: "https://open-cloud-mesh.org/remote.php/webdav/share-hash/path/to/spec.yaml"
    size: 10000

Yet, it remains the fact that such a variant schema is a pain in terms of coding the parser, because you don't know in advance which properties you're going to have - think the difference between querying via SQL a table vs meta-querying the table structure.

And further specifying sub-fields inside options (thus going from void* to a varied struct) would be a conceptual breaking change with respect to v1.0, at least as far as I understood @yasharpm's arguments.

smesterheide commented 1 year ago

Okay, I get where you are coming from now. I agree that the specification should be easy to implement and suitable for code generation. At the same time I maintain my position that this proposal is more in line with the original spec design and easier to grasp.

glpatcern commented 1 year ago

At the same time I maintain my position that this proposal is more in line with the original spec design and easier to grasp.

I acknowledge that this is more in line with the original spec design. In terms of ease to understand it, I'm rather neutral - fields named after the protocol name are as easy to read, and on top they match the structure in the /ocm-provider endpoint. All in all, I think the current proposal is a better compromise.

smesterheide commented 1 year ago

Also, a final observation: as I mentioned in #71 (and in other occasions), a good reason to deprecate the options field is to "fix the damage" caused by the incompleteness of a required unspecified field.

I don't see any problems with adding properties to an existing field. Can you please explain your reasoning here?

glpatcern commented 1 year ago

I don't see any problems with adding properties to an existing field. Can you please explain your reasoning here?

OK, let's make an example. A share such as:

protocol:
  name: "webdav"
  options:
    srcURI: https://server/ocm/token/path/to/resource
    token: gljsrthgksrtjgr
    sourceVersion: NC25
    targetMinimalVersions: ["NC20", "OC10"]

Is actually v1.0 compliant. Do we agree on that?

Now, if v1.1 now decides to standardize whichever of the properties within options, it introduces a breaking change to any implementation that does not comply with what v1.1 would decide post-facto (here, use of token vs sharedSecret, use of srcURI, etc.).

smesterheide commented 1 year ago

I am about to close this PR since we recently published v1.1 and this PR has become obsolete.

However I think the discussion is still worthwhile to continue, not least with regard to future versions.

Is actually v1.0 compliant. Do we agree on that?

We are talking about additional properties here for which validation is governed by additionalProperties. So I took a moment to look at the JSON Schema spec which underpins the OpenAPI 2.0/Swagger definition, see schema objects and additional properties validation. It states that any additional properties are allowed as long as additionalProperties is not explicitly given a false value. Hence the options object definition is correct which we already both assumed.

Now, if v1.1 now decides to standardize whichever of the properties within options, it introduces a breaking change to any implementation that does not comply with what v1.1 would decide post-facto (here, use of token vs sharedSecret, use of srcURI, etc.).

I don't understand how you can break something that didn't exist in the first place. Aren't we adding new properties elsewhere all the time?

glpatcern commented 1 year ago

However I think the discussion is still worthwhile to continue, not least with regard to future versions.

Fully agreed.

Is actually v1.0 compliant. Do we agree on that?

Hence the options object definition is correct which we already both assumed.

OK, so a v1.0 implementation can legitimately use an options payload like in the above example, in particular using token, uri, srcURI or anything else as opposed to the non-standard, common practice of sharedSecret and permissions.

I don't understand how you can break something that didn't exist in the first place. Aren't we adding new properties elsewhere all the time?

The way I understand it: adding optional properties to any payload is naturally backwards compatible. Adding (mandatory) sub-fields to a property, which was initially defined as a generic object, thus without any specified structure, is a breaker: the hypothetical implementation that decided to use those sub-fields with potentially different semantics, becomes "v1.next non-compliant" whereas it was "v1.0 compliant".

Surely this is an academic discussion, we know such implementations very likely do not exist. But objections were raised to v1.1 on similar "academic" basis as we were trying to find the best compromise. I still find v1.1 a better compromise than trying to specify the options sub-fields - that's why I said "the damage was done"...

smesterheide commented 1 year ago

The way I understand it: adding optional properties to any payload is naturally backwards compatible. Adding (mandatory) sub-fields to a property, which was initially defined as a generic object, thus without any specified structure, is a breaker: the hypothetical implementation that decided to use those sub-fields with potentially different semantics, becomes "v1.next non-compliant" whereas it was "v1.0 compliant".

I would like to highlight the different semantics issue. Otherwise I think it is actually beneficial to synthesize the specification from existing implementations.

From a pure implementation point of view, having a variable schema is significantly more complex. E.g. Golang allows to parse a payload against a given schema, and then decisions are based on values. In a case of multiple different schemas, the code would become much more complex.

This alone is a reason to reject this proposal.

Somewhat related code reference: https://github.com/cs3org/reva/blob/master/internal/http/services/ocmd/shares.go