International-Data-Spaces-Association / ids-specification

The Dataspace Protocol is a set of specifications designed to facilitate interoperable data sharing between entities governed by usage control and based on Web technologies. These specifications define the schemas and protocols required for entities to publish data, negotiate Agreements, and access data in a data space
https://docs.internationaldataspaces.org/dataspace-protocol/
Apache License 2.0
26 stars 14 forks source link

Catalog: Pagination and Filter making it hard to reach compatibility in practice #194

Closed matgnt closed 4 months ago

matgnt commented 5 months ago

Reading the https binding Catalog part again to check with an implementation, it feels like there is some parts very openly formulated.

Pagination https://github.com/International-Data-Spaces-Association/ids-specification/blob/main/catalog/catalog.binding.https.md#33-pagination It says may paginate the result I understood this is optional. In practice that means realistic number of datasets might be an issue because of huge transfers. Also IF I implement it, it is not clear from the referenced WebLink document that the terms page and per_page is coming from there. Is this normative coming from the DSP spec? If yes, I think we need to spend a separate section to define those terms instead of only having them in the example.

Filter Also very similar with filters. It says: The filter property is optional. If present, the filter property can contain an implementation-specific filter expression or query to be executed as part of the [Catalog](https://github.com/International-Data-Spaces-Association/ids-specification/blob/main/model/terminology.md#catalog) request. I think this is very dangerous for compatibility. implementation specific is bad. How could they ever be compatible? Implementation should not specify this. It could be defined by a Dataspace.

Summing this up, I asked myself whether the catalog/request should be an optional (MAY) endpoint, because for practical usage it depends on further definitions?

Any thoughts?

-- Matthias Binzer

matgnt commented 5 months ago

Thinking about the Pagination again, I think it should be fine if we assume that the Consumer is not supposed to change the content of that Link and only uses it as is.

That means, only the Provider needs to understand the content of it. In consequence, the Consumer can not define how many items they want per page. (I think this could be worth a Note section)

matgnt commented 5 months ago

Discussed in the meeting: Pagination: should become normative instead of only 'technical considerations'. Clarify whether 'technical considerations' is normative or not. Minimum is to state, that the links are not meant to be changed and purely provider given. Also examples should be changed to point to existing endpoints. In case query params are used, those are not spec relevant, but pure implementation decision. Decide what to support from WebLinks: "next" and "previous" and also note this in the spec.

matgnt commented 5 months ago

Discussed in the meeting: Filtering: Filtering is optional (as already stated in the http binding). We need error codes in case filters are not implemented (4xx) or specific filter expressions are not supported (also 4xx or 5xx)

sebbader-sap commented 5 months ago

General opinion today: Result for not-implemented filters shall be Bad Request (400).

matgnt commented 5 months ago

Regarding pagination we agreed in today's meeting to change the link example to something non-readable query params.

Proposal Old:

Link: <https://provider.com/catalog?page=1&per_page=100>; rel="previous"
Link: <https://provider.com/catalog?page=3&per_page=100>; rel="next"

should be new:

Link: <https://provider.com/catalog?continuationToken=f59892315ac44de8ab4bdc9014502d52>; rel="previous"
Link: <https://provider.com/catalog?continuationToken=d31ded1bf6f4415981a23bb600681290>; rel="next"

Reason: Readers otherwise might assume they can change e.g. 'per_page', but this is purely Provider given structure and not specified here or in Web Link RFC.

Hint: Link content is non-normative.

We additionally need to add normative text:

Only 'previous' and 'next' relation types ('rel') are mandatory to be supported.

Reason: The list of possible such relation types is longer https://datatracker.ietf.org/doc/html/rfc5988#section-6.2.2

Non normative hint: Absence of 'previous' or 'next' links indicates the end into one or the other direction.

@jimmarino anything you would add?

matgnt commented 5 months ago

Regarding filtering, I think also the section here needs some rework: https://github.com/International-Data-Spaces-Association/ids-specification/blob/4b26a3ed4f50c7867d8b4f2c359adc8b0f1057f5/catalog/catalog.protocol.md?plain=1#L169

The text is non-normative and very implementation specific details and recommendations. e.g. that the Consumer should crawl and filter results on client side.

jimmarino commented 4 months ago

Regarding filtering, I think also the section here needs some rework:

https://github.com/International-Data-Spaces-Association/ids-specification/blob/4b26a3ed4f50c7867d8b4f2c359adc8b0f1057f5/catalog/catalog.protocol.md?plain=1#L169

The text is non-normative and very implementation specific details and recommendations. e.g. that the Consumer should crawl and filter results on client side.

I think the Technical Considerations section is valuable. It does not offer implementation-specific suggestions but lays out reasoning for why the protocol is designed in a particular way.

In order to keep this work actionable and time-limited, I will make the following changes in a PR:

  1. Adjust pagination to be normative and only require "next" and "previous" relations.
  2. Change the pagination examples.
  3. Add Result 400 Bad Request.