International-Data-Spaces-Association / ids-specification

The Dataspace Protocol is a set of specifications designed to facilitate interoperable data sharing between entities governed by usage control and based on Web technologies. These specifications define the schemas and protocols required for entities to publish data, negotiate Agreements, and access data in a data space

https://docs.internationaldataspaces.org/dataspace-protocol/

Apache License 2.0

30 stars 14 forks source link

http binding: Handling of `@id` and `processId` in Negotiation and Transfer process (spec text and examples) #106

Closed matgnt closed 11 months ago

matgnt commented 1 year ago

Original title: http binding: Negotiation examples with '@id' are misleading and probably security-wise not best practice

During implementation I stumbled over the @id fields in the http binding

https://github.com/International-Data-Spaces-Association/ids-specification/blob/main/negotiation/contract.negotiation.binding.https.md

urn:uuid:dcbf434c-eacf-4582-9a02-f8dd50120fd3 is used in many places, e.g. the GET on the negotation resource and also the POST on negotiation/request

to my understanding, that means the consumer sets an @id in its request and the provider would take this very same (consumer given id) under which the consumer can fetch the state of the request (from the provider). I think the provider should NOT take over a consumer given id in any case. The provider MUST generate its own resource id.

The @id is the correlation id that will be used for callback messages.

Yes, the callback needs a correlation to the user given id, correct. But never the provider should make a resource available under the consumer given id as I would read the example right now.

Any thoughts? Maybe just a copy/paste issue?

-- Matthias Binzer

jimmarino commented 1 year ago

Yes, as stated, the @id should be used as the correlation id for callback requests. This is necessary to implement reliability guarantees in implementations that choose to do so. The provider is not required to use the correlation id as its internal unique identifier.

matgnt commented 1 year ago

The provider is not required to use the correlation id as its internal unique identifier.

Yes, this is how it should be and the text also describes it that way. But the examples look differently. There, the id from the request (of the consumer) is taken as the id on the provider side, which I think can be very misleading. I would propose to use different ids in 2.5.1 (request and response, including the location header) and 2.4.1

Additionally I think we should make a non-normative statement, probably not in http binding, but in the general part, that:

processId is NOT meant to be like a thread id, but only a reference between a request / response (previous state -> next state) reference.

That's at least how I would interpret it. Is this correct? I couldn't find the place (in negotiation) how the usage of processId is described. I can find multiple statements like:

A XXMessage must contain a processId.

But no details about what it should be exactly.

matgnt commented 1 year ago

processId is NOT meant to be like a thread id, but only a reference between a request / response (previous state -> next state) reference.

That's at least how I would interpret it. Is this correct? I couldn't find the place (in negotiation) how the usage of processId is described. I can find multiple statements like:

ok, seems I misunderstood this part. It now sounds to me more like an id that stays the same for the entire 'process' / (thread).

The contract negotiation unique id.

It is also part of the example here: https://github.com/International-Data-Spaces-Association/ids-specification/blob/main/negotiation/message/contract-negotiation.json

but unfortunately not part of the example here (that I used to understand the process...): https://github.com/International-Data-Spaces-Association/ids-specification/blob/main/negotiation/contract.negotiation.binding.https.md#241-get

I think this caused the confusion. I only figured out because how processId is used in the transfer process.

I think we should:

improve the description from: The contract negotiation unique id to: The contract negotiation unique id that stays the same for the remaining negotiation process. Optionally we could even extend the information to: It is generated by the consumer as the @id field in the request and will be taken over by the provider as the processId for further reference.
and remove it from the ContractNegotiationError section to a more general section https://github.com/International-Data-Spaces-Association/ids-specification/blob/main/negotiation/contract.negotiation.binding.https.md#22-contract-negotiation-error
and update the example in the http binding to contain this very relevant piece of information (processId) https://github.com/International-Data-Spaces-Association/ids-specification/blob/main/negotiation/contract.negotiation.binding.https.md#241-get

and from the original issue:

update the http binding examples to NOT reuse the same message @id fields in 2.4.1 and 2.5.1

matgnt commented 1 year ago

We need to discuss this again, because it turns out, EDC does use the processId also for the path param, e.g. /negotiations/{processId}/agreement where it expects the callbacks and NOT the @id from the ContractRequestmessage from the consumer.

I think whatever is correct, we need to explain this in more detail.

matgnt commented 1 year ago

Meeting summary: needs a followup with: @jimmarino @juliapampus @ndr_brt @matgnt

matgnt commented 1 year ago

Similar in the transfer process. Here the spec is very clear, but EDC implements it with another interpretation (against the spec in my opinion)

See https://github.com/eclipse-edc/Connector/issues/3253#issuecomment-1619628738

matgnt commented 1 year ago

After a discussion (in a smaller group of @jimmarino @ndr-brt ) we agreed on the following:

processId will be changed into consumerId and providerId.
http Location header with an id will be removed
all provider endpoints using :id will use the providerId
Initial RequestMessages will contain only the consumerId
All follow-up POST messages will contain only 1 of the ids, depending where the message is sent to, e.g. messages to the consumer only contain the consumerId

This needs to go through a bigger group to be confirmed, maybe @juliapampus @sebbader-sap @mkollenstart if you could have a look at this.

ssteinbuss commented 1 year ago

@sebbader-sap asks for clearer names for consumerId and providerId

e.g. consumerProcessId and providerProcessId

mkollenstart commented 1 year ago

From an implementation/validation perspective, I would opt for including both the provider and consumer IDs in the subsequent messages. Since validation logic often doesn't contain the direction the message is sent to, resulting in generic rules that one of both should be present while in fact that can be wrong.

I do agree with clearer names like consumerProcessId and providerProcessId

juliapampus commented 1 year ago

What about consumerPid and providerPid?

matgnt commented 1 year ago

summary from the meeting today: open questions:

use consumerPid instead of providerPid
how about http binding. is the field still required even though it is transferred via path? what if the 2 ids are different?
should we include both ids in all messages (when available, meaning not in the start messages...)

ndr-brt commented 1 year ago

use consumerPid instead of providerPid

what does this mean? should they both exist right?

how about http binding. is the field still required even though it is transferred via path? what if the 2 ids are different?

I'd say that it should exist also in the request body and it should match with the path one (as it should be done with the @type field)

should we include both ids in all messages (when available, meaning not in the start messages...)

I'd vote for a yes.

juliapampus commented 12 months ago

Decision Proposal

Simplify transfer and negotiation processes.

General

processId will be changed into consumerPid and providerPid.
http Location header with an id will be removed
all provider endpoints using :id will use the providerPid
all consumer endpoints using :id will use the consumerPid
initial messages will contain only the consumerPid (or providerPid)

Option 1

see comment

All follow-up messages will contain only 1 of the IDs, depending where the message is sent to, e.g., messages to the consumer only contain the consumerPid

Example:

POST https://connector.consumer.com/negotiations/:consumerPid/agreement

Authorization: ...

{
  "@context": "https://w3id.org/dspace/v0.8/context.json",
  "@type": "dspace:ContractAgreementMessage",
  "dspace:consumerPid": "urn:uuid:a343fcbf-99fc-4ce8-8e9b-148c97605aab",
  "dspace:agreement": {
    "@type": "odrl:Agreement",
    "@id": "e8dc8655-44c2-46ef-b701-4cffdc2faa44",
    "dspace:consumerId": "...",
    "dspace:providerId": "...",
    }
  }
}

Option 2

see comment

All follow-up messages contain the providerPid and consumerPid (except the initial messages)

Example:

POST https://connector.consumer.com/negotiations/:consumerPid/agreement

Authorization: ...

{
  "@context": "https://w3id.org/dspace/v0.8/context.json",
  "@type": "dspace:ContractAgreementMessage",
  "dspace:consumerPid": "urn:uuid:a343fcbf-99fc-4ce8-8e9b-148c97605aab",
  "dspace:providerPid": "urn:uuid:a343fcbf-99fc-4ce8-8e9b-148c57398-abc",
  "dspace:agreement": {
    "@type": "odrl:Agreement",
    "@id": "e8dc8655-44c2-46ef-b701-4cffdc2faa44",
    "dspace:consumerId": "...",
    "dspace:providerId": "...",
    }
  }
}

juliapampus commented 11 months ago

decided on 31.08.2023 to implement option 2

juliapampus commented 11 months ago

Done with #151 and #152