International-Data-Spaces-Association / ids-specification

The Dataspace Protocol is a set of specifications designed to facilitate interoperable data sharing between entities governed by usage control and based on Web technologies. These specifications define the schemas and protocols required for entities to publish data, negotiate Agreements, and access data in a data space.
https://docs.internationaldataspaces.org/dataspace-protocol/
Apache License 2.0

IDs in Transfer Process clarification #113

Closed PeterKoen-MSFT closed 10 months ago

PeterKoen-MSFT commented 1 year ago

The samples in the Transfer Process specification (https://github.com/International-Data-Spaces-Association/ids-specification/blob/main/transfer/transfer.process.binding.https.md) are confusing, as the processID in the GET message is the same as the @id in the POST message. The samples need to be checked for ID consistency and copy-and-paste errors so they are easier to understand. If possible, add an explanation of the IDs before or after the sample.

jimmarino commented 1 year ago

OK. The specs are correct. Let's walk through it.

The consumer sends the following message:

{
 "@context":  "https://w3id.org/dspace/v0.8/context.json",
 "@id": "urn:uuid:4a3ad65e-d78a-4200-a666-fc47aec32f2f",
 "@type": "dspace:TransferRequestMessage",
 "dspace:agreementId": "urn:uuid:e8dc8655-44c2-46ef-b701-4cffdc2faa44",
 "dct:format": "dspace:s3+push",
 "dataAddress": {},
 "dspace:callbackAddress": "https://......"
}

The @id is the message id sent by the consumer and is used to correlate all future messages for that transfer process. If the consumer needs to resend the message (e.g. there is a failure before the provider's ack is received), it must resend the message with the same @id value. This value is then used by the consumer and provider for all subsequent communication and is referred to as the processId.
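A minimal Python sketch of the consumer side of this rule: the @id is generated once and every resend reuses the identical message. It assumes a caller-supplied `send` function that returns True when the provider acks and raises a hypothetical `TransientNetworkError` when no ack arrives. `new_process_id`, `build_transfer_request`, `send_with_retry`, and the callback URL used below are illustrative names, not part of the specification.

```python
import uuid
from typing import Callable


class TransientNetworkError(Exception):
    """Hypothetical error: the request failed before an ack was received."""


def new_process_id() -> str:
    # Generated once per transfer, in the same urn:uuid form as the sample @id.
    return f"urn:uuid:{uuid.uuid4()}"


def build_transfer_request(process_id: str, agreement_id: str, callback: str) -> dict:
    # Mirrors the shape of the sample TransferRequestMessage.
    return {
        "@context": "https://w3id.org/dspace/v0.8/context.json",
        "@id": process_id,
        "@type": "dspace:TransferRequestMessage",
        "dspace:agreementId": agreement_id,
        "dct:format": "dspace:s3+push",
        "dataAddress": {},
        "dspace:callbackAddress": callback,
    }


def send_with_retry(message: dict, send: Callable[[dict], bool], max_attempts: int = 3) -> int:
    """Resend the same message (same @id) until the provider acks.

    Returns the number of attempts used; raises once the threshold is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if send(message):  # True means the provider acked
                return attempt
        except TransientNetworkError:
            pass  # no ack: resend unchanged, so the @id stays the same
    raise RuntimeError(f"no ack for {message['@id']} after {max_attempts} attempts")
```

Because the message dict is never rebuilt inside the loop, the provider sees the same @id on every attempt, which is what lets both sides treat it as the processId.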

To implement reliability, a consumer would typically generate the id and commit it transactionally to a persistent store before sending the initial message. This allows the consumer to resend the request until the provider acks it or a failure threshold is reached. The provider would typically treat the received @id as a correlation id for the transfer process and use its own generated id to refer to the process internally. When calling back the consumer, the provider would set the processId to this correlation id. This allows the consumer to perform de-duplication as well.
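The following Python sketch illustrates the division of ids described above, under several assumptions. An in-memory SQLite database stands in for the consumer's persistent store. The provider keeps a hypothetical map from the consumer's @id (the correlation id) to its own internal id. The callback message shape, using `dspace:TransferStartMessage` and `dspace:processId`, is assumed for illustration. The class and method names are hypothetical, and de-duplicating callbacks on (processId, @type) is a simplification.

```python
import sqlite3
import uuid


class Consumer:
    """Hypothetical consumer that persists its process id before sending."""

    def __init__(self, db: sqlite3.Connection) -> None:
        self.db = db
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS transfers (process_id TEXT PRIMARY KEY, state TEXT)"
        )
        self.seen_callbacks: set[tuple[str, str]] = set()

    def start_transfer(self) -> str:
        process_id = f"urn:uuid:{uuid.uuid4()}"
        # Commit before sending, so a later failure can be retried with the same id.
        with self.db:
            self.db.execute("INSERT INTO transfers VALUES (?, ?)", (process_id, "REQUESTED"))
        return process_id

    def on_callback(self, message: dict) -> bool:
        # Returns False for a callback already seen (simplified de-duplication).
        key = (message["dspace:processId"], message["@type"])
        if key in self.seen_callbacks:
            return False
        self.seen_callbacks.add(key)
        return True


class Provider:
    """Hypothetical provider that keeps its own internal id per transfer."""

    def __init__(self) -> None:
        self.by_correlation: dict[str, str] = {}  # consumer @id -> internal id

    def on_transfer_request(self, message: dict) -> str:
        # A resent request with the same @id maps to the same internal process.
        return self.by_correlation.setdefault(message["@id"], str(uuid.uuid4()))

    def start_callback(self, correlation_id: str) -> dict:
        # The callback carries the consumer's id as processId, not the internal id.
        return {"@type": "dspace:TransferStartMessage", "dspace:processId": correlation_id}
```

The internal id never leaves the provider. Only the consumer-generated id travels on the wire, which is why the spec does not need to distinguish two ids.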

For example, consider the following scenario:

  1. The consumer commits its id and sends a request to the provider.
  2. The provider commits the request and prepares it for processing.
  3. The network fails before the provider can ack the consumer for the initial request.
  4. The consumer resends the request.
  5. The provider receives the request, performs de-duplication, determines that it has already received it, and acks back.
  6. Processing proceeds.
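The six-step scenario can be simulated in a few lines of Python. In this sketch a dict stands in for the provider's durable store, and a hypothetical `network` function delivers the request but can drop the ack on the way back. The names are illustrative only. The point it shows is that the second delivery is acked without queueing the transfer a second time.

```python
import uuid
from typing import Optional


class Provider:
    """Hypothetical provider that commits requests before acking."""

    def __init__(self) -> None:
        self.committed: dict[str, dict] = {}  # processId -> request (durable store stand-in)
        self.queue: list[str] = []            # processes prepared for processing

    def receive(self, message: dict) -> str:
        pid = message["@id"]
        if pid not in self.committed:
            self.committed[pid] = message  # step 2: commit and prepare for processing
            self.queue.append(pid)
        # step 5: a duplicate is recognised by its @id and simply acked again
        return "ack"


def network(provider: Provider, message: dict, drop_ack: bool) -> Optional[str]:
    # Delivers the request; when drop_ack is set, the ack is lost (step 3).
    ack = provider.receive(message)
    return None if drop_ack else ack


provider = Provider()
message = {"@id": f"urn:uuid:{uuid.uuid4()}", "@type": "dspace:TransferRequestMessage"}  # step 1
first = network(provider, message, drop_ack=True)    # steps 2 and 3
second = network(provider, message, drop_ack=False)  # steps 4 and 5
```

After both calls, `provider.queue` holds the process id exactly once, so processing (step 6) happens a single time despite the resend.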

Let me know if this answers the question.

jimmarino commented 1 year ago

I should add that originally we had separated the consumer process id from the provider process id at the spec level and distinguished the consumer id as the correlation id. In the course of implementing this in the EDC, we realized the two ids did not need to be distinguished at the spec level, which simplifies the message definitions.

juliapampus commented 10 months ago

Addressed with #106