International-Data-Spaces-Association / ids-specification

The Dataspace Protocol is a set of specifications designed to facilitate interoperable data sharing between entities governed by usage control and based on Web technologies. These specifications define the schemas and protocols required for entities to publish data, negotiate Agreements, and access data in a data space.
https://docs.internationaldataspaces.org/dataspace-protocol/
Apache License 2.0

DataAddressMessage is not described anywhere #107

Closed matgnt closed 10 months ago

matgnt commented 1 year ago

While implementing the transfer process, I noticed that DataAddressMessage is not described anywhere, but is mentioned in

https://github.com/International-Data-Spaces-Association/ids-specification/blob/main/transfer/transfer.process.protocol.md#1-transferrequestmessage

At least my grep did not reveal any match...

matgnt commented 1 year ago

Also, I think the content of the transfer process dataAddress is very vague. It is described as possibly containing a transfer-specific endpoint and token, but I think it is not clear up front that this is NOT defined as part of the DSP protocol and MUST be defined by the dataspace itself. Is this assumption correct? If so, @juliapampus, I think this is one of the items we discussed that need to be defined by the dataspace in addition to the DSP protocol itself. I think #104 is meant as a summary of those topics, right?

-- Matthias Binzer

matgnt commented 1 year ago

Meeting summary:

matgnt commented 1 year ago

This becomes important, I think. We need to specify the bare minimum that must be available so that a Consumer can use it and forward it in its (authorization) header. Currently, EDC basically stops processing if the dataAddress is not in EDC format, and the EDC format of the dataAddress is NOT standardized.

matgnt commented 1 year ago

dataAddress right now is a string, but should be a complex type. EDC seems to use the following structure:

{
  "@type": "edc:DataAddress",
  "edc:type": "EDR",
  "edc:authCode": "eyJhbGciOiJSUzI1NiJ9.XXX.XXX",
  "edc:endpoint": "http://provider-data-plane:9192/public",
  "edc:id": "816bb2fc-c14d-4d98-a20d-1520682b7e28",
  "edc:authKey": "Authorization"
}

Not sure what the edc:id is here, but the rest looks reasonable. I would make a few changes if possible:

Let's use this as a starting point for the meeting tomorrow.
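To make the consumer-side use concrete, here is a minimal Python sketch that turns an EDC-style dataAddress into a request URL and header. The helper name `to_request` is hypothetical, and reading `edc:authKey` as the header name and `edc:authCode` as its value is an assumption inferred from the example above, not something a specification defines.

```python
from typing import Dict, Tuple


def to_request(data_address: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for an EDC-style dataAddress (hypothetical helper)."""
    required = ("edc:endpoint", "edc:authKey", "edc:authCode")
    missing = [key for key in required if key not in data_address]
    if missing:
        # Without these fields the consumer cannot build the request.
        raise ValueError(f"dataAddress is missing fields: {missing}")
    # Assumption: authKey is the header name, authCode is the header value.
    headers = {data_address["edc:authKey"]: data_address["edc:authCode"]}
    return data_address["edc:endpoint"], headers


example = {
    "@type": "edc:DataAddress",
    "edc:type": "EDR",
    "edc:authCode": "eyJhbGciOiJSUzI1NiJ9.XXX.XXX",
    "edc:endpoint": "http://provider-data-plane:9192/public",
    "edc:id": "816bb2fc-c14d-4d98-a20d-1520682b7e28",
    "edc:authKey": "Authorization",
}
url, headers = to_request(example)
```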

--

Matthias Binzer

matgnt commented 1 year ago

From the meeting today:

mkollenstart commented 1 year ago

My suggestion would be to use something like the following, which allows dynamic endpoint properties:

{
  "@context": "https://w3id.org/dspace/v0.8/context.json",
  "@type": "dspace:TransferStartMessage",
  "dspace:processId": "...",
  "dspace:dataAddress": {
    "@type": "dspace:DataAddress",
    "dspace:endpointType": "HTTP",
    "dspace:endpoint": "http://example.com",
    "dspace:endpointProperties": [{
      "@type": "dspace:EndpointProperty",
      "dspace:name": "Authorization",
      "dspace:value": "Bearer TOKEN-ABCDEFG"
    }]
  }
}

The dspace:endpointType could perhaps be an IRI, since this should point towards the documentation of that data plane type (with information on the protocol and which properties should be used).
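As a rough sketch of how a consumer might read this proposed structure, the Python below collects the endpoint properties of an HTTP-type dataAddress into request headers. The function name `headers_from_data_address` is hypothetical, and treating every property as an HTTP header is an assumption that would only hold for an HTTP endpoint type.

```python
import json
from typing import Any, Dict

MESSAGE = json.loads("""
{
  "@context": "https://w3id.org/dspace/v0.8/context.json",
  "@type": "dspace:TransferStartMessage",
  "dspace:processId": "...",
  "dspace:dataAddress": {
    "@type": "dspace:DataAddress",
    "dspace:endpointType": "HTTP",
    "dspace:endpoint": "http://example.com",
    "dspace:endpointProperties": [{
      "@type": "dspace:EndpointProperty",
      "dspace:name": "Authorization",
      "dspace:value": "Bearer TOKEN-ABCDEFG"
    }]
  }
}
""")


def headers_from_data_address(data_address: Dict[str, Any]) -> Dict[str, str]:
    """Map endpoint properties to HTTP headers (assumes an HTTP endpoint type)."""
    props = data_address.get("dspace:endpointProperties", [])
    # JSON-LD compaction can collapse a one-element array into a single object.
    if isinstance(props, dict):
        props = [props]
    return {p["dspace:name"]: p["dspace:value"] for p in props}


http_headers = headers_from_data_address(MESSAGE["dspace:dataAddress"])
```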

jimmarino commented 1 year ago

If dspace:endpointType is an IRI, it will be expanded and need to have a vocabulary prefix. Otherwise, if it does not have a prefix, it will be expanded using the default vocabulary. Something we should consider as it is not user-friendly.
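The expansion behaviour described here can be illustrated with a deliberately simplified Python sketch. It is not a JSON-LD processor: it only mimics how a value is resolved when the term is vocabulary-relative. The `dspace` namespace IRI and the default vocabulary `https://example.org/vocab/` are assumptions made for illustration.

```python
from typing import Dict

# Assumed prefix mapping and default vocabulary, for illustration only.
PREFIXES: Dict[str, str] = {"dspace": "https://w3id.org/dspace/v0.8/"}
DEFAULT_VOCAB = "https://example.org/vocab/"


def expand_value(value: str, prefixes: Dict[str, str], vocab: str) -> str:
    """Simplified vocabulary-relative expansion of a compact value."""
    prefix, sep, suffix = value.partition(":")
    if sep:
        if prefix in prefixes and not suffix.startswith("//"):
            # Compact IRI with a known prefix, e.g. dspace:HTTP.
            return prefixes[prefix] + suffix
        # Absolute IRI (or unknown scheme): left unchanged.
        return value
    # No prefix at all: resolved against the default vocabulary.
    return vocab + value


with_prefix = expand_value("dspace:HTTP", PREFIXES, DEFAULT_VOCAB)
without_prefix = expand_value("HTTP", PREFIXES, DEFAULT_VOCAB)
```

Under these assumptions, the prefixed and unprefixed forms expand to different IRIs, which is why an implementation would need to compare the expanded form rather than the literal string.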

matgnt commented 1 year ago

Are there use cases where we might have multiple such properties, since this is now a list in the proposal?

"dspace:endpointProperties": [{

mkollenstart commented 1 year ago

If dspace:endpointType is an IRI, it will be expanded and need to have a vocabulary prefix. Otherwise, if it does not have a prefix, it will be expanded using the default vocabulary. Something we should consider as it is not user-friendly.

I agree; my preference would be to use an IRI, in this case for instance dspace:HTTP, as this would provide more clarity and is more aligned with JSON-LD. The only problem I see with that is whether we expect this IRI to be defined/resolvable. And what would happen if a new endpoint type comes into play: can someone just make up an endpoint type under the dspace namespace? If other namespaces are used, you'll always have to check the expanded form inside implementations.

Are there use cases where we might have multiple such properties, since this is now a list in the proposal?

"dspace:endpointProperties": [{

Yes, I think there will be, especially in cases where a single token is not sufficient. When looking, for instance, at Kafka, which uses SASL, you would need a principal and password (or token, etc.). Or you might want to share additional information not directly related to authentication, e.g. the certificate of the service in case it doesn't have a widely accepted certificate.
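To illustrate the multi-property case, here is a small Python sketch that builds a dataAddress carrying several endpoint properties for a Kafka endpoint. The helper `make_data_address`, the endpoint type value `Kafka`, the host, and the property names (`saslMechanism`, `username`, `password`) are all hypothetical; nothing in the proposal fixes them.

```python
from typing import Any, Dict


def make_data_address(
    endpoint_type: str, endpoint: str, properties: Dict[str, str]
) -> Dict[str, Any]:
    """Build a dataAddress with one EndpointProperty per entry (hypothetical helper)."""
    return {
        "@type": "dspace:DataAddress",
        "dspace:endpointType": endpoint_type,
        "dspace:endpoint": endpoint,
        "dspace:endpointProperties": [
            {
                "@type": "dspace:EndpointProperty",
                "dspace:name": name,
                "dspace:value": value,
            }
            for name, value in properties.items()
        ],
    }


# Hypothetical property names; a documented Kafka endpoint type would define them.
kafka_address = make_data_address(
    "Kafka",
    "kafka.example.com:9092",
    {"saslMechanism": "PLAIN", "username": "consumer-1", "password": "SECRET"},
)
```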

matgnt commented 1 year ago

Thanks @mkollenstart, the list sounds reasonable to me now. The only open question for today's meeting is the dspace:endpointType. Do we have other similar things in DSP already?

matgnt commented 1 year ago

summary of the meeting today:

TODO:

mkollenstart commented 11 months ago

Proposal

Since no well-defined list of endpoint types is available, and having IRIs as endpoint types reduces the possibility of two connectors misunderstanding each other, a separate namespace (like dspace-types) is the right way to go for common and most-used endpoint types, while still allowing endpoint types outside of this namespace for custom or domain-specific endpoint types.

This would result in the following example:

{
  "@context": "https://w3id.org/dspace/v0.8/context.json",
  "@type": "dspace:TransferStartMessage",
  "dspace:consumerPid": "...",
  "dspace:dataAddress": {
    "@type": "dspace:DataAddress",
    "dspace:endpointType": "dspace-types:HTTP",
    "dspace:endpoint": "http://example.com",
    "dspace:endpointProperties": [{
      "@type": "dspace:EndpointProperty",
      "dspace:name": "Authorization",
      "dspace:value": "Bearer TOKEN-ABCDEFG"
    }]
  }
}

Other generic endpoint types that would fit into dspace-types could be: Websocket, MQTT, gRPC, Kafka. These types should be documented to ensure compatibility between data planes implementing these types.

Non-generic endpoint types could be provided with their own namespace or in expanded IRI form (e.g. https://example.com/endpointType/MultiPartyComputation). The best practices document can recommend that such custom types link to documentation on the type, ideally in the same form as the generic endpoint types.
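A connector following this proposal would need to tell generic types from custom ones. The Python sketch below shows one possible check on the compact form; the function name `classify_endpoint_type` is hypothetical, and matching on the `dspace-types:` prefix string assumes that the message has not been expanded.

```python
GENERIC_PREFIX = "dspace-types:"


def classify_endpoint_type(endpoint_type: str) -> str:
    """Return "generic" for dspace-types values and "custom" for other IRIs."""
    if endpoint_type.startswith(GENERIC_PREFIX):
        return "generic"
    if ":" in endpoint_type:
        # Another prefix or a full IRI: a custom or domain-specific type.
        return "custom"
    # A bare string such as "HTTP" is ambiguous under this proposal.
    raise ValueError(f"endpoint type is not an IRI: {endpoint_type!r}")


generic = classify_endpoint_type("dspace-types:HTTP")
custom = classify_endpoint_type("https://example.com/endpointType/MultiPartyComputation")
```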

Notes

The location and format of the endpoint types namespace should be discussed in a separate issue. Until the dspace-types namespace is available, the dspace namespace could be used, since interoperability has to be ensured per type and implementation anyway as long as documentation for those is not yet available.

sebbader-sap commented 11 months ago

Just for your interest: https://github.com/admin-shell-io/questions-and-answers/blob/4f74c9a4903233ac14e431614f5e1f6f8c14ee4b/README.md#id47

sebbader-sap commented 10 months ago

Regarding endpointType, the Thing Description (https://www.w3.org/TR/wot-thing-description11/) might also have a proposal.

sebbader-sap commented 10 months ago

Independent of the references linked above, and to be able to continue, we decided in the working group call to just use a URI in an external namespace, e.g., https://w3id.org/idsa/v4.1/HTTP .

  1. This URI is non-normative for the purposes of the protocol.
  2. The idsa namespace used here is not included in the official protocol JSON-LD context.
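Because the idsa namespace is not part of the protocol context, the endpoint type would be written as a full URI instead of a prefixed term. A minimal Python sketch of a check that a value is such an absolute URI follows; the function name `is_absolute_uri` is hypothetical, and requiring both a scheme and an authority is an assumption suited to HTTP-style URIs like the example.

```python
from urllib.parse import urlparse


def is_absolute_uri(value: str) -> bool:
    """True if the value has both a scheme and an authority part."""
    parts = urlparse(value)
    return bool(parts.scheme) and bool(parts.netloc)


data_address = {
    "@type": "dspace:DataAddress",
    "dspace:endpointType": "https://w3id.org/idsa/v4.1/HTTP",
    "dspace:endpoint": "http://example.com",
}
type_is_uri = is_absolute_uri(data_address["dspace:endpointType"])
```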