eclipse-tractusx / sig-release

https://eclipse-tractusx.github.io/sig-release
Apache License 2.0

Concept for Dataplane that supports Non-Finite Provider Push Transfers #938

Open rafaelmag110 opened 1 month ago

rafaelmag110 commented 1 month ago

Overview

Explain the topic in 2 sentences

A Dataspace data consumer should be able to subscribe to a data offering that provides a stream of non-finite data transfers while the agreed contract terms are valid.

What's the benefit?

We reduce the complexity of setting up use cases that rely on frequent, provider triggered, transfers of data.

What are the Risks/Dependencies?

It's a breaking change, but it should allow use cases to be redesigned in a different, more efficient way.

Detailed explanation

Current implementation

Consider the following scenario:

Company A generates GBs of data on a daily basis. Company B wants to consume that data as soon as it is generated.

Currently, Company B has to create a data offering that allows Company A to notify B when new data has been generated and where it can be found. This requires both companies to agree on a protocol, so that the details of said notification and the following steps are understood.

Once Company B receives the notification, it triggers a search, negotiation, and transfer of that particular data.

This is designed as such because the EDC closes the forward channel in any provider push transfer scenario as soon as the transfer either succeeds or fails.
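The cost of this workaround can be sketched as follows. All class and method names here are illustrative only; the real interaction runs through the connectors, not these objects. The point is that every single piece of new data triggers a full find/negotiate/transfer cycle, because the push channel closes after each finished transfer:

```python
# Sketch of the current workaround (hypothetical names, not real EDC APIs):
# each notification forces a fresh catalog search, negotiation, and a
# one-shot transfer, after which the forward channel is closed again.

class ConsumerB:
    def __init__(self):
        self.received = []

    def on_notification(self, provider, data_id):
        # Out-of-band notification arrives; consumer must redo the full cycle.
        offer = provider.find_offer(data_id)
        agreement = provider.negotiate(offer)
        data = provider.transfer(agreement, data_id)  # channel closes afterwards
        self.received.append(data)

class ProviderA:
    def __init__(self):
        self.store = {}
        self.subscribers = []

    def publish(self, data_id, payload):
        self.store[data_id] = payload
        for consumer in self.subscribers:
            consumer.on_notification(self, data_id)   # agreed side-channel protocol

    def find_offer(self, data_id):
        return {"offerId": f"offer-{data_id}"}

    def negotiate(self, offer):
        return {"agreementId": f"agr-{offer['offerId']}"}

    def transfer(self, agreement, data_id):
        return self.store[data_id]

provider = ProviderA()
consumer = ConsumerB()
provider.subscribers.append(consumer)
provider.publish("d1", b"day-1 data")   # each publish costs a full round-trip
provider.publish("d2", b"day-2 data")
```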

Proposed improvements

This shall introduce a concept for a dataplane that is capable of keeping the forward channel open, and of handling and recovering from data transfer errors, while the agreed contract terms are valid.
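A minimal sketch of such a channel, under the assumption that it stays open across pushes, retries failed deliveries, and closes only when the contract terms expire (class names, states, and the retry policy are illustrative, not the actual dataplane design):

```python
# Hypothetical non-finite provider-push channel: a successful push does
# NOT finish the transfer; the channel remains open for the next push
# until the agreed contract terms are no longer valid.
import time

class NonFiniteChannel:
    def __init__(self, contract_valid_until, sink, max_retries=3):
        self.contract_valid_until = contract_valid_until  # epoch seconds
        self.sink = sink                                  # data destination
        self.max_retries = max_retries
        self.state = "STARTED"

    def is_open(self, now=None):
        now = time.time() if now is None else now
        return self.state == "STARTED" and now < self.contract_valid_until

    def push(self, payload, now=None):
        if not self.is_open(now):
            self.state = "TERMINATED"      # contract expired: close the channel
            return False
        for _attempt in range(self.max_retries):
            try:
                self.sink(payload)
                return True                # channel stays open for the next push
            except IOError:
                continue                   # recover from transfer errors: retry
        self.state = "SUSPENDED"           # give up only after repeated failures
        return False

received = []
channel = NonFiniteChannel(contract_valid_until=time.time() + 3600,
                           sink=received.append)
channel.push(b"update-1")
channel.push(b"update-2")
```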

Feature Team

Contributor

Committer

User Stories

Acceptance Criteria

Test Cases

Test Case 1

A Data Consumer can subscribe to a data offer

Steps

Given a consumer and a provider exist.
Given the provider has a dataset to which new data can be added.
Given the provider has an offer for this dataset.

  1. The consumer finds and negotiates access to the provider's dataset.
  2. Using the obtained agreement, the consumer initiates a provider push data transfer for the respective dataset.

Expected Result

  1. The data transfer stays active as long as the terms defined in the agreed contract are valid.

Test Case 2

A Data Consumer can receive data updates from their subscription when the data provider adds new data to the source

Steps

Given a consumer is subscribed to a data offer from a provider.

  1. The data provider adds new data to its data source.

Expected Result

  1. The consumer eventually receives the data in its data destination.
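The two test cases above can be walked through as one small executable sketch. The classes and the tick-based clock are purely illustrative assumptions; in reality the flow runs through connector components:

```python
# Illustrative model of both test cases: a consumer subscribes once,
# receives every later addition to the source (Test Case 2), and the
# subscription stays active until the agreement expires (Test Case 1).

class Subscription:
    def __init__(self, agreement_expiry_tick):
        self.expiry = agreement_expiry_tick
        self.active = True
        self.destination = []      # the consumer's data destination

class Provider:
    def __init__(self):
        self.clock = 0
        self.subscriptions = []

    def subscribe(self, agreement_expiry_tick):
        sub = Subscription(agreement_expiry_tick)
        self.subscriptions.append(sub)
        return sub

    def tick(self):
        self.clock += 1
        for sub in self.subscriptions:
            if self.clock >= sub.expiry:
                sub.active = False  # contract terms no longer valid

    def add_data(self, payload):
        # New source data is pushed to every active subscriber.
        for sub in self.subscriptions:
            if sub.active:
                sub.destination.append(payload)

provider = Provider()
sub = provider.subscribe(agreement_expiry_tick=10)
provider.add_data(b"v1")
provider.tick()
provider.add_data(b"v2")           # delivered without any renegotiation
for _ in range(10):
    provider.tick()                # agreement eventually expires
```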

Architectural Relevance

The following items are ensured (answer: yes) after this issue is implemented:

Justification: (Fill this out, if at least one of the checkboxes above cannot be ticked. Contact the Architecture Management Committee to get an approval for the justification)

Additional information

gerbigf commented 2 weeks ago

Contributor @rafaelmag110 is currently unavailable and thus can't check the architecture criteria, but we guarantee that they are fulfilled.

lgblaumeiser commented 2 weeks ago

I have some questions on this one:

correiaafonso12 commented 1 week ago

> I have some questions on this one:
>
>   • Is this an extension of S3/Azure PUSH or a new transfer mechanism?
>   • You state that this is a breaking change, but why? I see that this opens up new options, but nothing currently existing would stop working, right? Would it change the way you offer contracts with current functionality?
>   • What would a data offer look like? I would assume that, currently, you transfer a single blob or a folder of blobs. That is a defined set which can be referenced by name. What is the idea for referencing the not-yet-existing data that is offered by the contract offering?

Hello @lgblaumeiser

stephanbcbauer commented 1 week ago

@matbmoser proposed to rename the title

stephanbcbauer commented 1 week ago

@ClosedSourcerer also interested in this feature

matbmoser commented 1 week ago

Maybe it can be renamed to: "Design and enable the EDC Dataplane stream functionality". I would like to be involved in the discussions and in the design of the implementation.

And we can speak in Portuguese @rafaelmag110 ;)

ClosedSourcerer commented 1 week ago

> Maybe it can be renamed to: "Design and enable the EDC Dataplane stream functionality". I would like to be involved in the discussions and in the design of the implementation.
>
> And we can speak in Portuguese @rafaelmag110 ;)

As far as I understand it, "stream" does not really capture the meaning of what is proposed. Streaming would indicate a continuous data transfer.

What is proposed is allowing the whole asset and policy negotiation process to allow for "true PUSH".

Currently you can only do a "fake PUSH" by having the data recipient register an API endpoint as a data asset, meaning the data owner negotiates with the data recipient for the right to send the data. This does not really match the idea behind the whole negotiation process, because that process is about data sovereignty. If the data owner, and thus sovereign, wants to send you data, there is simply no negotiation required. The act of sending the data is the permission itself.

gerbigf commented 1 week ago

@matbmoser what @ClosedSourcerer says is right. It's not really about streaming in the sense of Pub-Sub, but it's rather about keeping a channel open and to allow the owner of the data to also be the one that creates the contract definition.

It will still require the provider to select the right agreement and use the right token to send data to a consumer each and every time so it's not an automatic pub-sub pattern. This would require a lot of additional logic that - in the first step - we won't implement.

Furthermore, we want to focus on the HTTP transfer type via EDR first. We could of course extend that to e.g. S3 or AzureBlob type transfers as well, but that, again, will be more complex, as we have to think about refreshing access tokens for the buckets or blobs, etc.
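The token concern for a long-lived push channel can be sketched as follows. The `Edr` shape, the refresh callback, and all names are assumptions for illustration only, not the real EDC EDR schema or its refresh mechanism; the point is simply that each push must check token validity first:

```python
# Hypothetical sketch: with HTTP-via-EDR, the provider must present a
# valid token on every push, so a long-lived channel needs a refresh
# step before sending. Edr and the callbacks are illustrative only.
import time

class Edr:
    def __init__(self, endpoint, token, expires_at):
        self.endpoint = endpoint
        self.token = token
        self.expires_at = expires_at   # epoch seconds

def push_with_refresh(edr, payload, refresh, send, now=None):
    """Refresh the EDR if its token has expired, then send one payload."""
    now = time.time() if now is None else now
    if now >= edr.expires_at:
        edr = refresh(edr)             # e.g. obtain a fresh token
    send(edr, payload)
    return edr                         # caller keeps the (possibly new) EDR

sent = []
def fake_send(edr, payload):
    sent.append((edr.token, payload))

def fake_refresh(old):
    return Edr(old.endpoint, old.token + "-refreshed", time.time() + 300)

edr = Edr("https://provider.example/data", "tok-1", expires_at=0)  # expired
edr = push_with_refresh(edr, b"update", fake_refresh, fake_send)
```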