Open rafaelmag110 opened 1 month ago
Contributor @rafaelmag110 is currently unavailable and thus can't check the architecture criteria, but we guarantee that they are fulfilled.
I have some questions on this one:
- Is this an extension of S3/Azure PUSH or a new transfer mechanism?
- You state that this is a breaking change, but why? I see that this opens up new options, but nothing that currently exists would stop working, right? Would it change the way you offer contracts with the current functionality?
- What would a data offer look like? I assume that currently you transfer a single blob or a folder of blobs. That is a defined set which can be referenced by name. How would the contract offer reference data that does not exist yet?
Hello @lgblaumeiser
@matbmoser proposed to rename the title
@ClosedSourcerer also interested in this feature
Maybe it can be renamed to: "Design and enable the EDC Dataplane stream functionality". I would like to be involved in the discussions and in the design for the implementation.
And we can speak in Portuguese @rafaelmag110 ;)
As far as I understand it, "stream" does not really capture the meaning of what is proposed. Streaming would indicate a continuous data transfer.
What is proposed is to make the whole asset and policy negotiation process support a "true PUSH".
Currently you can only do a "fake PUSH" by having the data recipient register an API endpoint as a data asset, meaning the data owner negotiates with the data recipient for the right to send the data. This does not really match the idea behind the whole negotiation process, because that process is about data sovereignty. If the data owner, and thus sovereign, wants to send you data, there is simply no negotiation required. The act of sending the data is the permission itself.
@matbmoser what @ClosedSourcerer says is right. It's not really about streaming in the sense of Pub-Sub, but it's rather about keeping a channel open and to allow the owner of the data to also be the one that creates the contract definition.
It will still require the provider to select the right agreement and use the right token to send data to a consumer each and every time, so it's not an automatic pub-sub pattern. That would require a lot of additional logic that we won't implement in the first step.
Furthermore, we want to focus on the HTTP transfer type via EDR first. We could of course extend that to, e.g., S3 or AzureBlob type transfers as well, but that will again be more complex, as we have to think about refreshing access tokens for the buckets or blobs, etc.
Overview
Explain the topic in 2 sentences
A Dataspace data consumer should be able to subscribe to a data offering that provides a stream of non-finite data transfers while the agreed contract terms are valid.
What's the benefit?
We reduce the complexity of setting up use cases that rely on frequent, provider triggered, transfers of data.
What are the Risks/Dependencies?
It's a breaking change and should allow use cases to be redesigned in a different, more efficient way.
Detailed explanation
Current implementation
Consider the following scenario.
Company A generates GBs of data on a daily basis. Company B wants to consume that data as soon as it is generated.
Currently, Company B has to create a data offering that allows Company A to notify B when new data has been generated and where it can be found. This requires a protocol to be agreed between both companies, so that the details of said notification and the following steps are understood.
Once Company B receives the notification, it triggers a search, negotiation and transfer for that particular data.
The flow is designed this way because the EDC closes the forward channel in any provider push transfer scenario as soon as the transfer has either succeeded or failed.
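To make the limitation concrete, here is a minimal Python sketch of the behaviour described above. It is not EDC code: `OneShotPushTransfer`, `TransferState` and `push` are hypothetical names used only for illustration. The only assumption it encodes is the one stated in this section, namely that the forward channel is closed once the transfer has succeeded or failed.

```python
from dataclasses import dataclass, field
from enum import Enum


class TransferState(Enum):
    STARTED = "STARTED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass
class OneShotPushTransfer:
    """Hypothetical model of today's provider push: one transfer per negotiation."""

    asset_id: str
    state: TransferState = TransferState.STARTED
    delivered: list = field(default_factory=list)

    def push(self, payload: str, succeed: bool = True) -> None:
        # Once the transfer reached a terminal state, the channel is gone.
        if self.state is not TransferState.STARTED:
            raise RuntimeError(
                "forward channel is closed; a new search, negotiation and transfer is required"
            )
        if succeed:
            self.delivered.append(payload)
            self.state = TransferState.COMPLETED
        else:
            self.state = TransferState.FAILED
```

In this model, every new batch produced by Company A needs a fresh `OneShotPushTransfer`, which mirrors the repeated notify, search, negotiate and transfer cycle described above.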
Proposed improvements
This shall introduce a concept for a dataplane that is capable of keeping the forward channel open, and of handling and recovering from data transfer errors, while the agreed contract terms are valid.
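As a rough illustration of the intended behaviour, the following Python sketch models a channel that stays open across pushes, retries after a transfer error, and closes only once the agreed contract terms are no longer valid. All names (`ContractAgreement`, `OpenPushChannel`, `send`, `max_attempts`) are hypothetical and not part of the EDC API. The validity check is reduced to a simple expiry timestamp, which is an assumption; real policies can be richer.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List


@dataclass
class ContractAgreement:
    """Hypothetical agreement; validity is simplified to an expiry date."""

    agreement_id: str
    valid_until: datetime

    def is_valid(self, now: datetime) -> bool:
        return now < self.valid_until


@dataclass
class OpenPushChannel:
    """Hypothetical forward channel that survives individual transfers."""

    agreement: ContractAgreement
    send: Callable[[str], None]  # assumed transport, e.g. an HTTP POST to the consumer
    max_attempts: int = 3
    closed: bool = False
    delivered: List[str] = field(default_factory=list)

    def push(self, payload: str, now: datetime) -> bool:
        # The channel closes only when the contract terms stop being valid.
        if self.closed or not self.agreement.is_valid(now):
            self.closed = True
            return False
        for _ in range(self.max_attempts):
            try:
                self.send(payload)
            except ConnectionError:
                continue  # recover from a transfer error and try again
            self.delivered.append(payload)
            return True
        # This payload could not be delivered, but the channel stays open.
        return False
```

A provider would call `push` each time new data is added to the source. A failed delivery affects only that payload, while an expired agreement closes the channel for good.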
Feature Team
Contributor
Committer
User Stories
Acceptance Criteria
Test Cases
Test Case 1
A Data Consumer can subscribe to a data offer
Steps
Given a consumer and a provider exist. Given the provider has a dataset to which new data can be added. Given the provider has an offer for this dataset.
Expected Result
Test Case 2
A Data Consumer can receive data updates from their subscription when the data provider adds new data to the source
Steps
Given a consumer is subscribed to a data offer from a provider.
Expected Result
Architectural Relevance
The following items are ensured (answer: yes) after this issue is implemented:
Justification: (Fill this out, if at least one of the checkboxes above cannot be ticked. Contact the Architecture Management Committee to get an approval for the justification)
Additional information