eclipse-tractusx / sig-release

https://eclipse-tractusx.github.io/sig-release
Apache License 2.0

EDC - Transfer of data between S3 buckets or between Azure Blob stores #755

Open lgblaumeiser opened 4 months ago

lgblaumeiser commented 4 months ago

Overview

Explain the topic in 2 sentences

Currently, all data transfers, even very large ones, go through the connector data plane. For transfers between two S3 buckets this is inefficient and costly, especially when the volume reaches terabytes or even petabytes. This issue is about optimizing such transfers by using S3's own mechanisms to move the data directly.

What's the benefit?

Lower cost for data transfer between S3 buckets

What are the Risks/Dependencies ?

No risks

Depends on features within AWS to transfer data within the AWS cloud

Detailed explanation

Current implementation

All data is processed through the data plane. With an Azure-based connector deployment and an AWS S3 data transfer, this means the data is transferred from AWS to Azure and back to AWS, which is extremely expensive.

Proposed improvements

The targeted solution uses the data transfer mechanisms that AWS S3 itself provides. For a suitable transfer request, the data plane implementation will use these mechanisms so that the data does not flow through the data plane but is exchanged directly within S3.
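A minimal sketch of this routing decision, under stated assumptions: `DataAddress`, `is_direct_s3_transfer`, and `transfer` are hypothetical names made up for illustration, `"AmazonS3"` is assumed as the address type for S3 buckets, and `client` is assumed to be a boto3-style S3 client whose credentials may read the source and write the destination. How those permissions are granted across two companies is not covered here.

```python
from dataclasses import dataclass

# Assumption: "AmazonS3" is the data address type used for S3 buckets.
S3_TYPE = "AmazonS3"


@dataclass(frozen=True)
class DataAddress:
    """Hypothetical, simplified stand-in for a connector data address."""
    type: str
    bucket: str
    key: str


def is_direct_s3_transfer(source: DataAddress, destination: DataAddress) -> bool:
    """Return True when both ends are S3, so the copy can stay inside S3."""
    return source.type == S3_TYPE and destination.type == S3_TYPE


def transfer(client, source: DataAddress, destination: DataAddress) -> str:
    """Copy within S3 when both ends are S3, otherwise report the fallback.

    `client` only needs a boto3-style `copy_object` method.
    """
    if not is_direct_s3_transfer(source, destination):
        # Not an S3-to-S3 request: keep streaming through the data plane.
        return "data-plane"
    # Server-side copy: the object bytes never pass through the caller.
    client.copy_object(
        Bucket=destination.bucket,
        Key=destination.key,
        CopySource={"Bucket": source.bucket, "Key": source.key},
    )
    return "s3-direct"
```

A single S3 copy request is limited to objects of up to 5 GB, so larger objects would need a multipart copy, which this sketch leaves out.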

Feature Team

Contributor

Committer

User Stories

Acceptance Criteria

Test Cases

Test Case 1

Run an S3 data transfer within the same region between two different companies, each providing its own S3 bucket: the provider's bucket stores the data to be transferred, and the consumer's bucket is the destination for the data.

Steps

  1. Create an asset that offers a data item in an S3 bucket on the provider side
  2. Negotiate a contract initiated from the consumer to push the data item to the consumer bucket
  3. Initiate the data transfer after the contract is established

Expected Result

  1. The data is transferred from the provider S3 bucket to the consumer S3 bucket
  2. The data transfer took place entirely within AWS S3, with no data leaving AWS
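The two expected results can be illustrated with an in-memory simulation. This is only a sketch: `FakeS3` and `DataPlane` are hypothetical stand-ins, not real AWS or connector classes, the bucket and key names are invented, and the contract negotiation step is skipped. The counter `bytes_streamed` models the data that would pass through the data plane.

```python
class FakeS3:
    """In-memory stand-in for S3; not a real AWS client."""

    def __init__(self):
        self.buckets = {}

    def put_object(self, bucket, key, body):
        self.buckets.setdefault(bucket, {})[key] = body

    def get_object(self, bucket, key):
        return self.buckets[bucket][key]

    def copy_object(self, src_bucket, key, dst_bucket):
        # Server-side copy: the bytes stay inside the storage system.
        self.put_object(dst_bucket, key, self.buckets[src_bucket][key])


class DataPlane:
    """Hypothetical data plane that counts the bytes passing through it."""

    def __init__(self, s3):
        self.s3 = s3
        self.bytes_streamed = 0

    def stream(self, src_bucket, key, dst_bucket):
        # Current behaviour: download the object, then upload it again.
        body = self.s3.get_object(src_bucket, key)
        self.bytes_streamed += len(body)
        self.s3.put_object(dst_bucket, key, body)

    def push_direct(self, src_bucket, key, dst_bucket):
        # Proposed behaviour: delegate the copy to the storage system.
        self.s3.copy_object(src_bucket, key, dst_bucket)


def run_test_case_1():
    """Return (object found in the consumer bucket, bytes seen by the data plane)."""
    s3 = FakeS3()
    s3.put_object("provider-bucket", "item.bin", b"payload")
    plane = DataPlane(s3)
    plane.push_direct("provider-bucket", "item.bin", "consumer-bucket")
    return s3.get_object("consumer-bucket", "item.bin"), plane.bytes_streamed
```

In this simulation, `run_test_case_1()` finds the object in the consumer bucket while the data plane counter stays at zero, whereas `stream` would count every byte of the object.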

Architectural Relevance

The following items are ensured (answer: yes) after this issue is implemented:

Justification: (Fill this out, if at least one of the checkboxes above cannot be ticked. Contact the Architecture Management Committee to get an approval for the justification) n/a

Additional information

stephanbcbauer commented 4 months ago

Presented in the DRAFT Feature Freeze -> Committer is available

rafaelmag110 commented 4 months ago

This is a good idea. There is a similar discussion open in one of the upstream repositories: https://github.com/eclipse-edc/Technology-Aws/discussions/270

We wanted to contribute with this improvement but couldn't crack a particular problem of permission management. I'll take some time to add more details to that discussion to kick start it again.

lgblaumeiser commented 2 months ago

PoC work is ongoing; a final implementation will not make it into this release

lgblaumeiser commented 1 month ago

Spill over from 24/12 release planning