Open 10d9e opened 1 year ago
This is how I would design an ingestion layer that uses different cloud storage provider adapters
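A minimal sketch of the adapter idea, in case it helps the discussion. All names here (`StorageAdapter`, `InMemoryAdapter`, `IngestionLayer`) are hypothetical and not taken from the Delta codebase; the in-memory adapter is only a stand-in for a real S3/Azure/GCP client so the example runs offline.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator


class StorageAdapter(ABC):
    """Hypothetical interface that each cloud provider adapter would implement."""

    @abstractmethod
    def list_keys(self, bucket: str) -> Iterator[str]:
        """Yield the object keys stored in a bucket."""

    @abstractmethod
    def read(self, bucket: str, key: str) -> bytes:
        """Return the raw bytes of one object."""


class InMemoryAdapter(StorageAdapter):
    """Stand-in for a real provider client; holds buckets in a dict."""

    def __init__(self, buckets: Dict[str, Dict[str, bytes]]):
        self._buckets = buckets

    def list_keys(self, bucket: str) -> Iterator[str]:
        return iter(sorted(self._buckets.get(bucket, {})))

    def read(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]


class IngestionLayer:
    """Routes an ingest request to whichever adapter is registered for a provider."""

    def __init__(self) -> None:
        self._adapters: Dict[str, StorageAdapter] = {}

    def register(self, provider: str, adapter: StorageAdapter) -> None:
        self._adapters[provider] = adapter

    def ingest(self, provider: str, bucket: str) -> Dict[str, bytes]:
        # Raises KeyError if no adapter is registered for the provider.
        adapter = self._adapters[provider]
        return {key: adapter.read(bucket, key) for key in adapter.list_keys(bucket)}
```

The point of the shape is that the ingestion layer only ever talks to `StorageAdapter`, so adding a provider means writing one new adapter class rather than touching the ingest path.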
This won't let us adhere to the full standard. The flow is more complicated than this if we are truly going to support S3, especially as our customer list diversifies. The S3 protocol is very detailed and requires auxiliary storage to maintain customer-specific data beyond the core objects themselves.
If S3 is what we want, this will require detailed planning, especially once we get into S3 policy management and verb support. Even calls like head_object are important, and this flow needs to account for handling that metadata, plus ACL/bucket mapping and other things. How standardized do we need to be? Even 50% is a significant lift.
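To make the auxiliary-storage point concrete, here is a rough sketch of a metadata index that could answer a head_object-style call without touching the object bytes. Everything in it is an assumption for illustration: `MetadataIndex`, `ObjectMeta`, and the single-owner-per-bucket check are hypothetical, the dicts stand in for a real DB, and the MD5 hex digest is only a placeholder for an ETag. Real S3 ACLs and policies are far richer than this.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ObjectMeta:
    size: int
    etag: str          # placeholder: MD5 hex digest of the body
    content_type: str
    owner: str         # customer that owns the bucket


class MetadataIndex:
    """Hypothetical auxiliary store: metadata and bucket mapping only, no object bytes."""

    def __init__(self) -> None:
        self._bucket_owner: Dict[str, str] = {}
        self._objects: Dict[Tuple[str, str], ObjectMeta] = {}

    def create_bucket(self, bucket: str, owner: str) -> None:
        self._bucket_owner[bucket] = owner

    def record_put(self, bucket: str, key: str, body: bytes,
                   content_type: str = "application/octet-stream") -> ObjectMeta:
        # The body is inspected to derive metadata but is not stored here.
        owner = self._bucket_owner[bucket]
        meta = ObjectMeta(len(body), hashlib.md5(body).hexdigest(), content_type, owner)
        self._objects[(bucket, key)] = meta
        return meta

    def head_object(self, bucket: str, key: str, caller: str) -> Optional[ObjectMeta]:
        # Simplified access check: only the bucket owner may see the metadata.
        meta = self._objects.get((bucket, key))
        if meta is None or meta.owner != caller:
            return None
        return meta
```

Even this toy version needs a bucket-to-customer mapping and per-object records, which is the kind of state the current flow does not show.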
Do we want a more extensive diagram? My thought was just a simplified, high-level diagram that conceptualizes the integration.
lmk when this is good to go and I can start chugging <3
I'll put something together to highlight my thoughts, then we can delegate what we wish to support/attack straightaway in v1.
The S3 problem is not pulling from S3, it's providing an S3-compliant endpoint where we can receive data directly. I am nearly done with an S3-compliant connector for Ptolemy; the problem we are discussing is a completely different animal.
Thanks. Interesting, so this adapter is meant to replace the client library that they are using to upload from S3: same endpoints and same POST body expectations, so the dev UX will be the same.
per @alimbuyuguen - the latest image:
There is one small issue I just caught in the diagram: it should say metadata is going to the DB. We are not persisting objects there.
We would like to have S3 integration added to our overall architecture diagrams. They should demonstrate how Amazon/Azure/GCP S3 storage fits into the Delta stack.
cc: @schreck23 @jimmylee