Open anmolsgandhi opened 6 months ago
Following the numeration above:
UPDATED BASED ON COMMENTS: To implement the design I will open multiple pull requests (the list might vary):
[Done] Pre-reqs: Refactor current datasets into S3 Datasets and Base datasets (#1123)
[In-Progress] Pre-reqs: Refactor current dataset sharing into S3 sharing and base sharing (#1283)
[In-Progress] New Redshift Dataset module using Base datasets + publish to catalog logic. Introduce Redshift Connections
[Not Started] New Redshift data sharing module using base sharing

@dlpzx I've read through the design and watched your video as well (it was very helpful as it answered some of my questions).
Overall I don't see any big problems but I do have some concerns.
1) Addition of a new UI "Warehouses" to manage Redshift connections. I find this UI a bit awkward. My first instinct is that this should be a tab under an environment and not a separate UI outside an environment, especially because you cannot have a connection that is not part of an environment. I think this would also simplify creating connections, because the environment is then already defined and the connection can be owned by the same team that is creating it.
I would also want to make sure that there's a consistent user experience when registering consumer roles or Redshift consumer connections. Even today I find it weird that we register consumer roles in the "Teams" tab under environments; I don't think that's intuitive. Perhaps with the addition of Redshift connections we can instead add a new tab on the environment, "Consumer Connections" or something similar, where you can manage your consumer IAM roles, Redshift consumer connections, etc.
Also, I don't really feel that this new type "Warehouses" is going to be reusable for anything other than Redshift, so I think the name is misleading.
I would like to hear your arguments why you think it would be much better to put this as a new UI on the left main bar vs making it a new tab on the environment.
2) Definitely make Redshift modular so that it can be fully disabled. For example, we don't use Redshift at all and don't want our users to be confused.
3) We need to check security. Make sure to scan all infrastructure with Checkov and keep the permissions as tight as possible.
4) I'd really like to see part 2 of your video to understand better how Redshift consumer connections should work.
Thank you!
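To make points 1 and 2 above concrete, here is a minimal sketch of what "connections live under an environment" and "Redshift can be fully disabled" could look like together. Everything in it is hypothetical: the config shape, the module key `redshift_datasets`, the tab names, and the `Environment` / `RedshiftConnection` classes are assumptions for illustration, not data.all's actual schema or API.

```python
from dataclasses import dataclass, field

# Hypothetical configuration; key names are assumptions, not data.all's real schema.
CONFIG = {
    "modules": {
        "s3_datasets": {"active": True},
        "redshift_datasets": {"active": False},
    }
}


def is_module_active(config: dict, name: str) -> bool:
    """Return True only if the module is listed and explicitly marked active."""
    return bool(config.get("modules", {}).get(name, {}).get("active", False))


@dataclass
class RedshiftConnection:
    name: str
    owner_team: str


@dataclass
class Environment:
    name: str
    owner_team: str
    connections: list = field(default_factory=list)

    def add_connection(self, config: dict, name: str) -> RedshiftConnection:
        """Create a connection scoped to this environment and owned by its team."""
        if not is_module_active(config, "redshift_datasets"):
            raise RuntimeError("Redshift module is disabled in this deployment")
        connection = RedshiftConnection(name=name, owner_team=self.owner_team)
        self.connections.append(connection)
        return connection


def environment_tabs(config: dict) -> list:
    """List the environment tabs to render; the Redshift tab is hidden when disabled."""
    tabs = ["Overview", "Teams"]
    if is_module_active(config, "redshift_datasets"):
        tabs.append("Connections")
    return tabs
```

With the module inactive, the extra tab never shows up and creating a connection fails fast, so users of a deployment without Redshift would not see any Redshift surface. With it active, a connection cannot exist without an environment and inherits that environment's owning team.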
I really like how descriptive the design is. Answered most of my questions too! I have a few pending though:
Thanks @zsaltys and @anushka-singh for the input, you went straight to the tricky points.
DESIGN UPDATED WITH THE FEEDBACK!
Description:
Enable seamless data integration with Redshift as a new data source in ‘data.all’. This feature enhances collaboration by allowing users to easily publish, discover and share Redshift data within the data.all platform. Users can securely configure a Redshift instance, streamlining the process of making Redshift datasets accessible.
Details:
Adding Redshift Instance and Publishing Tables
Tables Available for Discovery
Self-service Share Process for Redshift Data Sharing
Benefits:
@dlpzx