Closed pascalwhoop closed 2 years ago
Hi @pascalwhoop, at the moment data must be stored in S3. A data.all Dataset consists of both an S3 bucket and a Glue database. We use S3 because it is the centerpiece of AWS data lakes, connecting multiple data sources.
For the specific case of RDS, you could implement something like step 3 of this blog post: https://aws.amazon.com/blogs/big-data/integrating-aws-lake-formation-with-amazon-rds-for-sql-server/. The other steps would be handled by data.all when creating or importing datasets.
That said, we are open to direct storage in other data sources. It is definitely something that raises interest. So, to the question: "what needs to happen to tap into this Glue abstraction in Data.All?"
Assumptions:
User experience: As a data.all user with access to the RDS environment account and the RDS DB
Implementation:
Keep in mind that these are draft steps and could, of course, change. Maybe you can define the assumptions and user experience, and we can refine the implementation steps.
Hi @dlpzx thx for paying attention to this. I think your user experience is pretty spot-on.
> As a use case owner with an RDS hosted in my account, I want to expose some of my tables as data products to other teams easily.
Particularly the experience would be
This is a separate topic, but I think it would also be good UX to show a "waiting screen" within the same journey that lets the user wait for the data.all objects to be created, instead of breaking the journey into 1) create dataset and 2) add data to it.
Assumptions-wise, I am not deep enough in the tech to understand how the RDS instance needs to be configured for the Glue crawler to be able to access the DB. I suppose some form of credentials handling needs to happen?
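On the credentials question, here is a minimal sketch of what the Glue side could look like, not how data.all does or would implement it. It builds the input for boto3's `glue.create_connection` for a JDBC connection that points at a Secrets Manager secret instead of embedding a username and password, plus the VPC details the crawler needs to reach the database. All names, the JDBC URL, the secret ID, and the network IDs are hypothetical placeholders.

```python
from typing import Any, Dict, List


def build_jdbc_connection_input(
    name: str,
    jdbc_url: str,
    secret_id: str,
    subnet_id: str,
    security_group_ids: List[str],
    availability_zone: str,
) -> Dict[str, Any]:
    """Build the ConnectionInput payload for glue.create_connection.

    Assumption: credentials live in a Secrets Manager secret and are
    referenced through the SECRET_ID connection property.
    """
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            "SECRET_ID": secret_id,
            "JDBC_ENFORCE_SSL": "true",
        },
        # The crawler runs inside this subnet/security group to reach RDS.
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": list(security_group_ids),
            "AvailabilityZone": availability_zone,
        },
    }


def register_connection(glue_client: Any, connection_input: Dict[str, Any]) -> None:
    """Create the connection; glue_client is e.g. boto3.client("glue")."""
    glue_client.create_connection(ConnectionInput=connection_input)


# Hypothetical example values, for illustration only.
example_connection = build_jdbc_connection_input(
    name="rds-orders-connection",
    jdbc_url="jdbc:mysql://example-host.rds.amazonaws.com:3306/orders",
    secret_id="example/rds/orders-credentials",
    subnet_id="subnet-00000000",
    security_group_ids=["sg-00000000"],
    availability_zone="eu-west-1a",
)
```

The payload is built separately from the API call so it can be inspected or reviewed before anything is created in the account. The role used by the crawler would also need permission to read the secret, which this sketch does not cover.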
Is it possible that we can leverage any of the sources defined by Glue? https://docs.aws.amazon.com/glue/latest/dg/populate-data-catalog.html
I.e. can we make an RDS DB Table available in the catalog through Glue? If not, what needs to happen to tap into this Glue abstraction in Data.All?
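For context on the Glue side of this question: a Glue crawler can take a JDBC target, which is how RDS tables typically end up in the Glue Data Catalog. The sketch below builds the arguments for boto3's `glue.create_crawler` and then starts the crawler. It assumes a Glue JDBC connection already exists; the crawler name, role ARN, database name, connection name, and path are hypothetical. Whether data.all can then treat the resulting catalog tables as a dataset is a separate matter that this sketch does not address.

```python
from typing import Any, Dict


def build_rds_crawler_request(
    crawler_name: str,
    role_arn: str,
    glue_database: str,
    connection_name: str,
    jdbc_path: str,
) -> Dict[str, Any]:
    """Build keyword arguments for glue.create_crawler with a JDBC target.

    jdbc_path selects what to crawl, e.g. "orders/%" for every table in
    the "orders" database (engines with schemas use "db/schema/%").
    """
    return {
        "Name": crawler_name,
        "Role": role_arn,
        "DatabaseName": glue_database,  # catalog database receiving the tables
        "Targets": {
            "JdbcTargets": [
                {"ConnectionName": connection_name, "Path": jdbc_path}
            ]
        },
    }


def create_and_start_crawler(glue_client: Any, request: Dict[str, Any]) -> None:
    """Create the crawler and trigger a run; glue_client is e.g. boto3.client("glue")."""
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])


# Hypothetical example values, for illustration only.
example_crawler = build_rds_crawler_request(
    crawler_name="rds-orders-crawler",
    role_arn="arn:aws:iam::111122223333:role/example-glue-crawler-role",
    glue_database="orders_catalog",
    connection_name="rds-orders-connection",
    jdbc_path="orders/%",
)
```

Once the crawler has run, the discovered tables appear in the named Glue database with the JDBC connection recorded as their location, rather than an S3 path.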