data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
https://data-dot-all.github.io/dataall/
Apache License 2.0

Third-party data integration with AWS Data Exchange (public datasets, commercial datasets) #316

Open yegortokmakov opened 1 year ago

yegortokmakov commented 1 year ago

Is your feature request related to a problem? Please describe.
Many customers, especially those doing scientific research, rely on third-party data products to augment their internal data. Currently, to ingest these data products, users must first land the data in their AWS environment and then register it in Dataall. Integrating the AWS Data Exchange catalog with Dataall would save a lot of work and make it much easier to work with third-party data. Data Exchange hosts over 3,500 datasets, both free and commercial, and is well integrated with the AWS ecosystem.

Describe the solution you'd like
Dataall should include AWS Data Exchange datasets in its catalog, so that end users can access Data Exchange products without leaving Dataall.
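
To make the request more concrete, here is a minimal sketch of how Data Exchange data sets might be mapped into catalog records. It is an illustration only, not how Dataall works today: `CatalogEntry`, `iter_entitled_data_sets` and `to_catalog_entries` are hypothetical names, and the only AWS call assumed is `list_data_sets` on the boto3 `dataexchange` client. Note that `Origin="ENTITLED"` returns only data sets the account is already subscribed to, so browsing the full public catalog would need a different mechanism.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record; this is NOT an existing Dataall model."""
    source: str
    external_id: str
    name: str
    description: str
    asset_type: str


def iter_entitled_data_sets(client: Any) -> Iterator[Dict[str, Any]]:
    """Yield every entitled data set, following NextToken pagination.

    `client` is expected to behave like boto3.client("dataexchange").
    """
    kwargs: Dict[str, Any] = {"Origin": "ENTITLED"}
    while True:
        page = client.list_data_sets(**kwargs)
        yield from page.get("DataSets", [])
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token


def to_catalog_entries(client: Any) -> List[CatalogEntry]:
    """Map Data Exchange data sets to the hypothetical CatalogEntry shape."""
    return [
        CatalogEntry(
            source="aws-data-exchange",
            external_id=ds["Id"],
            name=ds["Name"],
            description=ds.get("Description", ""),
            asset_type=ds["AssetType"],
        )
        for ds in iter_entitled_data_sets(client)
    ]
```

With real credentials this would be called as `to_catalog_entries(boto3.client("dataexchange"))`. Passing the client in as a parameter keeps the mapping logic testable without AWS access.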

Describe alternatives you've considered
There are currently two options:

Additional context
I believe a good start would be public datasets from AWS Data Exchange for Data Files or even AWS Data Exchange for Amazon S3 (Preview).

dlpzx commented 1 year ago

Hi @yegortokmakov, completely agree! This is a very cool feature indeed. In fact we opened a similar one in #301. Right now we are focused on other development tasks, but with growing interest the chances for it to be implemented grow as well. I have added it to the Q2 2023 Project so that we consider it when we decide the final features to be implemented. If you are implementing it on your own, we are happy to contribute and review :)

chamiles commented 1 year ago

+1 for a customer requesting the ability to publish datasets to AWS Data Exchange.