astronomer / astronomer-cosmos

Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
https://astronomer.github.io/astronomer-cosmos/
Apache License 2.0

Support loading manifest from S3/GCS/Azure BlobStorage #448

Closed tatiana closed 1 month ago

tatiana commented 1 year ago

In a conversation, community member Edgaras Navickas mentioned that it would be great if users could reference a manifest stored in an S3 bucket. This was a follow-up to issues reported in the Slack thread.

Example:

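from cosmos import DbtDag, ProjectConfig, RenderConfig
from cosmos.constants import LoadMode

DbtDag(
    project_config=ProjectConfig(
        manifest_path="s3://path/to/manifest.json",
        # Proposed argument: Airflow connection used to read the remote manifest
        manifest_conn_id="aws_conn",
    ),
    render_config=RenderConfig(
        load_mode=LoadMode.DBT_MANIFEST,
    ),
    # ...,
)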

We can have separate tickets to support loading manifests from other cloud providers.
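
To illustrate how one argument could cover several providers, here is a minimal sketch that splits a manifest location into scheme, bucket and key before a loader is chosen. `ManifestLocation`, `parse_manifest_location` and the set of supported schemes are hypothetical names and assumptions for illustration only. They are not part of Cosmos, and the sketch handles only POSIX-style local paths.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Assumption: the schemes a remote manifest loader might accept.
SUPPORTED_SCHEMES = {"s3", "gs", "abfs"}


@dataclass(frozen=True)
class ManifestLocation:
    """Hypothetical container for a parsed manifest location."""

    scheme: str  # empty string means a local file
    bucket: str  # bucket or container name, empty for local files
    key: str  # object key, or the local path itself


def parse_manifest_location(manifest_path: str) -> ManifestLocation:
    """Split a manifest path such as s3://bucket/key into its parts."""
    parsed = urlparse(manifest_path)
    if not parsed.scheme:
        # No scheme: treat the value as a path on the local filesystem.
        return ManifestLocation(scheme="", bucket="", key=manifest_path)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unsupported manifest scheme: {parsed.scheme!r}")
    return ManifestLocation(
        scheme=parsed.scheme,
        bucket=parsed.netloc,
        key=parsed.path.lstrip("/"),
    )


if __name__ == "__main__":
    print(parse_manifest_location("s3://path/to/manifest.json"))
```

With a helper like this, a loader could dispatch on `scheme` and use the connection ID only when the manifest is remote.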

MrBones757 commented 10 months ago

I'd like to add some additional ideas / comments here.

Rather than supporting only the S3 URI, would it be worth creating a set of classes, similar to the way we handle profiles?

I.e., I'm thinking of something like:

ManifestSourceBase -> S3ManifestSource -> ArtifactoryManifestSource -> NexusManifestSource etc

This would make it really modular and allow us to source manifests from numerous artifact stores, supporting more than AWS-specific S3: think Azure Blob Storage, Google Cloud Storage and Cloudflare R2, as well as those above.

Thinking about integration, it would be fairly easy to just allow ManifestSourceBase as a possible type for the manifest path arg, and make ManifestSourceBase an abstract base.
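
As a rough illustration of that hierarchy, the sketch below defines an abstract base with two simple subclasses. Only the name `ManifestSourceBase` comes from the comment above. The `read_bytes` and `load` methods, `LocalManifestSource` and `InMemoryManifestSource` are hypothetical, and none of this is an existing Cosmos API. An `S3ManifestSource` would implement `read_bytes` with a call to its storage client.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict


class ManifestSourceBase(ABC):
    """Hypothetical abstract base: subclasses only say how to fetch bytes."""

    @abstractmethod
    def read_bytes(self) -> bytes:
        """Return the raw contents of manifest.json."""

    def load(self) -> Dict[str, Any]:
        # Shared logic: every source yields JSON, so parsing lives here.
        return json.loads(self.read_bytes())


class LocalManifestSource(ManifestSourceBase):
    """Reads the manifest from the local filesystem."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def read_bytes(self) -> bytes:
        return self.path.read_bytes()


class InMemoryManifestSource(ManifestSourceBase):
    """Serves a manifest held in memory; handy for tests."""

    def __init__(self, payload: bytes) -> None:
        self.payload = payload

    def read_bytes(self) -> bytes:
        return self.payload


if __name__ == "__main__":
    source = InMemoryManifestSource(b'{"nodes": {}}')
    print(source.load())
```

Because callers depend only on `load()`, a new artifact store would need one new subclass and no changes to the code that consumes the manifest.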

This also relates to https://github.com/astronomer/astronomer-cosmos/issues/570: the two are related and somewhat competing ideas, since both deal with a remotely sourced manifest. The same logic could also be used for profiles.yml, perhaps with ManifestSourceBase -> CosmosFileSource or similar, which would accept injected credentials or Airflow connections.

idealopamp commented 10 months ago

This would be great. Our team would love to see this for Google Cloud Storage.

dosubot[bot] commented 6 months ago

Hi, @tatiana,

I'm helping the Cosmos team manage their backlog and am marking this issue as stale. The issue involves adding support for referencing a manifest in an S3 bucket, with additional suggestions for creating a modular set of classes to handle various artifact stores. It seems that the issue is still unresolved, and I'd like to confirm if it's still relevant to the latest version of the Cosmos repository. If it is, please let the Cosmos team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution, and I look forward to hearing from you.

Dosu

tatiana commented 2 months ago

There is interest from Astro customers in this feature.

Pawel-Drabczyk commented 2 months ago

Are there any estimates of when this feature can be released? This change will help our team a lot.

pankajkoti commented 2 months ago

@Pawel-Drabczyk I am analysing this issue at the moment and we would ideally like to have this in the upcoming Cosmos 1.6.0 release.

pankajkoti commented 2 months ago

I have created a draft PR to support this: https://github.com/astronomer/astronomer-cosmos/pull/1109. I tested the implementation with AWS S3 and GCP GCS. I need some help testing with Azure storage, specifically with the right resources and access.

pankajkoti commented 1 month ago

PR https://github.com/astronomer/astronomer-cosmos/pull/1109 is ready for review and I have addressed the review comments so far.