dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io

Allow injection of credentials for s3_file_manager resource #4946

Open · jas-ho opened this issue 3 years ago

jas-ho commented 3 years ago

Use Case

I would like to be able to use the s3_file_manager resource and control the credentials via environment variables instead of via the credentials file ~/.aws/credentials.

This would make it easier to run jobs in environments where a credentials file is not available (for example, containers or CI).

Example code:

from dagster import op, graph
from dagster_aws.s3 import s3_file_manager

@op(required_resource_keys={"storage"})
def persist_data(context):
    context.resources.storage.write_data(b"dummy data")

@graph
def s3_persistence_graph():
    persist_data()

job = s3_persistence_graph.to_job(resource_defs={"storage": s3_file_manager})

if __name__ == '__main__':
    job.execute_in_process(run_config={'resources': {'storage': {'config': {'s3_bucket': '...'}}}})

The above fails with botocore.exceptions.NoCredentialsError: Unable to locate credentials unless ~/.aws/credentials exists. I've tried setting environment variables from a .env file via load_dotenv at various points in the above code but could not get it to work.
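For reference (not part of the original report), boto3 resolves credentials from its standard environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_SESSION_TOKEN), so a minimal sanity check is to set them explicitly before executing the job defined above. The values below are placeholders:

import os

# Placeholder credentials; in practice these would come from a secrets
# store or a .env file loaded at process start.
os.environ["AWS_ACCESS_KEY_ID"] = "..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."

# The resource is initialized during execution, so any assignment made
# before this call should be visible to boto3.
job.execute_in_process(run_config={'resources': {'storage': {'config': {'s3_bucket': '...'}}}})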

Question

Is it somehow possible to inject credentials into s3_file_manager?

Note that I'm a beginner with dagster so it's very possible I'm missing something obvious :)

Additional Info

Related issues

Injecting credentials from environment variables is touched on in https://github.com/dagster-io/dagster/issues/17#issuecomment-394674477.


Message from the maintainers:

Excited about this feature? Give it a 👍. We factor engagement into prioritization.

clairelin135 commented 2 years ago

Hi @jas-ho, thanks for the detailed feature request! Authentication via environment variables occurs when the s3_file_manager resource is created. I was able to run load_dotenv() within the graph and authenticate via env var credentials, which should work as long as the job is executed in process (because environment variables are not copied to subprocesses):

from dagster import op, graph
from dagster_aws.s3 import s3_file_manager
from dotenv import load_dotenv

@op(required_resource_keys={"storage"})
def persist_data(context):
    context.resources.storage.write_data(b"dummy data")

@graph
def s3_persistence_graph():
    # Runs when the graph is constructed, so os.environ is populated
    # before the storage resource is initialized.
    load_dotenv()
    persist_data()

job = s3_persistence_graph.to_job(resource_defs={"storage": s3_file_manager})

if __name__ == '__main__':
    job.execute_in_process(
        run_config={'resources': {'storage': {'config': {'s3_bucket': 'claire-test-2'}}}}
    )

Does this work for you?
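An equivalent variant (a sketch, assuming a .env file sits next to the script) is to call load_dotenv() once at module import time instead of inside the graph, which keeps the graph body free of side effects:

from dotenv import load_dotenv

# Hypothetical .env file next to this script:
#   AWS_ACCESS_KEY_ID=...
#   AWS_SECRET_ACCESS_KEY=...
load_dotenv()  # populates os.environ before any resource is initialized

As with the graph-level call, this relies on the job executing in the same process that loaded the .env file.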

jayhale commented 2 years ago

Loading behavior for python-dotenv differs between local dagit and Dagster Cloud. +1 for either better documentation on how to prepare the environment or better handling of environment changes.
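One way to surface that difference early, whatever the deployment target (a sketch, not an official Dagster pattern), is to fail fast when the expected variables are absent:

import os

# boto3's standard credential variables; extend as needed (e.g. AWS_SESSION_TOKEN).
required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
missing = [name for name in required if name not in os.environ]
if missing:
    raise RuntimeError(f"AWS credentials missing from environment: {missing}")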

sryza commented 2 years ago

@gibsondan @elementljarredc - perhaps relevant to what you've been working on of late