flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.68k stars, 639 forks

[Housekeeping] eager workflows should not require a `FlyteRemote` to be statically initialized #5662

Open ai-rnatour opened 2 months ago

ai-rnatour commented 2 months ago

Describe the issue

To execute eager workflows on a remote cluster, a FlyteRemote object has to be created statically and passed to the decorator. This prevents the FlyteRemote from being constructed with runtime values, and it forces every import of the module to read from the filesystem as a side effect (which is an anti-pattern).
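To illustrate the import-time side effect described above, here is a minimal sketch that does not use flytekit at all. `make_remote` and `eager_like` are hypothetical stand-ins for building a FlyteRemote and for a decorator that takes a pre-built remote. The point is only that Python evaluates decorator arguments when the function is defined, which for module-level code means at import.

```python
build_log = []


def make_remote():
    """Hypothetical stand-in for constructing a FlyteRemote from a config file."""
    build_log.append("remote built")
    return object()


def eager_like(remote):
    """Toy decorator that accepts a pre-built remote (not the flytekit API)."""

    def wrap(fn):
        fn.remote = remote
        return fn

    return wrap


# make_remote() is evaluated on this line, when the function is defined,
# not when my_workflow() is later called.
@eager_like(remote=make_remote())
def my_workflow():
    return "ran"
```

By the time the module has finished importing, `build_log` already holds one entry even though `my_workflow` has never been called, so any file reads inside the constructor would have happened as well.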

What if we do not do this?

In our case, we have a number of Flyte deployments with different configs. In order to reuse the eager workflow between deployments, we have to use a variant of this pattern:

@eager(
    remote=FlyteRemote(
        # Use the AWS ACCOUNT_ID of our Flyte deployment to look up the correct Flyte config.
        config=Config.auto(config_file=get_config_path_for_aws_account()),
        # Same for all deployments.
        default_project="my_project",
        # Look up the current domain from the runtime environment.
        default_domain=os.environ[FLYTE_STAGE_ENV_VAR],
    ),
    # These two are the same for all deployments.
    client_secret_group="my_group",
    client_secret_key="my_secret",
)

It would be much better if we could instead create the FlyteRemote inside our task based on runtime inputs. Alternatively, creation of the FlyteRemote could be deferred until the task is running in the cluster, defaulting to values suitable for invoking tasks and workflows registered to the currently executing Flyte deployment.
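One way to picture the deferred construction requested here is a lazy holder that takes a factory instead of a finished object. This is a sketch under assumptions, not flytekit's API and not a proposed implementation: `StubRemote` is a hypothetical stand-in for FlyteRemote so the example runs without flytekit, and the environment variable names `FLYTE_CONFIG_PATH` and `FLYTE_STAGE` are invented for illustration.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class StubRemote:
    """Hypothetical stand-in for FlyteRemote."""

    config_file: str
    default_project: str
    default_domain: str


class LazyRemote:
    """Holds a factory and builds the remote on first use, then caches it."""

    def __init__(self, factory: Callable[[], StubRemote]) -> None:
        self._factory = factory
        self._remote: Optional[StubRemote] = None

    def get(self) -> StubRemote:
        if self._remote is None:
            # The factory runs here, at call time, so it sees runtime values.
            self._remote = self._factory()
        return self._remote


def build_remote() -> StubRemote:
    # Assumed variable names; a real deployment would use its own lookup.
    return StubRemote(
        config_file=os.environ.get("FLYTE_CONFIG_PATH", "config.yaml"),
        default_project="my_project",
        default_domain=os.environ.get("FLYTE_STAGE", "development"),
    )


# Creating the holder at import time does not touch the environment or the filesystem.
remote = LazyRemote(build_remote)
```

With this shape, importing the module has no side effects, and the first call to `remote.get()` inside the running task picks up whatever the environment holds at that moment.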

Related component(s)

Eager workflows

runllm[bot] commented 2 months ago

The issue you're describing highlights a limitation in the current implementation of eager workflows in Flyte, where the FlyteRemote object must be statically initialized. This prevents dynamic configuration based on runtime values, which is problematic for environments with multiple Flyte deployments.

According to the Flyte documentation on eager workflows, the FlyteRemote object is required to be set up statically in the decorator. This design choice indeed forces the configuration to be static, which is not ideal for dynamic environments. The current workaround involves using environment variables and configuration files to set up the FlyteRemote object, but this is not a perfect solution.

Would you like more detailed guidance on potential workarounds or further discussion on how to address this issue?

eapolinario commented 1 month ago

@ai-rnatour, we're aware of this limitation. We plan on actively investing in eager workflows in the coming months. I'll keep you posted.

kumare3 commented 1 month ago

This is on our docket right now, and we're going through options. Please let us know about any other issues you see with eager.

ai-rnatour commented 1 month ago

> This is on our docket right now, and we're going through options. Please let us know about any other issues you see with eager.

Thanks @kumare3 - I have this other issue open that looks like it's been assigned. Really excited to see the progress here, as we're picking up eager for a couple of important workloads.