Open cosmicBboy opened 3 months ago
Based on the available data sources, here is the issue title and body for the UX improvement:
Issue Title: [Core feature] UX improvement: pyflyte run
includes imported local modules in the target workflow file
Issue Body:
Say I have the following Flytekit project folder structure:
/my_project
/src
__init__.py
module_a.py
... # etc
tasks.py # imports modules from src
workflows.py # imports modules from task
setup.py
Currently, for this workflow to run correctly with pyflyte run --remote
, I have to pass in the --copy-all
flag to make sure that all the source files are present in the container running the workflow.
This isn't a great beginner UX: I would expect a local pyflyte run
and remote pyflyte run --remote
to "just work".
Today, pyflyte run --remote workflow.py
will fast-register the workflow, meaning that the workflow.py
file itself will be shipped as a tar file and injected into the container running in the remote pod.
Ideally, pyflyte run
should also automatically detect all of the locally defined modules that are being used in workflow.py
and include them in the tar file being fast registered.
Keep the status quo, which is to require users to provide the --copy-all
flag. There's also another issue proposing a --copy
flag (see here) which proposes a more targeted version of the --copy-all
flag.
No response
Sources:
This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.
Mention @runllm to ask follow-up questions. See runllm.com for more!
If there is a setup.py + src directory, then I assume there is a package to be installed with the src layout. There are two ways to make the pyflyte run --remote
UX work:
setup.py
assume we want to --copy-all
and install the package during execution.The solution depends on how task
imports module_a
. If its import module_a
, then we need to install the package or manipulate the folder structure.. If it's import src.module_a
, then src
is kept as is. (Personally, I do not really like using import src.module_a
, but I've see user code like this)
Motivation: Why do you think this is important?
Say I have the following flytekit project folder structure:
Currently, for this workflow to run correctly with
pyflyte run --remote
, I have to pass in the--copy-all
flag to make sure that all the source files are present in the container running the workflow.This isn't a great beginner UX: I would expect a local
pyflyte run
and remotepyflyte run --remote
to "just work".Goal: What should the final outcome look like, ideally?
Today,
pyflyte run --remote workflow.py
will fast-register the workflow, meaning that theworkflow.py
file itself will be shipped as a tar file and injected into the container running in the remote pod.Ideally, pyflyte run should also automatically detect all of the locally defined modules that are being used in
workflow.py
and include it in the tar file being fast registered.Describe alternatives you've considered
Keep the status quo, which is to require users to provide the
--copy-all
flag. There's also another issue proposing a--copy
flag (see here) which proposes a more targeted version of the--copy-all
flag.Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?