flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.17k stars, 551 forks

[BUG] Local `FlyteFile` whose name contains spaces fails in remote execution with "File Not Found" error #5445

Closed (redartera closed this issue 2 weeks ago)

redartera commented 3 weeks ago

Describe the bug

When a user runs a workflow with `pyflyte run --remote ...` and one of the input arguments is a local `FlyteFile` whose name contains space characters, the remote execution fails with a "File Not Found" error because the space characters are replaced incorrectly in the remote path.

Expected behavior

The remote execution should succeed, and the remote path handed to the task should match the location where the file was actually uploaded. What's happening under the hood is that the Flyte task receives the following path:

s3://my-s3-bucket/flytesnacks/development/FQZXUDEAXUJMXMBR25T5GUJEEQ======/foo%20bar

Whereas the actual file exists under the following path (spaces are valid in S3 paths):

"s3://my-s3-bucket/flytesnacks/development/FQZXUDEAXUJMXMBR25T5GUJEEQ======/foo bar"
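The difference between the two paths looks like percent-encoding of the file name. The following is a minimal sketch of that mismatch using only Python's standard library. It does not call flytekit and makes no claim about where in flytekit the encoding actually happens. The variable names are illustrative, and the prefix is copied from the paths above.

```python
from urllib.parse import quote, unquote

# Prefix copied from the paths in this report.
prefix = "s3://my-s3-bucket/flytesnacks/development/FQZXUDEAXUJMXMBR25T5GUJEEQ======"

file_name = "foo bar"           # local file name, containing a space
encoded_name = quote(file_name)  # percent-encoding turns the space into %20

uploaded_uri = f"{prefix}/{file_name}"     # where the object actually lives
received_uri = f"{prefix}/{encoded_name}"  # what the task is handed

# The two URIs name different S3 keys, so a lookup of the encoded one fails.
print(uploaded_uri == received_uri)

# Decoding the name recovers the key that really exists.
print(unquote(encoded_name) == file_name)
```

S3 treats the key as a literal string, so `foo%20bar` and `foo bar` are two distinct objects, and only the second one was uploaded.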

Additional context to reproduce

Here is an example workflow

### ./get_file.py
import flytekit
from flytekit.types.file import FlyteFile

@flytekit.task
def get_file(f: FlyteFile) -> FlyteFile:
    return FlyteFile(f.download())

@flytekit.workflow
def wf(f: FlyteFile) -> FlyteFile:
    return get_file(f=f)

Step 1 - Stand up a Flyte sandbox with `flytectl demo start`
Step 2 - Create a local file whose name contains a space with `dd if=/dev/urandom of="foo bar" bs=1048576 count=5`
Step 3 - Run `pyflyte run --remote get_file.py wf --f "foo bar"`

Screenshots

[Screenshot 2024-06-03 at 10:29:58 AM]
[Screenshot 2024-06-03 at 10:33:25 AM]

Are you sure this issue hasn't been raised already?

Have you read the Code of Conduct?