flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.48k stars, 587 forks

[Housekeeping] Make flytekit lighter weight/less opinionated about dependencies #4418

Open cosmicBboy opened 10 months ago

cosmicBboy commented 10 months ago

Describe the issue

Currently, flytekit has a bunch of dependencies, many of which are pinned to specific versions or have restrictive constraints: https://github.com/flyteorg/flytekit/blob/38c76876dfe7fc2c62536ca6a195bce8a56c6270/setup.py#L30-L80

This makes it painful for folks to install flytekit, especially into existing projects that may have conflicting constraints on shared dependencies.

What if we do not do this?

Users will continue to experience issues with conflicting version pins.

thomasjpfan commented 10 months ago

I went through all the dependencies and categorized:

Required Dependencies (for now)

- click
- cloudpickle
- croniter
- dataclasses-json
- docker
- flyteidl
- googleapis-common-protos
- grpc
- grpcio-status
- importlib-metadata (can be dropped once Python >= 3.10 is required)
- jsonpickle
- keyring
- kubernetes
- marshmallow-enum
- marshmallow-jsonschema
- mashumaro
- protobuf
- pyarrow
- pytz
- pyyaml
- requests
- rich
- rich_click
- statsd
- typing_extensions

Other dependencies

pingsutw commented 10 months ago

gcsfs and s3fs could be moved into extras, like flytekit[s3] or flytekit[gcs].
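A minimal sketch of what that extras layout might look like, assuming a setuptools-style `extras_require` mapping. The extra names `s3` and `gcs` come from the suggestion above; the `all` extra and the `with_all_extra` helper are hypothetical additions for illustration, not something flytekit is known to define.

```python
# Hypothetical extras layout. The names "s3" and "gcs" follow the suggestion
# in this thread; the "all" extra is an assumption added for convenience.
EXTRAS_REQUIRE = {
    "s3": ["s3fs"],
    "gcs": ["gcsfs"],
}


def with_all_extra(extras):
    """Return a copy of extras plus an "all" key that unions every extra."""
    merged = sorted({req for reqs in extras.values() for req in reqs})
    return {**extras, "all": merged}


extras_require = with_all_extra(EXTRAS_REQUIRE)
# In a setup.py this mapping would be passed as
# setup(..., extras_require=extras_require).
```

With a layout like this, an AWS user would run `pip install "flytekit[s3]"` and never pull in gcsfs.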

thomasjpfan commented 7 months ago

On the size of all the dependencies (including indirect ones), here are the wheel sizes for dependencies larger than 1M:

 23M    pyarrow-15.0.0-cp311-cp311-macosx_11_0_arm64.whl
 13M    numpy-1.26.4-cp311-cp311-macosx_11_0_arm64.whl
 11M    botocore-1.31.17-py3-none-any.whl
9.2M    grpcio-1.60.1-cp311-cp311-macosx_10_10_universal2.whl
5.6M    cryptography-42.0.2-cp39-abi3-macosx_10_12_universal2.whl
1.5M    kubernetes-29.0.0-py2.py3-none-any.whl
1.1M    pygments-2.17.2-py3-none-any.whl
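The comment does not say how this listing was produced. One way to get a similar one, assuming the wheels were first downloaded into a local directory (for example with `pip download flytekit -d wheels/`), is sketched below. The `large_wheels` helper and the 1,000,000-byte threshold are illustrative assumptions.

```python
from pathlib import Path


def large_wheels(directory, min_bytes=1_000_000):
    """Return (size_in_bytes, filename) pairs for wheels of at least
    min_bytes found directly in directory, largest first."""
    sized = [(p.stat().st_size, p.name) for p in Path(directory).glob("*.whl")]
    return sorted((item for item in sized if item[0] >= min_bytes), reverse=True)


def format_listing(entries):
    """Render entries as 'size-in-MB  filename' lines, one per wheel."""
    return "\n".join(f"{size / 1e6:5.1f}M    {name}" for size, name in entries)
```

Calling `print(format_listing(large_wheels("wheels")))` would then print one line per large wheel. Exact sizes depend on the platform and the versions pip resolves.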

With https://github.com/flyteorg/flytekit/pull/1818, I suspect we can make pyarrow an optional dependency. Making numpy optional should be doable as well.
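For illustration, a common pattern for making a heavy package optional is to defer its import until the feature is actually used. This is a generic sketch and does not describe what the linked PR does. The `optional_import` helper and the extra name passed to it are hypothetical.

```python
import importlib


def optional_import(module_name, extra, package="flytekit"):
    """Import module_name lazily, or raise an error naming the extra to install.

    Both `extra` and the install hint are assumptions for illustration;
    they do not describe extras that flytekit actually ships.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install '{package}[{extra}]'"
        ) from exc
```

Code that needs pyarrow would then call something like `pa = optional_import("pyarrow", "pyarrow")` inside the function that uses it, so that importing the package itself does not require pyarrow.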

After that, there is botocore, which is required for AWS, and cryptography, which is required for Azure. To make progress on those, we'll need to go with https://github.com/flyteorg/flyte/issues/4418#issuecomment-1814984151. For me, I do not think AWS users should be required to install dependencies that are only needed for Azure.