flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.41k stars, 580 forks

[Core feature] Add official support for managing dependencies with Poetry #5592

Open mattiadevivo opened 1 month ago

mattiadevivo commented 1 month ago

Motivation: Why do you think this is important?

Since Poetry creates a poetry.lock file that pins dependency versions, it would be useful to have official support and documentation on how to write a Dockerfile that uses Poetry and how to structure a project around it.

Goal: What should the final outcome look like, ideally?

A new section in the Flyte documentation and/or a new example in the GitHub repo.

Describe alternatives you've considered

I've considered using pip as suggested, but Poetry is the tool my company uses to manage Python dependencies, so pip is not an option for me.

Propose: Link/Inline OR Additional context

I'd like to build a Docker image with all dependencies installed globally via Poetry (I need locked dependency versions) and register the pipeline on the cluster with that custom image (already pushed to the registry) using the following command: pyflyte register -p ${FLYTE_PROJECT} -d ${FLYTE_DOMAIN} ${PIPELINE_PATH} --version ${PIPELINE_VERSION} --image "${IMAGE_NAME}:${PIPELINE_VERSION}"

Here's the Dockerfile:

```dockerfile
FROM python:3.11-slim

WORKDIR /usr/app

RUN apt-get update && apt-get install -y --no-install-recommends \
    curl gnupg build-essential cmake libssl-dev

RUN curl -sSL https://install.python-poetry.org/ | POETRY_VERSION=1.7.0 POETRY_HOME=$HOME/.poetry python \
    && ln -s $HOME/.poetry/bin/poetry /usr/bin/poetry \
    # install dependencies on the python interpreter and not in the poetry virtualenv
    && poetry config virtualenvs.create false

COPY poetry.lock .
COPY pyproject.toml .

RUN poetry install --only main --no-interaction
```
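Because the Dockerfile disables Poetry's virtualenv creation, the locked dependencies should end up in the image's global interpreter. One way to confirm that is to run a small script with that interpreter, for example by copying it into the image and running `docker run --rm <image> python check_env.py`. Below is a minimal, standard-library-only sketch; the script name `check_env.py` and the `required` list are hypothetical and should be adapted to the packages declared in pyproject.toml.

```python
import importlib.util
import sys
from typing import Iterable, List


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv (or modern virtualenv)."""
    # venv/virtualenv point sys.prefix at the environment while sys.base_prefix
    # keeps pointing at the base installation; outside a venv they are equal.
    return sys.prefix != sys.base_prefix


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that the current interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised e.g. when the parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list: replace with the top-level packages from pyproject.toml.
    required = ["flytekit"]
    print("virtualenv active:", in_virtualenv())
    print("missing:", missing_modules(required) or "none")
```

If `virtualenvs.create false` took effect, the check should report no active virtualenv and no missing packages.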

When the workflow runs, I get ModuleNotFoundError: No module named 'example', where example is the name of the module containing the Flyte components.

This is my folder structure:

```
pipelines
└── src
    ├── README.md
    └── example
        ├── README.md
        ├── __init__.py
        ├── info.json
        ├── main.py
        ├── poetry.lock
        └── pyproject.toml
```

The pyflyte command is run from the parent directory of pipelines; with pip, everything works fine.
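The error itself is ordinary Python import resolution: a top-level package such as example is only found if the directory that contains it (here pipelines/src) is on sys.path. The sketch below recreates the issue's layout in a temporary directory and checks importability with and without that directory on the path. It is a generic illustration using only the standard library, not Flyte's own code-loading logic; `is_importable` is a hypothetical helper name.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path
from typing import Optional


def is_importable(module: str, extra_path: Optional[str] = None) -> bool:
    """Check whether `module` can be found, optionally with `extra_path` prepended to sys.path."""
    original = list(sys.path)
    try:
        if extra_path is not None:
            sys.path.insert(0, extra_path)
        # Drop cached directory listings so freshly created files are seen.
        importlib.invalidate_caches()
        return importlib.util.find_spec(module) is not None
    finally:
        sys.path[:] = original  # restore the interpreter's search path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Recreate the layout from the issue: <root>/pipelines/src/example/__init__.py
        pkg = Path(root, "pipelines", "src", "example")
        pkg.mkdir(parents=True)
        (pkg / "__init__.py").write_text("")

        # Only the parent of `pipelines` on the path: `example` is not a top-level name there.
        print("root on sys.path:", is_importable("example", extra_path=root))
        # pipelines/src on the path: `example` resolves as a package.
        print("pipelines/src on sys.path:", is_importable("example", extra_path=str(pkg.parent)))
```

Run directly, the first check is expected to report False and the second True, assuming no other package named example is installed. Inside the container the same rule applies: the registered code has to land in a directory the task's interpreter searches, which, per the replies in this thread, depends on the image's WORKDIR and on how the code is registered.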

welcome[bot] commented 1 month ago

Thank you for opening your first issue here! 🛠

kumare3 commented 2 days ago

There are quite a few folks who use Poetry already. cc @jeevb, does Freenome not use Poetry?

mattiadevivo commented 22 hours ago

> There are quite a few folks who use Poetry already. cc @jeevb, does Freenome not use Poetry?

Could someone please share an example? Thanks!

thomasjpfan commented 19 hours ago

@mattiadevivo Your Dockerfile for installing Poetry looks okay to me.

This could be an issue with how Flyte registers the workflow.

thomasjpfan commented 19 hours ago

@mattiadevivo The issue with the Dockerfile is WORKDIR /usr/app. You have two options:

  1. In the Dockerfile, use WORKDIR /root and it'll work with default flytekit.
  2. Set destination-dir during register: pyflyte register pipelines/src/example --destination-dir /usr/app
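
Both options come down to the pyflyte register invocation. The sketch below assembles it as an argument list, reusing only the flags that appear in this thread (-p, -d, --version, --image and, for option 2, --destination-dir). `build_register_cmd` and the example values are hypothetical; the resulting list can be passed to `subprocess.run(cmd, check=True)` to actually register.

```python
import shlex
from typing import List, Optional


def build_register_cmd(
    project: str,
    domain: str,
    pipeline_path: str,
    version: str,
    image: str,
    destination_dir: Optional[str] = None,
) -> List[str]:
    """Assemble the `pyflyte register` argv used in this thread (hypothetical helper)."""
    cmd = [
        "pyflyte", "register",
        "-p", project,
        "-d", domain,
        "--version", version,
        "--image", image,
    ]
    if destination_dir is not None:
        # Option 2: tell Flyte where the code lives inside the image.
        cmd += ["--destination-dir", destination_dir]
    cmd.append(pipeline_path)
    return cmd


if __name__ == "__main__":
    cmd = build_register_cmd(
        project="my-project",            # hypothetical values
        domain="development",
        pipeline_path="pipelines/src/example",
        version="v1",
        image="my-registry/example:v1",
        destination_dir="/usr/app",      # matches WORKDIR /usr/app in the Dockerfile
    )
    # Print a shell-safe rendering; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping the command as a list rather than one shell string avoids quoting problems with values such as "${IMAGE_NAME}:${PIPELINE_VERSION}".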

mattiadevivo commented 18 hours ago

> @mattiadevivo The issue with the Dockerfile is WORKDIR /usr/app. You have two options:
>
>   1. In the Dockerfile, use WORKDIR /root and it'll work with default flytekit.
>   2. Set destination-dir during register: pyflyte register pipelines/src/example --destination-dir /usr/app

Thanks, I'm going to test this soon 👍