PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Improve caching in Prefect Project Docker image builds #9392

Open discdiver opened 1 year ago

discdiver commented 1 year ago

Prefect Version

2.x

Describe the current behavior

Currently, the image built in a Prefect Project step installs the packages in requirements.txt last. As a result, the packages have to be reinstalled on rebuilds, so the build does not take full advantage of Docker's layer caching.

Describe the proposed behavior

Move the package install higher in the Dockerfile to take advantage of caching and save time for developers.

Example Use

No response

Additional context

An attendee at PACC SF raised this issue.

discdiver commented 1 year ago

Sorry @madkinsz, I didn't see that you had changed the title. Feel free to change it as you see fit.

zmwaris1 commented 1 year ago

Hi @discdiver, I would like to work on this issue. The package install step runs at line 80 of the Dockerfile. It would be very helpful if you could point out where I should move it for better caching.

discdiver commented 1 year ago

Thank you @zmwaris1 ! I'll let @madkinsz weigh in on the position.

zanieb commented 1 year ago

The changes here refer to the prefect-docker implementation at https://github.com/PrefectHQ/prefect-docker/blob/41810b1773b8f4cad958df11eb438e34a7d91719/prefect_docker/projects/steps.py#L101-L109

We'd likely want to copy and install the requirements before copying the rest of the files, similar to https://github.com/PrefectHQ/prefect/blob/0f2e5bf63ef89b452e40381aada4be683d0a79ab/Dockerfile#L109
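
As a rough illustration of that ordering, here is a minimal sketch. It is not the actual prefect-docker code: the helper name build_dockerfile_lines, its parameters, and the /opt/prefect/<dir_name> target path are all assumptions made for the example. The point is only that the requirements file is copied and installed before the rest of the project, so edits to project files leave the pip install layer cached.

```python
from typing import List


def build_dockerfile_lines(
    base_image: str, dir_name: str, has_requirements: bool
) -> List[str]:
    """Return Dockerfile lines with the requirements install placed
    ahead of the full project copy (hypothetical helper)."""
    # Assumed target directory inside the image; adjust as needed.
    workdir = f"/opt/prefect/{dir_name}"
    lines = [f"FROM {base_image}"]

    if has_requirements:
        # Copy only requirements.txt first. This layer and the install
        # below are rebuilt only when requirements.txt itself changes.
        lines.append(f"COPY requirements.txt {workdir}/requirements.txt")
        lines.append(f"RUN python -m pip install -r {workdir}/requirements.txt")

    # Copy the rest of the project last, since it changes most often.
    lines.append(f"COPY . {workdir}/")
    lines.append(f"WORKDIR {workdir}/")
    return lines
```

For example, "\n".join(build_dockerfile_lines("prefecthq/prefect:2-latest", "my-project", True)) yields a Dockerfile in which the pip install line comes before COPY . (the image tag and project name here are just illustrative values).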

raffifu commented 1 year ago

Hi, I'm new to the Prefect community. Can I work on this issue?

discdiver commented 12 months ago

Sure @raffifu ! Sounds great!

raffifu commented 12 months ago

Hi, I can't reproduce this issue because of an error while setting up the dev environment. I get AttributeError: module 'docker' has no attribute 'DockerClient' when trying to run sample flows with prefect-docker. These are the steps I followed:

  1. Clone the prefect and prefect-docker repos
  2. Create a virtualenv: python3 -m venv .venv && source .venv/bin/activate
  3. Install prefect and prefect-docker in development mode with
    $ cd prefect && pip install -e ".[dev]"
    $ cd prefect-docker && pip install -e ".[dev]"
  4. Start the Prefect server: prefect server start
  5. Run prefect block register -m prefect_docker to register the Docker blocks in Prefect
  6. Download this file and run python downloaded_file.py. This fails with AttributeError: module 'docker' has no attribute 'DockerClient'

I've tried several ways to fix this:

  1. I'm sure my Python environment has docker installed and that it provides the DockerClient class, because the exact same code runs without error in an interactive Python session
  2. I've tried changing my Python version to 3.8 and still hit the same issue
  3. I've also raised the issue on Slack but have no answer yet

Can you give me any tips for solving this problem? Thank you

discdiver commented 8 months ago

Hi @raffifu. Apologies for the delayed response. Does this issue persist for you? I found this GH issue that might be helpful.
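
One way to narrow this kind of error down, offered as a sketch rather than a confirmed diagnosis: check which docker module Python actually resolves when the script runs. A local file or directory named docker on the import path can shadow the installed package, which would explain the same code working in one place and failing in another. The helper name describe_module below is hypothetical; it uses only the standard library.

```python
import importlib
import importlib.util


def describe_module(name: str, attr: str) -> dict:
    """Report where `name` resolves from and whether it exposes `attr`."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Nothing importable under this name on the current sys.path.
        return {"found": False, "origin": None, "locations": [], "has_attr": False}

    module = importlib.import_module(name)
    return {
        "found": True,
        # File the module was loaded from; None for a namespace package,
        # e.g. a bare directory with no __init__.py.
        "origin": spec.origin,
        # Directories searched for submodules (empty for plain modules).
        "locations": list(spec.submodule_search_locations or []),
        "has_attr": hasattr(module, attr),
    }
```

Calling describe_module("docker", "DockerClient") from the same directory and virtualenv as the failing script shows the resolved path. If origin is None, or the locations point into the working directory rather than site-packages, a local docker folder or file is probably being imported instead of the installed package.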