getindata / kedro-vertexai

Kedro Plugin to support running workflows on GCP Vertex AI Pipelines
https://kedro-vertexai.readthedocs.io
Apache License 2.0

[feature] Code upload instead of docker push workflow #82

Open marrrcin opened 2 years ago

marrrcin commented 2 years ago

This idea is borrowed from Azure ML (and this PR: https://github.com/getindata/kedro-azureml/pull/15), where you define an Environment: a Docker image that runs your code, although the code itself is not part of the image (only the dependencies are). This workflow would make data science iterations faster, as users would not have to build the Docker image every time they want to run or debug something in Vertex AI. The problem will be partially addressed by #81, but this would be the next iteration on that.

General workflow would work like this:

  1. Docker image with dependencies is uploaded to the container registry.
  2. User runs kedro vertexai run-once with some flag (or maybe we should have kedro vertexai run for docker and kedro vertexai run-once for this flow 💡)
  3. The code of the Kedro project is packaged, compressed, and copied to GCS, and the job is started within the container. The container should have a modified entrypoint which first downloads the code from GCS and then executes it.
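
A minimal sketch of step 3, just to make the discussion concrete. Everything in it is an assumption rather than the plugin's API: the function names, the excluded directories, and the use of a local directory as a stand-in for a GCS bucket (a real implementation would swap the two `shutil.copy` calls for GCS upload and download).

```python
import shutil
import tarfile
from pathlib import Path

# Hypothetical exclusions: directories that should not travel with the code.
EXCLUDED_DIRS = frozenset({".git", "data", "logs", "__pycache__"})


def package_project(project_dir: Path, archive_path: Path, excluded=EXCLUDED_DIRS) -> Path:
    """Pack the project sources into a gzip-compressed tarball.

    archive_path is expected to live outside project_dir, otherwise the
    archive would try to include itself.
    """
    project_dir = Path(project_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(project_dir.rglob("*")):
            rel = path.relative_to(project_dir)
            # Skip anything located under an excluded directory.
            if path.is_file() and not set(rel.parts) & set(excluded):
                tar.add(path, arcname=rel.as_posix())
    return Path(archive_path)


def upload(archive_path: Path, bucket_dir: Path) -> Path:
    """Stand-in for a GCS upload: copies the archive into a local 'bucket' directory."""
    bucket_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(archive_path, bucket_dir / archive_path.name))


def fetch_and_unpack(remote_archive: Path, workdir: Path) -> Path:
    """What a modified entrypoint could do before launching the Kedro run."""
    workdir.mkdir(parents=True, exist_ok=True)
    local_copy = Path(shutil.copy(remote_archive, workdir / remote_archive.name))
    # Use the safe "data" extraction filter where the interpreter supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(local_copy, "r:gz") as tar:
        tar.extractall(workdir, **kwargs)
    local_copy.unlink()  # the archive itself is no longer needed
    return workdir
```

On the client side this would be `upload(package_project(project_dir, archive_path), bucket_dir)`; inside the container, the entrypoint would call `fetch_and_unpack(...)` and then start the usual Kedro command from the unpacked directory.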

Please discuss the design with @em-pe and @szczeles before implementing.

adrienpl commented 10 months ago

This feature is really a must-have for a functional development environment with Kedro.

szczeles commented 10 months ago

@adrienpl Agree! The current development cycle with docker images is so painful...

Actually, I worked on an implementation of this feature a year ago. Once I find the local branch, I will push it so somebody can take it over (I'm not into kedro/vertex anymore).

adrienpl commented 10 months ago

Thank you! It will be really helpful. What are you using right now?

szczeles commented 10 months ago

@adrienpl I'm no longer on a project in the MLOps area, so I'm just using boring dev tools ATM ;-)