Open clopezhrimac opened 1 year ago
Hi @clopezhrimac, thanks for your question! KubeFlow limits the execution of pipelines locally; alternatives pointed out by the community are:
In this project, we've optimised components to be in line with option (2). Commands which help you with testing locally are:
make setup-all-components
make test-all-components
or
make setup-component GROUP=<e.g. vertex-components>
make test-component GROUP=<e.g. vertex-components>
Further, we've recently replaced the Python-based training component in the pipelines with a CustomTrainingJob, which allows you to run your training script locally before submitting to Vertex AI.
While these don't provide full parity between local pipeline runs and submitting pipelines to Vertex AI, these will help you to iterate locally over any changes related to custom python-based components and your training code.
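As a rough illustration of iterating locally on a custom Python-based component, one option is to keep the component's logic in a plain function and test it without any pipeline machinery. This is only a sketch under assumptions: `split_rows` and its test are hypothetical names, not components from this project, and the way the function gets wrapped into a pipeline component is deliberately left out.

```python
# Hypothetical component logic, kept free of pipeline imports so it runs anywhere.
def split_rows(rows, test_fraction):
    """Split rows into train and test partitions, preserving order."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = int(round(len(rows) * test_fraction))
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


def test_split_rows():
    # A plain assertion-based test: no cluster, no pipeline submission.
    train, test = split_rows(list(range(10)), 0.2)
    assert len(train) == 8
    assert test == [8, 9]


if __name__ == "__main__":
    test_split_rows()
    print("component logic OK")
```

Bugs in the logic itself then surface in seconds on your machine, and only wiring problems (inputs, outputs, images) need a real pipeline run.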
We're currently evaluating the use of CustomPythonPackageTrainingJob, too, and are open to any suggestions you might have!
What is the difference between CustomTrainingJob and CustomPythonPackageTrainingJob?
Hi @clopezhrimac,
Thanks for this issue. Please check out the most recent PR and release.
We've moved away from CustomTrainingJob and CustomPythonPackageTrainingJob since KubeFlow 2.0 supports container components now.
You can cd into the model folder and run your training and prediction code locally before triggering a pipeline in Vertex AI. However, this will only test the training and prediction steps, not the pipeline end-to-end.
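To make "run your training code locally" a bit more concrete, here is a minimal sketch of a training script with a single entry point that can be called on your machine and, in principle, from a container. It is not the code in this repo's model folder: `load_data`, `train`, `main`, the `--model-dir` flag, and the `model.json` output are all hypothetical, and the least-squares fit is just a stand-in for real training.

```python
import argparse
import json
import pathlib
import tempfile


def load_data():
    """Hypothetical stand-in for reading training data."""
    return [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]


def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def main(argv=None):
    # The same entry point serves local runs and a container command.
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-dir", required=True)
    args = parser.parse_args(argv)

    model = train(*load_data())
    out_dir = pathlib.Path(args.model_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "model.json").write_text(json.dumps(model))
    return model


if __name__ == "__main__":
    # Local smoke run that writes into a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        print(main(["--model-dir", tmp]))
```

Because `main` accepts an explicit argument list, it can be stepped through in a debugger locally; as noted above, this exercises only the training step, not the pipeline end-to-end.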
I am facing difficulties in debugging the Vertex AI training pipeline. The issue lies in the fact that I cannot run the pipeline locally for testing and debugging purposes. Instead, I have to submit the pipeline to Vertex AI and wait for it to execute in order to obtain debugging information.
The current debugging process involves sending the pipeline with multiple print statements or logging messages to trace the execution flow and pinpoint the exact location of the error. This becomes a slow and tedious cycle as it requires resubmitting the pipeline every time an adjustment or error identification is needed.
Steps to Reproduce the Problem:
What would be the best way to handle this training component development cycle?