allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0

Pipeline step can't find a module when run locally and there is a diff #1358

Open kiranzo opened 1 day ago

kiranzo commented 1 day ago

Describe the bug/To reproduce

I run a ClearML pipeline locally, but its steps use the default Docker image. I added another step, did not commit it to the repo, and tried to run the pipeline to test how it works. I also have a module with some utility functions used in the pipeline steps, so my project structure is as follows:

root/
├─ src/
│  └─ pipeline.py
├─ steps/
│  ├─ step1.py
│  ├─ step2.py
│  └─ step3_new.py
├─ lib/
│  └─ utils.py
├─ configs/
│  └─ pipeline_config.yaml
└─ README.md

When I ran it locally (steps still executed in a Docker container), the first step failed with ModuleNotFoundError: No module named 'lib'. Note that only step3_new.py was added and pipeline.py was modified; lib/utils.py remained unchanged.

When I committed my changes to the repo and then ran it locally again, everything went as expected, which is really weird, because the Docker container is supposed to have $PYTHONPATH and everything else set up independently, so why does it behave like that? Curiously enough, I still got a [MainProcess] [INFO] No repository found, storing script code instead message for each step in the VSCode terminal (no such messages in the Web UI though).
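
To see what the container actually gets, a few lines like these at the top of a failing step print the runtime import context (a minimal debugging sketch using only the standard library):

# Temporary lines at the top of a failing step function:
import os
import sys

print("cwd:", os.getcwd())                                    # where the serialized step script runs
print("PYTHONPATH:", os.environ.get("PYTHONPATH", "<unset>"))
print("sys.path:", sys.path)                                  # 'lib' is importable only if the repo root is listed here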

Expected behaviour

Either fix the Python path when there is an uncommitted diff, or simply refuse to run pipeline steps when a diff is detected. It feels broken the way it is now.
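
For now, a guard like the following before pipe.start() would at least make the failure explicit (a sketch assuming git is on PATH and the controller script is launched from inside the repository clone; this is plain git, not a ClearML API):

import subprocess
import sys

def abort_if_uncommitted_changes() -> None:
    # "git status --porcelain" prints one line per modified or untracked file.
    dirty = subprocess.run(
        ["git", "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    if dirty:
        sys.exit("Uncommitted changes detected, commit or stash them first:\n" + dirty)

Calling abort_if_uncommitted_changes() right before pipe.start() surfaces the problem at launch time instead of as a ModuleNotFoundError inside the container.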

Environment

eugen-ajechiloae-clearml commented 18 hours ago

Hi @kiranzo! How are you running the pipeline exactly? Can you share some code that resembles the pipeline controller?

kiranzo commented 18 hours ago

@eugen-ajechiloae-clearml

import yaml

from clearml import PipelineController

# PROJECT_NAME, PIPELINE_NAME, VERSION, the queue names, the step functions
# and step_created_callback are defined/imported elsewhere in the repo.

if __name__ == "__main__":
    pipe = PipelineController(
        project=PROJECT_NAME,
        name=PIPELINE_NAME,
        version=VERSION,
        add_pipeline_tags=True,
    )
    pipe.set_default_execution_queue(STEP_QUEUE)
    config = pipe.connect_configuration(
        configuration="configs/pipeline_config.yaml", name="Config"
    )
    with open(config) as config_file:
        params = yaml.load(config_file, Loader=yaml.Loader)

    pipe.add_function_step(
        name="step1",
        function=step1,
        function_kwargs={
            ...
        },
        cache_executed_step=False,
        repo=REPOSITORY,
        repo_branch=BRANCH,
        # project_name=PROJECT_STEPS,
        pre_execute_callback=step_created_callback,
    )

    pipe.add_function_step(
        name="step2",
        function=step2,
        function_kwargs={
            ...
        },
        cache_executed_step=False,
        repo=REPOSITORY,
        repo_branch=BRANCH,
        # project_name=PROJECT_STEPS,
        pre_execute_callback=step_created_callback,
    )

    pipe.add_function_step(
        name="step3_new",
        function=step3_new,
        function_kwargs={
            ...
        },
        cache_executed_step=False,
        repo=REPOSITORY,
        repo_branch=BRANCH,
        # project_name=PROJECT_STEPS,
        pre_execute_callback=step_created_callback,
    )

    pipe.start(queue=CONTROLLER_QUEUE)

And lib.utils is imported and used within step1, step2, and step3_new.
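
For reference, the steps use the helper module roughly like this (a sketch; the real signatures and the helper name are hypothetical):

# steps/step3_new.py (sketch)
from lib import utils  # the import that raises ModuleNotFoundError: No module named 'lib'


def step3_new(input_path):
    # do_something is a hypothetical helper from lib/utils.py
    return utils.do_something(input_path)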