allegroai / clearml-server

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs

Could not find host server definition #221

Open egormcobakaster opened 7 months ago

egormcobakaster commented 7 months ago

When I run a pipeline from the UI, the following error appears: clearml_agent: ERROR: Could not find host server definition (missing ~/clearml.conf or Environment CLEARML_API_HOST) To get started with ClearML: setup your own clearml-server, or create a free account at https://app.clear.ml and run clearml-agent init
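This error means the machine executing the task has no server connection configured. A minimal `~/clearml.conf` sketch for a self-hosted server is shown below; the ports are the clearml-server defaults, and the host and credentials are assumptions that must be replaced with the values from your own deployment (credentials are generated in the web UI under Settings > Workspace):

```
# Minimal ~/clearml.conf sketch (normally produced by `clearml-agent init`).
# Host and credentials below are placeholders, not working values.
api {
    web_server: http://localhost:8080
    api_server: http://localhost:8008
    files_server: http://localhost:8081
    credentials {
        "access_key" = "<your-access-key>"
        "secret_key" = "<your-secret-key>"
    }
}
```

Alternatively, the same settings can be supplied through environment variables such as `CLEARML_API_HOST`, as the error message suggests.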

docker-compose.yaml:

version: "3.6"

services:
  apiserver:
    command:

networks:
  backend:
    driver: bridge
  frontend:
    driver: bridge
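For reference, the stock clearml-server docker-compose ships an agent running as a service on the same backend network, which avoids host-side configuration entirely. A hedged sketch (image tag and environment variable values are assumptions; check them against your clearml-server release):

```yaml
# Sketch of an agent service inside the same compose file; the apiserver
# hostname resolves via the shared "backend" network.
  agent-services:
    image: allegroai/clearml-agent-services:latest
    restart: unless-stopped
    networks:
      - backend
    environment:
      CLEARML_API_HOST: http://apiserver:8008
      CLEARML_WEB_HOST: http://localhost:8080
      CLEARML_FILES_HOST: http://localhost:8081
      CLEARML_API_ACCESS_KEY: ${CLEARML_API_ACCESS_KEY:-}
      CLEARML_API_SECRET_KEY: ${CLEARML_API_SECRET_KEY:-}
```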

ainoam commented 7 months ago

@egormcobakaster This seems to indicate the environment in which the clearml-agent running your pipeline is deployed is not properly configured. Where are you running this clearml-agent? Did you complete clearml-agent init properly?

egormcobakaster commented 7 months ago

> @egormcobakaster This seems to indicate the environment in which the clearml-agent running your pipeline is deployed is not properly configured. Where are you running this clearml-agent? Did you complete clearml-agent init properly?

I am running the clearml-agent on the same machine as the clearml-server.

when I start a new agent with a new queue:

clearml-agent daemon --queue 6c86514d67014415967bc1d319f03fac

this error disappears and individual tasks launched from the UI run, but when I start a pipeline, the first task gets queued and never leaves the queue.
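A step staying enqueued forever usually means no agent is listening on the queue it was pushed to, or the only agent is occupied by the pipeline controller task itself. A hedged sketch of running two agents side by side (the queue names "services" and "default" are assumptions; match them to the queues your setup actually uses):

```shell
# One agent dedicated to pipeline controllers (ClearML enqueues controllers
# to the "services" queue by default), one for the actual pipeline steps.
clearml-agent daemon --queue services --services-mode --detached
clearml-agent daemon --queue default --detached
```

`--services-mode` lets the first agent run multiple lightweight controller tasks concurrently instead of blocking on one.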

jkhenning commented 6 months ago

Hi @egormcobakaster, Can you share the log of the pipeline task and your pipeline code?

Also, do you only have a single clearml-agent running? and what is the queue name it listens to?

egormcobakaster commented 6 months ago

Hi @jkhenning, pipeline log:

Environment setup completed successfully
Starting Task Execution:
ClearML results page: http://172.21.0.98:8080/projects/6072ec75526e493f917e5e770f24319d/experiments/abf2370a46bc4844984d98643e995ff4/output/log
ClearML pipeline page: http://172.21.0.98:8080/pipelines/6072ec75526e493f917e5e770f24319d/experiments/abf2370a46bc4844984d98643e995ff4
2023-12-11 10:03:05,217 - clearml.util - WARNING - 2 task found when searching for {'project_name': 'data process', 'task_name': 'Pipeline step 2 create clearml dataset', 'include_archived': True, 'task_filter': {'status': ['created', 'queued', 'in_progress', 'published', 'stopped', 'completed', 'closed']}}
2023-12-11 10:03:05,217 - clearml.util - WARNING - Selected task Pipeline step 2 create clearml dataset (id=adad180edd364cb1b8cedcb77e0a7712)
Launching the next 1 steps
Launching step [anotation]
Cloning Task id=8e7aac5e6f004730a0a3088f6fb0e327 with parameters: {'General/dataset_path': '/mnt/ext2/datasets/DataSet/Casia_images'}
Launching step: anotation
Parameters: {'General/dataset_path': '${pipeline.path}'}
Configurations: {}
Overrides: {}

pipeline code:
from clearml import Dataset
import argparse
import sys
from clearml import Task
from clearml.automation import PipelineController

def pre_execute_callback_example(a_pipeline, a_node, current_param_override):
    # type: (PipelineController, PipelineController.Node, dict) -> bool
    print(
        "Cloning Task id={} with parameters: {}".format(
            a_node.base_task_id, current_param_override
        )
    )
    # if we want to skip this node (and subtree of this node) we return False
    # return True to continue DAG execution
    return True

def post_execute_callback_example(a_pipeline, a_node):
    # type: (PipelineController, PipelineController.Node) -> None
    print("Completed Task id={}".format(a_node.executed))
    # if we need the actual executed Task: Task.get_task(task_id=a_node.executed)
    return

parser = argparse.ArgumentParser()
parser.add_argument('--path', default='', action='store',
                    help='path to dataset')
args = parser.parse_args()
if args.path == '':
    print("empty path to dataset")
    sys.exit()

pipe = PipelineController(
    name="Pipeline demo", project="data process", version="0.0.1", add_pipeline_tags=False
)

pipe.add_parameter(
    "path",
    args.path,
    "path_to_dataset",
)

pipe.set_default_execution_queue("default")

pipe.add_step(
    name="anotation",
    base_task_project="data process",
    base_task_name="Pipeline step 1 create anotation",
    parameter_override={"General/dataset_path": "${pipeline.path}"},
    pre_execute_callback=pre_execute_callback_example,
    post_execute_callback=post_execute_callback_example,
)

pipe.add_step(
    name="create dataset",
    parents=["anotation"],
    base_task_project="data process",
    base_task_name="Pipeline step 2 create clearml dataset",
    parameter_override={
        "General/dataset_path": "${pipeline.path}",
    },
    pre_execute_callback=pre_execute_callback_example,
    post_execute_callback=post_execute_callback_example,
)

pipe.start()

print("done")

The first task only gets queued and is never executed:

[Screenshot 2023-12-11 at 10:10:49]
egormcobakaster commented 6 months ago

@jkhenning, @ainoam Thanks for the answers. They helped me solve it by creating a second queue: one for the pipeline controller and the other for the tasks.
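The fix described above can be sketched in the pipeline code itself: send the controller and its steps to different queues, each served by its own agent. This is a hedged sketch, not the poster's exact code; the queue names "pipeline" and "default" are assumptions and must exist on your clearml-server:

```python
# Two-queue setup: the controller must not occupy the queue its own
# steps are waiting on, otherwise the steps never leave the queue.
from clearml.automation import PipelineController

pipe = PipelineController(
    name="Pipeline demo", project="data process", version="0.0.1"
)

# Steps run on the agent listening to "default"...
pipe.set_default_execution_queue("default")

# ... add_step(...) calls as in the code above ...

# ...while the controller itself runs on a separate "pipeline" queue.
pipe.start(queue="pipeline")
```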