allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.63k stars, 653 forks

Error 12 : Validation error (invalid task field) #385

Closed NazariyOnyshko closed 1 year ago

NazariyOnyshko commented 3 years ago

Hi all, I recently ran into an issue when trying to add a custom reported hyperparameter as a column on the Dashboard. I've tried reporting it both as a float and as a string, with no luck. I also checked the demo server, and the same issue exists there.

Steps to reproduce: demo server -> Project: ClearML Examples -> Dashboard -> Customize table -> add a custom parameter from Hyperparameters/General/epochs. You should receive "Error Fetch Experiments failed: Error 12 : Validation error (invalid task field): path=hyperparams.General.epochs.value"

Screenshot attached (Screenshot_2021-06-20_01-25-46). Am I doing something wrong? Thanks in advance.

jkhenning commented 3 years ago

Hi @NazariyOnyshko,

Do you have an example code using the SDK showing how you report these hyper-parameters? Just to make sure we won't miss anything 🙂

NazariyOnyshko commented 3 years ago

Hi @jkhenning, thanks for the fast reply. I've created a class on the agent's side:

import glob
import os
from datetime import datetime

from clearml import Task, Logger
import Core.utils.logging_utils as logging_utils

logger = logging_utils.get_logger("[clear_ml_worker]")

class ClearML(ClearMLAbs):
    def __init__(self, subproject_name, task_prefix):
        self.task = Task.init(project_name=f"Sport/{subproject_name}",
                              task_name=f"{task_prefix}_{datetime.now().strftime('%Y%m%d_%H%M')}")

    def upload_video(self, vid_folder):
        # Report every .mp4 in the folder as a debug media sample
        for video_fname in glob.glob(os.path.join(vid_folder, "*.mp4")):
            logger.info(f"Uploading video file [{video_fname}] to ClearML server")
            Logger.current_logger().report_media(os.path.basename(video_fname), 'Debug video samples', iteration=1,
                                                 local_path=video_fname)

    def upload_image(self, image_path):
        logger.info(f"Uploading image file [{image_path}] to ClearML server")
        Logger.current_logger().report_media(os.path.basename(image_path), 'Debug image samples', iteration=1,
                                             local_path=image_path)

    def upload_dict(self, dictionary):
        logger.info("Uploading dictionary with metrics to ClearML server")
        # connect() registers the dictionary as the task's hyperparameters
        self.task.connect(dictionary)

I'm trying to report hyperparameters by calling the upload_dict method and passing it a Python dictionary.

jkhenning commented 3 years ago

Hi @NazariyOnyshko ,

I assume the hyperparams are the result of the upload_dict() call - can you show an example of the dictionary you use when calling it?

NazariyOnyshko commented 3 years ago

Sure, the structure of the dictionary is:

{
    "Total HL count": <int>,
    "Processed HL count": <int>,
    "Unknown HL count": <int>,
    "Training HL count": <int>,
    "Discarded HL count": <int>,
    "Irrelevant HL count": <int>,
    "Error HL count": <int>,
    "TopK acc": <float>,
    "OneNN acc": <float>,
    "PreProcess acc": <float>,
    "Unknowns": <float>,
    "duration": <int>
}

The data actually sent is attached as an image (Screenshot_2021-06-20_18-08-03). I should also mention that this specific task is not the training of a single network; it's a benchmark script that runs inference on 6-7 networks combined with additional logic.
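For reference, here is a minimal sketch of what the upload_dict call amounts to, with made-up values standing in for a few of the <int>/<float> placeholders above. StubTask is a hypothetical stand-in so the snippet runs without a ClearML server; with the real SDK the object would come from Task.init(...). The stub assumes that a dictionary connected without a name argument is filed under the General hyperparameters section, which matches the General prefix seen in the error path.

```python
# Hypothetical stand-in for clearml.Task so the sketch runs offline.
class StubTask:
    def __init__(self):
        # Maps a hyperparameter section name to its reported values.
        self.sections = {}

    def connect(self, mutable, name=None):
        # Assumption: unnamed dictionaries are filed under "General".
        section = name or "General"
        self.sections.setdefault(section, {}).update(mutable)
        return mutable


# Made-up sample values; the real ones come from the benchmark script.
metrics = {
    "Total HL count": 120,
    "Processed HL count": 100,
    "TopK acc": 0.91,
    "duration": 3600,
}

task = StubTask()
task.connect(metrics)
reported = task.sections["General"]
```

With the real SDK, each key would then show up as Hyperparameters/General/<key> in the UI, which is where the custom column is picked from.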

NazariyOnyshko commented 3 years ago

Sorry, I misclicked "close issue".

jkhenning commented 3 years ago

So just to clarify, the error you see in the UI is actually not related to a parameter you're reporting in the General hyper-parameters section? I see the error is for epochs, but I don't see such an entry in the dictionary.

NazariyOnyshko commented 3 years ago

The first picture is from the demo server, project "ClearML Examples". I included it to show that the UI issue is not limited to my own project; it is present even in your examples. For my own project the error is the same, but the field in the path is different (e.g. General.duration.value).
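To make the two cases concrete, here is a small sketch of how the path in the error message seems to be composed. The pattern is inferred only from the error text quoted in this issue, and hyperparam_field_path is a hypothetical helper, not part of the ClearML SDK or server API. How parameter names containing spaces are encoded is not covered here.

```python
def hyperparam_field_path(section: str, name: str) -> str:
    """Build a field path shaped like the one in the error message.

    Assumption: the pattern hyperparams.<section>.<name>.value is taken
    from the error text only, not from the server source.
    """
    return f"hyperparams.{section}.{name}.value"


# Path from the demo server example (ClearML Examples project).
demo_path = hyperparam_field_path("General", "epochs")

# Path for the "duration" parameter reported from my own project.
own_path = hyperparam_field_path("General", "duration")
```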

jkhenning commented 3 years ago

Thanks, we've got it - it's a bug that was somehow missed (we have a fix for it though) - we'll make sure it's part of the next server release 🙂

NazariyOnyshko commented 3 years ago

Thank you

jkhenning commented 1 year ago

Closing this, as the fix has already been released. Please reopen if required.