allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.61k stars 651 forks source link

raise error if dataset name empty. #1083

Closed mathematicalmichael closed 1 year ago

mathematicalmichael commented 1 year ago

Addresses behavior of the project of a dataset being hidden when dataset name is an empty string.

If the dataset project happens to match an existing project, it can be quite jarring to have one's history of experiments suddenly become hidden / make one think they "disappeared" (shown when hidden files enabled in UI, but new tasks can't point to the project name anymore).

If there's an intent behind this behavior (allowing empty dataset names), please let me know and I'd gladly change it to a warning instead.

Perhaps another solution is dataset_name = None if dataset_name == ""? or modify the project name under this condition so it's got some ephemeral name to it and prevents the behavior of an existing project getting wiped out.

Let me know if a minimal example to reproduce the behavior is desired, I can put one together. Outline of it is:

create a task which creates a dataset with an empty string as the name, under the same project name, get it into the UI in draft mode. Notice the project is visible. Presume this is already a project with hundreds of experiments in it and you just added a new task. Run the task. When dataset completes, the project becomes hidden, appears to "vanish" from the UI (but is visible if hidden files are shown)

If the dataset project name differed from the "existing" project, then the original project is untouched, and the dataset-without-a-name is just under a hidden project.

If the dataset name is anything other than an empty string, everything works as anticipated, dataset lives under the right project, nothing is "hidden" that "shouldn't be".

Related Issue \ discussion

https://clearml.slack.com/archives/CTK20V944/p1690230961759919

Patch Description

raises error, prevents user from naming a dataset with an empty string.

Testing Instructions

create a task that creates a dataset. notice the dataset's project is hidden upon completion. if the project names are the same, this makes it look like the task's entire project disappeared from the UI.

Other Information

jkhenning commented 1 year ago

Hi @mathematicalmichael,

Thanks for this PR, we'll take a look soon 🙂

eugen-ajechiloae-clearml commented 1 year ago

Hi @mathematicalmichael ! Thanks for the PR.

Any reason why we would not want to check this at the begging of the __init__? Also, we could just raise a value error with a simpler message: dataset_name cannot be an empty string.

mathematicalmichael commented 1 year ago

Hi @mathematicalmichael ! Thanks for the PR.

Any reason why we would not want to check this at the begging of the __init__? Also, we could just raise a value error with a simpler message: dataset_name cannot be an empty string.

No problem, just wanted to first check this behavior wasnt some intentional feature I'd be breaking by raising an error.

mathematicalmichael commented 1 year ago

@eugen-ajechiloae-clearml moved the check up. I originally had it where it was because I didn't know if the empty dataset name was a valid thing in the other side of the "if" statement but after closer inspection, it feels relatively reasonable to just not allow people to use dataset names that are empty strings.