allegroai / clearml

ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
https://clear.ml/docs
Apache License 2.0
5.61k stars, 651 forks

Registered artifacts not updated correctly between task runs #1016

Closed: materight closed this issue 1 year ago

materight commented 1 year ago

Describe the bug

When continuing an existing task and trying to add rows to a DataFrame after calling register_artifact, sometimes the artifact is not updated correctly.

To reproduce

import pandas as pd
from clearml import Task

task = Task.init(project_name='Test', task_name='test', continue_last_task=True)
if 'df' not in task.artifacts:
    df = pd.DataFrame(columns=['a'])
else:
    df = task.artifacts['df'].get()
task.register_artifact('df', df)

for _ in range(50):
    df.loc[len(df)] = {'a': len(df)}

task.flush(wait_for_uploads=True)
print('df size:', len(df))

Results, when running the script for the:

Expected behaviour

The 3rd time should print df size: 150. It seems that the results of the second execution are not stored or returned correctly. Calling task.artifacts['df'].get() always returns only the first 50 rows. However, in the dashboard I can see 100 rows stored:

[screenshot: dashboard view of the stored df artifact]

Environment

materight commented 1 year ago

It was an issue with my code: changing task.artifacts['df'].get() to task.artifacts['df'].get(force_download=True) solved it.
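For anyone landing here with the same symptom, the fix can be sketched as below. This is a minimal sketch, not an official ClearML recipe. It assumes the stale rows came from get() returning a locally cached copy of the artifact, which force_download=True bypasses. The helper names load_artifact_fresh and continue_task are hypothetical, introduced only for illustration; the ClearML calls are the ones already used in the report.

```python
def load_artifact_fresh(task, name, default_factory):
    """Return the latest uploaded copy of artifact `name`, or a new default.

    `task` is expected to behave like a clearml Task: `task.artifacts` is a
    mapping of artifact names to objects exposing `.get(force_download=...)`.
    """
    if name not in task.artifacts:
        # First run: nothing has been uploaded yet, so start from scratch.
        return default_factory()
    # Assumption: a plain .get() may return a cached local copy from an
    # earlier run; force_download=True asks for the remote copy again.
    return task.artifacts[name].get(force_download=True)


def continue_task():
    """The script from the report, with the fix applied.

    Needs pandas, clearml, and a configured ClearML server.
    """
    import pandas as pd
    from clearml import Task

    task = Task.init(project_name='Test', task_name='test', continue_last_task=True)
    df = load_artifact_fresh(task, 'df', lambda: pd.DataFrame(columns=['a']))
    task.register_artifact('df', df)

    # Append 50 rows on every run.
    for _ in range(50):
        df.loc[len(df)] = {'a': len(df)}

    task.flush(wait_for_uploads=True)
    print('df size:', len(df))
```

Calling continue_task() once per run should then let the DataFrame grow by 50 rows each time, provided the cache assumption above holds.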