argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://docs.argilla.io
Apache License 2.0
4.08k stars, 382 forks

[BUG-python/deployment] `TaskDistribution` is not correctly received from server #5718

Open davidberenstein1957 opened 6 days ago

davidberenstein1957 commented 6 days ago

Describe the bug: I created a dataset with a specific task distribution (min_submitted=3), but when I retrieve it from the server I get min_submitted=1.

Stacktrace and Code to create the bug

import os

import argilla as rg
from dotenv import find_dotenv, load_dotenv

load_dotenv(find_dotenv())

client = rg.Argilla(
    api_url=os.getenv("ARGILLA_API_URL"),
    api_key=os.getenv("ARGILLA_API_KEY"),
)

print(client.datasets.list()[0].settings.distribution.min_submitted)

Expected behavior: retrieving the dataset should return the actual min_submitted value (3).

frascuchon commented 6 days ago

Which version are you using? I cannot reproduce the error.

jfcalvo commented 6 days ago

I can successfully create a dataset with min_submitted set to 3. I'm doing it through the API and it works as expected:

{
  "id": "0d5a92bb-b85e-44d7-b0ac-55433709a947",
  "name": "testing-url-images-different-strategy",
  "guidelines": null,
  "allow_extra_metadata": true,
  "status": "draft",
  "distribution": {
    "strategy": "overlap",
    "min_submitted": 3
  },
  "metadata": null,
  "workspace_id": "9b40ede0-87d5-4b20-9299-690c0c385c66",
  "last_activity_at": "2024-11-27T11:51:32.777932",
  "inserted_at": "2024-11-27T11:51:32.777932",
  "updated_at": "2024-11-27T11:51:32.777932"
}

Could it be an SDK-specific problem?

frascuchon commented 6 days ago

I've tested with the SDK, and it works as expected there too.

davidberenstein1957 commented 6 days ago

@jfcalvo @frascuchon I originally created it with a distribution of 1 and afterwards updated it to 3 in the UI. Could that be related? I am using SDK 2.4.0 and server 2.4.1.

frascuchon commented 6 days ago

I've updated the dataset settings from the UI, and it's working.

davidberenstein1957 commented 5 days ago

@frascuchon @jfcalvo I found the issue.

# this passes
assert len(client.datasets) == 1
# this fails
assert client.datasets[0].settings.distribution.min_submitted == client.datasets(name="image_preferences").settings.distribution.min_submitted

davidberenstein1957 commented 5 days ago

@jfcalvo @frascuchon it seems that after self._from_model() we still need to call get() to update the settings config. The settings already seem to be available at that point, but we are not using or passing them in the model init.
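
To illustrate the pattern described above, here is a minimal, self-contained sketch. It does not use Argilla's actual classes: `DatasetModel`, `FakeServer`, `Dataset`, `from_model_buggy` and `from_model_fixed` are all hypothetical names made up for this example, and the default of 1 is only assumed from the value reported in this issue. It shows how a resource built from a list response can silently keep a default when the constructor ignores the distribution that is already in the payload.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Assumed default, taken from the value observed in this issue.
DEFAULT_MIN_SUBMITTED = 1


@dataclass
class DatasetModel:
    """Hypothetical server payload for one dataset (not Argilla's real model)."""

    name: str
    distribution: Dict[str, Any] = field(
        default_factory=lambda: {
            "strategy": "overlap",
            "min_submitted": DEFAULT_MIN_SUBMITTED,
        }
    )


class FakeServer:
    """Hypothetical stand-in for the API: returns stored payloads as-is."""

    def __init__(self, models: List[DatasetModel]) -> None:
        self._models = {m.name: m for m in models}

    def list(self) -> List[DatasetModel]:
        return list(self._models.values())


class Dataset:
    """Hypothetical client-side resource."""

    def __init__(self, name: str, min_submitted: int = DEFAULT_MIN_SUBMITTED) -> None:
        self.name = name
        self.min_submitted = min_submitted

    @classmethod
    def from_model_buggy(cls, model: DatasetModel) -> "Dataset":
        # The payload already carries the distribution, but it is not
        # passed to the constructor, so the default value is kept.
        return cls(name=model.name)

    @classmethod
    def from_model_fixed(cls, model: DatasetModel) -> "Dataset":
        # Pass the distribution from the payload into the constructor.
        return cls(
            name=model.name,
            min_submitted=int(model.distribution["min_submitted"]),
        )


server = FakeServer(
    [DatasetModel("image_preferences", {"strategy": "overlap", "min_submitted": 3})]
)
buggy = [Dataset.from_model_buggy(m) for m in server.list()]
fixed = [Dataset.from_model_fixed(m) for m in server.list()]
print(buggy[0].min_submitted, fixed[0].min_submitted)
```

In this toy version the server holds min_submitted=3 in both cases; only the way the client builds the object from the payload differs. Whether the real fix should pass the settings at init or trigger a get() is a design choice for the SDK and is not decided by this sketch.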