argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://docs.argilla.io
Apache License 2.0

[FEATURE] How to handle extreme classification datasets? LabelQuestion supports at most 500 labels #5621

Open msciancalepore98 opened 2 weeks ago

msciancalepore98 commented 2 weeks ago

Hello,

I have a dataset with thousands of labels, but as far as I understand, rg.LabelQuestion supports at most 500 labels, which looks like an arbitrary hardcoded value. Can I use Argilla for my use case or not?

log:

UnprocessableEntityError: Argilla SDK error: UnprocessableEntityError: Unprocessable entity. The server cannot process the request. Details: {"detail":{"code":"argilla.api.errors::ValidationError","params":{"errors":[{"loc":["body","settings","LabelSelectionQuestionSettingsCreate","options"],"msg":"ensure this value has at most 500 items","type":"value_error.list.max_items","ctx":{"limit_value":500}}]}}}
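
For anyone hitting the same error, the limit can be read straight out of the payload. Below is a minimal sketch that parses the JSON after "Details:" in the log above; the helper name `max_items_limits` is hypothetical, and the payload layout is assumed to match this one log line only.

```python
import json

# Error payload copied verbatim from the log above (the part after "Details:").
details = '{"detail":{"code":"argilla.api.errors::ValidationError","params":{"errors":[{"loc":["body","settings","LabelSelectionQuestionSettingsCreate","options"],"msg":"ensure this value has at most 500 items","type":"value_error.list.max_items","ctx":{"limit_value":500}}]}}}'


def max_items_limits(payload: str) -> dict[str, int]:
    """Map each offending field path to the max-items limit the server reported.

    Hypothetical helper: it assumes the payload layout seen in the log above.
    """
    errors = json.loads(payload)["detail"]["params"]["errors"]
    return {
        ".".join(err["loc"]): err["ctx"]["limit_value"]
        for err in errors
        if err["type"] == "value_error.list.max_items"
    }


limits = max_items_limits(details)
print(limits)
```

Here the only entry is the `options` field of the label question settings, with a reported limit of 500.
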
msciancalepore98 commented 2 weeks ago

After some digging, I found this section of the docs: https://docs.argilla.io/latest/reference/argilla-server/configuration/#datasets

Please add this detail to the LabelQuestion docstring as well!
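
In the meantime, a client-side pre-flight check gives a clearer failure than the 422 from the server. This is only a sketch under assumptions: I am assuming the setting on that page is the server environment variable `ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS` with a default of 500 (please verify against your Argilla version), and `check_label_count` is a hypothetical helper, not part of the SDK.

```python
import os

# Assumption: variable name and default taken from the server configuration
# page linked above; the value that actually applies is the one set on the
# Argilla server, not on the client machine.
ENV_VAR = "ARGILLA_LABEL_SELECTION_OPTIONS_MAX_ITEMS"
DEFAULT_LIMIT = 500


def check_label_count(labels, limit=None):
    """Hypothetical pre-flight helper.

    Deduplicates labels (keeping order) and raises if the count exceeds the
    limit. If no limit is passed, it falls back to the local environment
    variable, which only helps when it mirrors the server's value.
    """
    if limit is None:
        limit = int(os.environ.get(ENV_VAR, DEFAULT_LIMIT))
    unique = list(dict.fromkeys(labels))
    if len(unique) > limit:
        raise ValueError(
            f"{len(unique)} labels exceed the limit of {limit}; "
            f"raise {ENV_VAR} on the server or reduce the label set"
        )
    return unique


labels = [f"label_{i}" for i in range(3000)]
try:
    check_label_count(labels, limit=500)
except ValueError as exc:
    print(exc)
```

The labels that pass the check can then be handed to rg.LabelQuestion as usual.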

msciancalepore98 commented 2 weeks ago

OK, after some tests, I noticed that:

Do you think there's room for performance improvement? I'd really love to use Argilla for my annotation use case...

Also, the label search lags a lot with this many labels :/