argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://docs.argilla.io
Apache License 2.0

[BUG] Validating multi-label text classification records makes new copies of them instead of updating #3265

Closed marcelbusch closed 1 year ago

marcelbusch commented 1 year ago

Describe the bug

In multi-label text classification, whenever I annotate records, they get copied to a validated version, but the original record stays the same. So I have a dataset with 1000 records, and after annotating 20 of them, I now have 1020 records in my dataset: the 1000 records with status "default" and the 20 annotated records. Is this expected behaviour? This way I can't filter for records I haven't annotated yet, because when I filter for status:default I still get all 1000 records.

I run argilla and elasticsearch with the supplied docker-compose file without any modifications.

To Reproduce

Steps to reproduce the behavior:

import argilla as rg

DATASET_NAME = '...'
labels = [...]
settings = rg.TextClassificationSettings(label_schema=labels)
rg.configure_dataset_settings(name=DATASET_NAME, settings=settings)

# sample is a pandas dataframe containing 1000 records
records = sample.apply(lambda x: rg.TextClassificationRecord(
    inputs={"user": x.username, "body": x.rawContent},
    metadata={"hashtags": x.hashtags},
    id=x.id,
    multi_label=True
), axis=1).values.tolist()

rg.log(records, DATASET_NAME)

Expected behavior

I expect the original records to change status, so after annotating 20 records I should still have 1000 records in my dataset: 980 with status "default" and 20 with status "validated".

Environment (please complete the following information):

Edit: Update on some more behaviour:

davidberenstein1957 commented 1 year ago

@marcelbusch do you know for sure that you are the only one working on this? What version of the server and Python package are you using?

I think it might be good to evaluate the query syntax w.r.t. the query behaviour. Are you using it correctly?

marcelbusch commented 1 year ago

Yes, I am sure I'm the only one working on this. Server and python package version are both 1.10.0

I'm attaching a screenshot to clarify which behaviour I mean. I labeled and validated a record, so now there are 2 versions of it in my dataset. When I query for a combination of words that only exists in this specific record, I naturally get both versions in Python via rg.load. The UI also shows at the bottom that there are 2 records, but I can only see one of them.

davidberenstein1957 commented 1 year ago

@marcelbusch as described in our query syntax overview, you would need to query a bit differently in that case. "nurnberg AND demonstration AND wegstrecke" should be what you intend to achieve, correct?

w.r.t. the duplicating records, is that still happening? Could you show me a video?

marcelbusch commented 1 year ago

No, when I connect them with AND it's still the same behaviour...

Yes that's still happening, here's a video:

https://github.com/argilla-io/argilla/assets/7290487/b73778ab-2c40-44e4-bc84-5f093c036ab5

davidberenstein1957 commented 1 year ago

@frascuchon

davidberenstein1957 commented 1 year ago

Thanks @marcelbusch, this really helps us.

davidberenstein1957 commented 1 year ago

Hi @marcelbusch, we cannot reproduce this behaviour. Could you provide us with more context w.r.t. your data? Also, could you run pip install argilla --upgrade, delete your current docker images, and set their tag specific to 1.10.0 during re-deployment?

frascuchon commented 1 year ago

Thanks for your feedback @marcelbusch! Is it also happening when you validate a single record (without the bulk validation action)?

Can you also share the extra record info (you can access it by clicking the ... button) for both records?

marcelbusch commented 1 year ago

Could you provide us with more context w.r.t. your data?

I could send you a csv of my pandas dataframe, or what do you mean?

Also, could you run pip install argilla --upgrade, delete your current docker images, and set their tag specific to 1.10.0 during re-deployment?

So I should run server version 1.10.0 with argilla python version 1.11.0? And the docker tag would be v1.10.0 or releases-1.10.0?

Is it also happening when you validate a single record (without the bulk validation action)?

Yes, same thing when using the validate button at the bottom and when setting records per page to 1

Can you also share the extra record info (you can access it by clicking the ... button) for both records?

They both have the same id. Could this be because I set the id manually during record creation?

marcelbusch commented 1 year ago

I took a look in my debugger at this point: updateDatasetRecords. In the record that gets passed in there when I click on a label, the id is incorrect; in fact, it seems to be rounded to the nearest 100. The record has an initial id of 1478720431326732291, but the record that gets passed to the updateDatasetRecords function has the id 1478720431326732300.
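One plausible explanation for the changed id, sketched here in Python rather than in the UI code, is that the id exceeds the range a 64-bit floating-point number can represent exactly. The helper name `as_js_number` is hypothetical, and the sketch assumes the browser treats the id as an IEEE 754 double (the JavaScript Number type); it is not taken from the Argilla code base.

```python
# Largest integer a JavaScript Number can represent without gaps
# (Number.MAX_SAFE_INTEGER).
JS_MAX_SAFE_INTEGER = 2**53 - 1


def as_js_number(value: int) -> int:
    """Round an integer the way an IEEE 754 double would store it.

    Hypothetical helper for illustration only.
    """
    return int(float(value))


original_id = 1478720431326732291  # id reported in this thread
stored = as_js_number(original_id)

print(original_id > JS_MAX_SAFE_INTEGER)  # True
print(stored)                             # 1478720431326732288
# The id seen in the debugger maps to the same double as the original id:
print(as_js_number(1478720431326732300) == stored)  # True
```

Under that assumption the value is not rounded to the nearest 100. It is rounded to the nearest representable double (a multiple of 256 at this magnitude), and 1478720431326732300 is simply the shortest decimal form that identifies that double.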

frascuchon commented 1 year ago

Thanks for all this info @marcelbusch. It's really helpful. We'll work on fixing that.

As a temporary workaround, I think you can set the record id as a string instead of a number. This should avoid this problem.
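A minimal sketch of that workaround, using plain dictionaries as a stand-in for records so it runs without a server. The helper `with_string_id` and the sample payload are hypothetical; in the reproduction script above, the equivalent change would presumably be passing `id=str(x.id)` instead of `id=x.id`.

```python
import json


def with_string_id(record: dict) -> dict:
    """Return a copy of the record whose 'id' is a string.

    Hypothetical helper; a string id is serialized as quoted JSON text,
    so no consumer can parse it as a floating-point number.
    """
    return {**record, "id": str(record["id"])}


raw = {"id": 1478720431326732291, "text": "example"}  # made-up record
safe = with_string_id(raw)
payload = json.dumps(safe)

print(payload)
```

Because the id travels as a quoted string, every digit is preserved regardless of how the receiving side handles large numbers.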

frascuchon commented 1 year ago

I've been doing some tests and it looks like it is a JavaScript limit.

Using curl:

curl -X 'POST' \
  "http://localhost:6900/api/datasets/test-dataset/TextClassification:search?include_metrics=false&workspace=argilla&limit=50&from=0" \
  -H 'accept: application/json' \
  -H 'X-Argilla-Api-Key: argilla.apikey' \
  -H 'Content-Type: application/json' \
  -d '{}'
{"total":1,"records":[{"id":10805720881385292014,"status":"Default","metrics":{},"last_updated":"2023-06-30T13:23:45.462574","inputs":{"additionalProp1":"string","additionalProp3":"string","additionalProp2":"string"},"multi_label":true}],"aggregations":{"predicted_as":{},"annotated_as":{},"annotated_by":{},"predicted_by":{},"status":{"Default":1},"predicted":{},"score":{},"words":{"string":1},"metadata":{}}}

From UI:

Screenshot 2023-06-30 at 15 53 52
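The gap between the raw API response and what a browser would see can be approximated in Python. This is only a sketch: `body` is a trimmed copy of the response above, and `parse_int=float` is used to imitate a parser that reads every JSON number as a double, which is an assumption about the UI's behaviour rather than something confirmed in this thread.

```python
import json

# Trimmed copy of the curl response shown above.
body = '{"total":1,"records":[{"id":10805720881385292014,"status":"Default"}]}'

# Python keeps arbitrary-precision integers by default.
exact = json.loads(body)
# Imitate a parser that turns every JSON integer into a 64-bit float.
lossy = json.loads(body, parse_int=float)

exact_id = exact["records"][0]["id"]
lossy_id = int(lossy["records"][0]["id"])

print(exact_id)              # 10805720881385292014
print(lossy_id)              # 10805720881385291776
print(exact_id == lossy_id)  # False
```

If the UI sends the lossy id back when validating, the server would receive an id that matches no existing record, which would be consistent with a new record being created instead of the original being updated.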

@leiyre @damianpumar @keithCuniah Any ideas?