argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://docs.argilla.io
Apache License 2.0

[FEATURE] Do not stop logging records if `UnprocessableEntityError` is raised because one single record #5415

Open gabrielmbmb opened 4 weeks ago

gabrielmbmb commented 4 weeks ago

Is your feature request related to a problem? Please describe.

When sending many records to the server with `dataset.records.log`, one record may contain an invalid value for some reason. When this happens, an `UnprocessableEntityError` is raised and the logging loop stops, which is really annoying.

Describe the solution you'd like

Instead of raising the exception, I would just warn the user that one of the records couldn't be registered on the server, and continue sending the remaining records. If possible, the warning should include the index of the record that failed, so the user can look it up in the list of records passed to `dataset.records.log` and check it.

davidberenstein1957 commented 3 weeks ago

@gabrielmbmb which version is this? I believe we normally strive to provide the index of the specific record. I'm not sure it would be best to just continue, because a small warning is easy to miss, and then you suddenly haven't synced the data you expected.

gabrielmbmb commented 2 weeks ago

2.0.1

burtenshaw commented 1 week ago

@frascuchon I think we should deal with this by adding a parameter to the `log` method of `DatasetRecords` that defines how it handles errors. Something like `on_error`, which takes one of the literals `raise`, `skip`, or `return`.

We should also improve the logging so that it logs the record index relative to the whole dataset, not the batch.
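To make the proposal concrete, here is a minimal sketch of how an `on_error` parameter could behave. It is not Argilla's implementation: `log_records`, `RecordRejected`, and `fake_send` are hypothetical stand-ins, and the sketch assumes the server rejects a whole batch atomically, so a failed batch can be retried one record at a time to find the offending record. The reported index is relative to the full list of records, not to the batch.

```python
import warnings
from typing import Any, Callable, Dict, List, Literal, Tuple

Record = Dict[str, Any]


class RecordRejected(Exception):
    """Hypothetical stand-in for a server-side validation error."""


def log_records(
    records: List[Record],
    send_batch: Callable[[List[Record]], None],
    batch_size: int = 2,
    on_error: Literal["raise", "skip", "return"] = "raise",
) -> List[Tuple[int, str]]:
    """Send records in batches; return (dataset_index, message) pairs when on_error="return"."""
    failures: List[Tuple[int, str]] = []
    for start in range(0, len(records), batch_size):
        batch = records[start : start + batch_size]
        try:
            send_batch(batch)
            continue  # whole batch accepted
        except RecordRejected:
            pass  # assumption: nothing from this batch was stored, so retry per record

        for offset, record in enumerate(batch):
            try:
                send_batch([record])
            except RecordRejected as exc:
                index = start + offset  # index in the full list, not in the batch
                message = f"record at dataset index {index} rejected: {exc}"
                if on_error == "raise":
                    raise RecordRejected(message) from exc
                if on_error == "skip":
                    warnings.warn(message)
                else:  # "return"
                    failures.append((index, message))
    return failures


# Fake server for illustration: rejects a batch if any record has text=None.
stored: List[Record] = []


def fake_send(batch: List[Record]) -> None:
    if any(record.get("text") is None for record in batch):
        raise RecordRejected("text must not be None")
    stored.extend(batch)


records = [{"text": "a"}, {"text": "b"}, {"text": None}, {"text": "d"}, {"text": "e"}]
failures = log_records(records, fake_send, batch_size=2, on_error="return")
```

With `on_error="return"`, the four valid records are stored and the caller gets back the failing index, which addresses the concern about warnings being missed: the failures are data the caller has to handle rather than a log line. Keeping `raise` as the default would preserve the current behaviour.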

burtenshaw commented 1 week ago

@nataliaElv I think this is something we could easily fit into 2.2.