argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://docs.argilla.io
Apache License 2.0
4.04k stars, 381 forks

feat: add dataset split to be used along with row idx when `external_id` is not provided on mapping #5616

Closed jfcalvo closed 1 month ago

jfcalvo commented 1 month ago

Description

This PR adds the imported dataset split to the generated external_id, used when the import mapping does not specify a value for external_id.

When the train split of a dataset is imported and no external_id is provided, the external_id is derived from the split name together with the row index.

This avoids row duplication when another split is imported into the same dataset. If the test split is imported later, its rows get external_id values derived from the test split, so they do not collide with the ones created for train.
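A minimal sketch of the idea, not the code from this PR: the helper name `build_external_id` and the `<split>_<row_idx>` format are assumptions made for illustration, since the exact format is not shown here. It only demonstrates why prefixing the row index with the split name keeps ids from different splits apart.

```python
from typing import Optional


def build_external_id(split: str, row_idx: int, mapped_value: Optional[str] = None) -> str:
    """Hypothetical helper: pick the external_id for an imported row.

    If the import mapping supplied a value, use it unchanged. Otherwise
    fall back to an id built from the split name and the row index.
    The "<split>_<row_idx>" format is an assumption for this sketch.
    """
    if mapped_value is not None:
        return mapped_value
    return f"{split}_{row_idx}"


# Importing two splits of the same dataset without a mapped external_id.
train_ids = [build_external_id("train", i) for i in range(3)]
test_ids = [build_external_id("test", i) for i in range(3)]

# Row indexes repeat across splits, but the generated ids do not.
assert set(train_ids).isdisjoint(test_ids)
```

With the row index alone, row 0 of train and row 0 of test would share an id, and the second import could be treated as a duplicate of the first.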

Refs https://github.com/argilla-io/roadmap/issues/21

Type of change

How Has This Been Tested

Checklist

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 91.19%. Comparing base (098d36b) to head (249e5d8). Report is 1 commit behind head on feat/argilla-direct-feature-branch.

Additional details and impacted files

```diff
@@                          Coverage Diff                           @@
##           feat/argilla-direct-feature-branch    #5616      +/-   ##
======================================================================
+ Coverage                               91.18%   91.19%   +0.01%
======================================================================
  Files                                     150      150
  Lines                                    6260     6261       +1
======================================================================
+ Hits                                     5708     5710       +2
+ Misses                                    552      551       -1
```

| [Flag](https://app.codecov.io/gh/argilla-io/argilla/pull/5616/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=argilla-io) | Coverage Δ | |
|---|---|---|
| [argilla-server](https://app.codecov.io/gh/argilla-io/argilla/pull/5616/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=argilla-io) | `91.19% <100.00%> (+0.01%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=argilla-io#carryforward-flags-in-the-pull-request-comment) to find out more.
