IBM / unitxt

🦄 Unitxt: a python library for getting data fired up and set for training and evaluation
https://unitxt.rtfd.io
Apache License 2.0

Operators in the card loader (and in general) can delete and modify metadata fields Unitxt relies on. #967

Open OfirArviv opened 5 days ago

OfirArviv commented 5 days ago

For example, Unitxt relies on the following fields being part of the instance: 'recipe_metadata' and 'data_classification_policy'.

However, we have 2 operators that delete them from the stream:

Right now we will add special handling of these fields in these operators. But there is a deeper problem: users can delete fields we rely on without noticing.
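To make the failure mode concrete, here is a minimal sketch. The operator `keep_only` and the field values are hypothetical: it is not one of the two operators referred to above and it is not Unitxt code. It only shows how an operator that rebuilds the instance from an allow-list drops the metadata without any error.

```python
REQUIRED_FIELDS = {"recipe_metadata", "data_classification_policy"}


def keep_only(instance, fields):
    """Hypothetical operator: return a new instance holding only `fields`."""
    return {name: instance[name] for name in fields}


instance = {
    "question": "2+2?",
    "answer": "4",
    "recipe_metadata": {"card": "example"},    # illustrative value
    "data_classification_policy": ["public"],  # illustrative value
}

result = keep_only(instance, ["question", "answer"])

# No exception was raised, yet both metadata fields are gone.
missing = sorted(REQUIRED_FIELDS - set(result))
print(missing)  # ['data_classification_policy', 'recipe_metadata']
```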

Possible solutions:

1) The card recipe should be run before any metadata fields are added. This part of the code is the one with the most "editing" of data.
2) These fields should not be allowed to be edited, unless using a special function, with some sort of mechanism. (And new instances should be forced to add them.)
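One possible shape for the second option, as a rough sketch only. All names here (`PROTECTED_FIELDS`, `ProtectedFieldError`, `run_guarded`) are made up for illustration and are not part of Unitxt's API. The assumption is that an operator can be treated as a plain function from an instance dict to an instance dict; the wrapper snapshots the protected fields, runs the operator, and fails loudly if any of them was deleted or changed.

```python
import copy

# Hypothetical names; not part of Unitxt.
PROTECTED_FIELDS = ("recipe_metadata", "data_classification_policy")


class ProtectedFieldError(RuntimeError):
    """Raised when an operator deletes or changes a protected field."""


def run_guarded(operator, instance):
    """Apply `operator` to a shallow copy of `instance` and verify that
    every protected field present beforehand survives unchanged."""
    # Deep-copy the snapshot so in-place edits of nested values are detected.
    before = {
        key: copy.deepcopy(instance[key])
        for key in PROTECTED_FIELDS
        if key in instance
    }
    result = operator(dict(instance))
    for key, value in before.items():
        if key not in result:
            raise ProtectedFieldError(f"operator deleted protected field {key!r}")
        if result[key] != value:
            raise ProtectedFieldError(f"operator modified protected field {key!r}")
    return result


def drop_everything_but_text(instance):
    """Example of a careless operator that loses the metadata."""
    return {"text": instance["text"]}


def uppercase_text(instance):
    """Example of a well-behaved operator."""
    instance["text"] = instance["text"].upper()
    return instance


example = {
    "text": "hi",
    "recipe_metadata": {"card": "example"},    # illustrative value
    "data_classification_policy": ["public"],  # illustrative value
}

ok = run_guarded(uppercase_text, example)

try:
    run_guarded(drop_everything_but_text, example)
    guard_fired = False
except ProtectedFieldError:
    guard_fired = True
```

A check like this catches the problem after the fact rather than preventing it, and it adds a deep copy per instance, so it may be better suited to a debug or test mode than to the hot path. Forcing new instances to carry the fields would need a separate check on operators that create instances.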