allenai / fm-cheatsheet

Website for hosting the Open Foundation Models Cheat Sheet.
https://fmcheatsheet.org

Intro Text for Data Decontamination Page #20

Closed by danmcduff 3 days ago

danmcduff commented 1 month ago

Replace the current intro text:

Data decontamination is the process of removing evaluation data from the training dataset. This important step in data preprocessing ensures the integrity of model evaluation, ensuring that metrics are reliable and not misleading. The following resources aid in proactively protecting test data with canaries, decontaminating data before training, and identifying or proving what data a model was trained on.

With

Data decontamination is the process of removing evaluation data from the training set. This step ensures the integrity of model evaluation. The following resources aid in proactively protecting test data with canaries, decontaminating data before training, and identifying or proving what data a model was trained on.
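
For reference, the "decontaminating data before training" step mentioned in this text is commonly approximated with n-gram overlap filtering. The following is a minimal sketch in Python, not taken from the cheat sheet or from any specific tool: the function names `ngrams` and `decontaminate`, the regex tokenizer, and the default window size `n=8` are all illustrative assumptions.

```python
import re


def ngrams(text, n):
    """Return the set of lowercase word n-grams in `text` (hypothetical helper)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(train_docs, eval_docs, n=8):
    """Drop every training document that shares at least one n-gram with any
    evaluation document. The window size `n` is an assumed default; real
    pipelines tune it and often normalize text more aggressively."""
    eval_ngrams = set()
    for doc in eval_docs:
        eval_ngrams |= ngrams(doc, n)
    return [doc for doc in train_docs if ngrams(doc, n).isdisjoint(eval_ngrams)]


# Usage: a small window (n=4) so the toy strings are long enough to match.
eval_docs = ["What is the capital of France? Paris."]
train_docs = [
    "Quiz answer key: what is the capital of France? Paris.",
    "The weather in Lyon is mild in spring.",
]
clean_docs = decontaminate(train_docs, eval_docs, n=4)
```

Exact n-gram matching like this misses paraphrased or reformatted test items, so it is best read as a baseline illustration of the idea rather than a complete procedure.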

neural-loop commented 1 month ago

https://onm-demo.aimodels.org/foundation-model-resources/data-decontamination/