bigcode-project / bigcode-analysis

Repository for analysis and experiments in the BigCode project.
Apache License 2.0
109 stars 20 forks

Add substring decontamination #16

Closed. RaymondLi0 closed this 1 year ago

RaymondLi0 commented 1 year ago

Exact-substring match for decontamination #13

This removes 336 files from the python-permissive dataset and 292 files from the python-permissive-dedup dataset. All of these removals are matches against HumanEval samples.
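
For readers unfamiliar with the technique, here is a minimal sketch of exact-substring decontamination. It is not the code from this PR: the function names `is_contaminated` and `decontaminate` are hypothetical, and stripping surrounding whitespace from the benchmark samples before matching is an assumption made for illustration. A file is dropped if any benchmark sample appears verbatim inside it.

```python
from typing import Iterable, List, Tuple


def is_contaminated(file_text: str, benchmark_samples: Iterable[str]) -> bool:
    """Return True if any non-empty benchmark sample occurs verbatim in file_text."""
    return any(bool(sample) and sample in file_text for sample in benchmark_samples)


def decontaminate(files: List[str], benchmark_samples: List[str]) -> Tuple[List[str], int]:
    """Drop every file containing a benchmark sample; return (kept files, removed count)."""
    # Assumption: strip surrounding whitespace so a trailing newline
    # in a sample does not hide an otherwise exact match.
    samples = [s.strip() for s in benchmark_samples if s.strip()]
    kept = [text for text in files if not is_contaminated(text, samples)]
    return kept, len(files) - len(kept)


# Toy data standing in for dataset files and benchmark samples.
files = ["def add(a, b):\n    return a + b\n", "print('hello')\n"]
samples = ["def add(a, b):\n    return a + b\n"]
kept, removed = decontaminate(files, samples)
```

With the toy inputs above, the first file contains the sample verbatim and is removed, so one file is kept. Exact matching is cheap and has no false positives, but it misses samples that were reformatted or renamed.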