bigcode-project / bigcode-analysis

Repository for analysis and experiments in the BigCode project.
Apache License 2.0

Add decontamination code #14

Closed: ChenghaoMou closed this 1 year ago

ChenghaoMou commented 1 year ago

Similar to deduplication, but uses the benchmark datasets as the index.

Used for #13
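To illustrate the idea described above (deduplication where the index is built from benchmark data rather than from the training set itself), here is a minimal sketch. It is not the code from this PR: the function names (`shingles`, `build_index`, `decontaminate`), the token 3-gram shingling, the exact Jaccard comparison, and the `0.5` threshold are all assumptions chosen for illustration. A real pipeline would likely use an approximate index instead of a linear scan.

```python
import re
from typing import Iterable, List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, n: int = 3) -> Set[Shingle]:
    """Split text into lowercase word tokens and return its set of token n-grams."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < n:
        # Short inputs yield a single shingle (or none if there are no tokens).
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity of two shingle sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_index(benchmark_samples: Iterable[str], n: int = 3) -> List[Set[Shingle]]:
    """Hypothetical index: one shingle set per benchmark sample."""
    return [shingles(sample, n) for sample in benchmark_samples]


def decontaminate(
    train_samples: Iterable[str],
    index: List[Set[Shingle]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the PR
    n: int = 3,
) -> List[str]:
    """Keep only training samples that are not near-duplicates of any benchmark sample."""
    kept = []
    for sample in train_samples:
        sig = shingles(sample, n)
        if any(jaccard(sig, bench) >= threshold for bench in index):
            continue  # too similar to a benchmark sample: drop it
        kept.append(sample)
    return kept
```

The only difference from plain deduplication in this sketch is where the index comes from: training samples are queried against benchmark shingles and never against each other.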

RaymondLi0 commented 1 year ago

Thank you @ChenghaoMou! Sure, maybe we can merge this PR first, and then I'll upload my script too? (This PR is from a fork, so otherwise I'd need to commit directly to Chenghao's fork.)

ChenghaoMou commented 1 year ago

I don't have write access to merge; please help.

loubnabnl commented 1 year ago

Ah, sorry, I thought you could merge once the PR was approved.