github / CodeSearchNet

Datasets, tools, and benchmarks for representation learning of code.
https://arxiv.org/abs/1909.09436
MIT License
2.18k stars 385 forks

Request to provide unfiltered dataset #239

Closed (vatsal-kr closed this issue 2 years ago)

vatsal-kr commented 2 years ago

According to the accompanying paper, the dataset has been filtered to, among other things, include only those instances that have attached documentation. Please also provide the unfiltered dataset, including the instances without documentation.

vatsal-kr commented 2 years ago

The dataset in question comes as a pickle file ({LANGUAGE}_dedupe_definitions_v2.pkl) when downloaded from S3 or via Docker.
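
For anyone landing here later, this is a rough sketch of how one might load such a pickle and separate the instances with and without documentation. It assumes the file unpickles to a list of dicts and that the documentation lives under a "docstring" key; both are assumptions on my part, so inspect one record first. The helper names (load_definitions, split_by_documentation) are made up for illustration. Only unpickle files from a source you trust, since pickle can execute arbitrary code on load.

```python
import pickle
from pathlib import Path


def load_definitions(pkl_path):
    """Load the pickled definitions file (assumed to hold a list of dicts)."""
    with Path(pkl_path).open("rb") as f:
        return pickle.load(f)


def split_by_documentation(records, doc_key="docstring"):
    """Split records into (documented, undocumented).

    doc_key is an assumed field name; a record counts as undocumented
    when the field is missing, None, or only whitespace.
    """
    documented, undocumented = [], []
    for record in records:
        doc = record.get(doc_key) or ""
        if doc.strip():
            documented.append(record)
        else:
            undocumented.append(record)
    return documented, undocumented
```

Usage would look something like `records = load_definitions("python_dedupe_definitions_v2.pkl")` followed by `documented, undocumented = split_by_documentation(records)`, with the file name adjusted to the language you downloaded.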