github / CodeSearchNet

Datasets, tools, and benchmarks for representation learning of code.
https://arxiv.org/abs/1909.09436
MIT License

Functions with original comments #246

Open: timvandam opened this issue 1 year ago

timvandam commented 1 year ago

I am looking to use CodeSearchNet to train a code-completion model that takes the current code as input and predicts the next token. However, the CodeSearchNet data does not appear to contain the raw comments, which makes it impossible to reconstruct the original code. For Java, JavaScript, etc., there is no way to tell whether the original comment was a single-line or a multi-line comment.

Is this data available somewhere, or is my best bet to simply put the plain-text comments inside a multi-line comment block for every sample?
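As a rough illustration of the fallback described above, here is a minimal sketch, assuming each record is a dict with `code`, `docstring` and `language` keys (CodeSearchNet's JSONL files use field names like these, but check the actual schema before relying on it). The helper names `wrap_comment` and `reconstruct` and the per-language comment styles are hypothetical choices, not something the dataset records. Python is deliberately left out because its docstrings live inside the function body rather than before it.

```python
# Hypothetical per-language comment styles. The dataset does not record the
# original style, so these are assumptions chosen for illustration.
BLOCK_STYLE_LANGUAGES = {"java", "javascript", "php"}
LINE_STYLE_LANGUAGES = {"go": "//", "ruby": "#"}


def wrap_comment(docstring: str, language: str) -> str:
    """Render plain-text comment text as a source comment (hypothetical helper)."""
    lines = docstring.strip().splitlines() or [""]
    if language in BLOCK_STYLE_LANGUAGES:
        # Escape "*/" so the comment text cannot terminate the block early.
        safe = [line.replace("*/", "*\\/") for line in lines]
        body = "\n".join(f" * {line}".rstrip() for line in safe)
        return f"/**\n{body}\n */"
    if language in LINE_STYLE_LANGUAGES:
        prefix = LINE_STYLE_LANGUAGES[language]
        return "\n".join(f"{prefix} {line}".rstrip() for line in lines)
    raise ValueError(f"no comment style configured for {language!r}")


def reconstruct(record: dict) -> str:
    """Prepend the wrapped comment to the function source.

    Assumes "code", "docstring" and "language" keys; verify against the
    real files. Records without comment text are returned unchanged.
    """
    docstring = record.get("docstring") or ""
    if not docstring.strip():
        return record["code"]
    return wrap_comment(docstring, record["language"]) + "\n" + record["code"]


if __name__ == "__main__":
    sample = {
        "language": "java",
        "docstring": "Adds two numbers.\nReturns the sum.",
        "code": "int add(int a, int b) { return a + b; }",
    }
    print(reconstruct(sample))
```

Using `/** ... */` for Java, JavaScript and PHP matches the Javadoc/JSDoc convention, so it is a plausible default. Because the original style cannot be recovered, a model trained on text rebuilt this way will only ever see this one comment style.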