github / CodeSearchNet

Datasets, tools, and benchmarks for representation learning of code.
https://arxiv.org/abs/1909.09436
MIT License

Expired or Private Links of Java Code Snippets in CodeSearchNET #242

Open harshgeek4coder opened 2 years ago

harshgeek4coder commented 2 years ago

I was trying to access the CodeSearchNet dataset for my work, specifically the Java data, via the link given in the CodeSearchNet repository: https://s3.amazonaws.com/code-search-net/CodeSearchNet/v2/java.zip

During initial data exploration, I observed that many of the Java code snippets were taken from public GitHub repositories that have since been made private or whose links no longer resolve.

As a result, I am not able to access the original GitHub repositories.

Could you take a look and let me know whether there is any way to obtain the complete GitHub repositories from which the Java code snippets and their documentation were extracted?

My work requires cloning the entire project repositories from which the CodeSearchNet Java dataset was extracted.
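For reference, the check described above can be reproduced roughly as follows. This is a minimal sketch, not an official tool. It assumes the archive unpacks to gzipped JSON Lines files and that each record carries a `repo` field (in `owner/name` form) and a `sha` field. The helper names (`iter_records`, `unique_repos`, `clone_url`, `is_reachable`) are hypothetical, and the reachability check is only a heuristic, since GitHub answers with an HTTP error for missing or private repositories but may also do so when rate limiting.

```python
import gzip
import json
import urllib.error
import urllib.request


def iter_records(path):
    """Yield one dict per non-empty line of a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def unique_repos(records):
    """Map each repository name to the first commit SHA seen for it.

    Assumes every record has a "repo" key; "sha" is treated as optional.
    """
    repos = {}
    for record in records:
        repos.setdefault(record["repo"], record.get("sha"))
    return repos


def clone_url(repo):
    """Build the HTTPS clone URL for an "owner/name" repository string."""
    return f"https://github.com/{repo}.git"


def is_reachable(repo, timeout=10):
    """Heuristic check of the public repository page.

    Returns True on a 200 response, False on any HTTP error status
    (for example 404 for a missing or private repository), and None
    when the request fails for network reasons.
    """
    request = urllib.request.Request(f"https://github.com/{repo}", method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:  # must precede URLError, its parent class
        return False
    except urllib.error.URLError:
        return None
```

Calling `unique_repos(iter_records(path))` on one of the extracted `.jsonl.gz` files gives the list of repositories to test, and the recorded SHA can be used to check out the matching commit for any repository that is still public.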