github / CodeSearchNet

Datasets, tools, and benchmarks for representation learning of code.
https://arxiv.org/abs/1909.09436
MIT License

How big is the dataset? #237

Open skye95git opened 3 years ago

skye95git commented 3 years ago

The Setup section says: "The datasets you will download (most of them compressed) have a combined size of only ~ 3.5 GB."

The Downloading Data from S3 section says: "The size of the dataset is approximately 20 GB."

Both figures describe the data downloaded by running script/setup, so why do they differ? Which one is right? Does 3.5 GB refer only to the size of the dataset for a single programming language?
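
One way to check the numbers locally is to total both the on-disk size and the decompressed size of the downloaded files. This is only a minimal sketch: the directory `resources/data` and the assumption that the compressed files are gzip archives ending in `.gz` are guesses, not details confirmed by the documentation quoted above, and `dir_sizes` is a hypothetical helper name.

```python
import gzip
import os


def dir_sizes(root):
    """Return (on_disk_bytes, decompressed_bytes) for all files under root."""
    on_disk = 0
    decompressed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            on_disk += size
            if name.endswith(".gz"):
                # Stream the archive in 1 MiB chunks to count its decompressed bytes.
                with gzip.open(path, "rb") as fh:
                    while True:
                        chunk = fh.read(1 << 20)
                        if not chunk:
                            break
                        decompressed += len(chunk)
            else:
                # Files that are not gzip archives count the same in both totals.
                decompressed += size
    return on_disk, decompressed


if __name__ == "__main__":
    # "resources/data" is an assumed location; point this at your download directory.
    disk, raw = dir_sizes("resources/data")
    print(f"on disk: {disk / 1e9:.2f} GB, decompressed: {raw / 1e9:.2f} GB")
```

If the two totals come out far apart, compression alone might account for part of the gap between the two documented sizes, though that would still need confirming.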