Open HangXue-lab opened 1 year ago
The data at that stage has already been processed and may not be what you want. You probably want github.tar
from the preliminary components (https://the-eye.eu/public/AI/pile_preliminary_components/github.tar), which you can then process yourself.
The link is no longer working. Is there another link to obtain the data?
The full Pile is too big for me. I just want to download the "Github" code data, but the Pile train set is split into 30 files. I would like to know exactly which files contain the "Github" code data.
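Not an authoritative answer, but as far as I know the train files are shuffled, so the "Github" documents are probably spread across all 30 files rather than sitting in one of them. Below is a minimal sketch of filtering one file down to that subset. It assumes each line is a JSON object with a "text" field and a "meta" field containing "pile_set_name"; the function name `iter_subset` and the sample records are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_subset(lines: Iterable[str], subset: str = "Github") -> Iterator[dict]:
    """Yield records whose meta.pile_set_name equals `subset`.

    Assumption: each non-empty line is one JSON object shaped like
    {"text": "...", "meta": {"pile_set_name": "..."}}.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        meta = record.get("meta") or {}  # tolerate records without metadata
        if meta.get("pile_set_name") == subset:
            yield record


if __name__ == "__main__":
    # Made-up sample lines standing in for the contents of one train file.
    sample = [
        '{"text": "def f(): pass", "meta": {"pile_set_name": "Github"}}',
        '{"text": "We study ...", "meta": {"pile_set_name": "ArXiv"}}',
    ]
    for rec in iter_subset(sample):
        print(rec["text"])
```

To run this on a real file you would need to decompress it first. If the files are .jsonl.zst, the third-party `zstandard` package should be able to open one in text mode (something like `zstandard.open(path, "rt", encoding="utf-8")`) and the resulting file object can be passed straight to `iter_subset`. This still means downloading each file, but only the matching records need to be kept on disk.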