EleutherAI / the-pile

MIT License

Libgen (pdfs) #12

Closed by StellaAthena 3 years ago

StellaAthena commented 4 years ago

Priority: medium

sdtblck commented 4 years ago

PDF-to-text extraction is pretty much finished here: https://github.com/sdtblck/PDFextract. Once we have a pipeline for downloading, this component should be ready to go.

StellaAthena commented 3 years ago

Moving to To Do now that the Pile V1 is finished.