Closed by StellaAthena 3 years ago
PDF-to-text extraction is pretty much finished here: https://github.com/sdtblck/PDFextract. Whenever we have a pipeline for downloading, this piece should be ready to go.
Moving to To Do now that the Pile V1 is finished.
Priority: medium