EleutherAI / the-pile

MIT License

The Eye #51

Closed Robbie-chew closed 3 years ago

Robbie-chew commented 3 years ago

The Eye is a platform dedicated to archiving any and all kinds of data.

They say they have 140 TB in total in assorted formats, and a good fraction seems to be in text format.

https://the-eye.eu/public/

Unfortunately, because all of their size estimates seem to be "pending update", it is difficult to give an exact estimate of how much of this is textual.

cfoster0 commented 3 years ago

I believe the team has contacted folks at the Eye. The Bibliotik component is from them. Do they have other big text datasets that you know of?

StellaAthena commented 3 years ago

Indeed, we are in contact with them and have gotten datasets from them. Long term we are working on hosting a copy of all of the data in the Pile on their systems.

Are there any specific datasets you recommend?

Robbie-chew commented 3 years ago

Not Really

StellaAthena commented 3 years ago

Okay. I’m going to tentatively close this issue, but feel free to suggest additional datasets in the future, either on GitHub or on Discord.