EleutherAI / the-pile

MIT License

Enhance downloading #60

Closed researcher2 closed 3 years ago

researcher2 commented 3 years ago

Add continuous checksumming, which gives a big speedup for large files on mechanical disks.
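As a rough illustration of the idea, assuming "continuous" means the hash is updated as each chunk is written rather than by re-reading the finished file, a minimal sketch could look like this. The function name `write_with_checksum` and its parameters are hypothetical and not taken from this PR.

```python
import hashlib


def write_with_checksum(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path, hashing as we go.

    Hypothetical helper: returns the SHA-256 hex digest of everything
    written, so the file never has to be read back from disk to verify it.
    """
    hasher = hashlib.sha256()
    with open(dest_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            hasher.update(chunk)  # hash the bytes while they are still in memory
    return hasher.hexdigest()
```

On a mechanical disk, this avoids a second full sequential read of a large file after the download finishes.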

Add checkpointing for both the checksum and the file.
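One possible shape for this, sketched under stated assumptions: the checkpoint is a small JSON file recording how many bytes are known to be safely written. Standard-library `hashlib` objects cannot be pickled, so this sketch rebuilds the hash on resume by re-reading the partial file; the PR may persist checksum state differently. The names `save_checkpoint`, `resume_state`, and the `offset` field are hypothetical.

```python
import hashlib
import json
import os

CHUNK = 1 << 20  # 1 MiB read size, an arbitrary choice for this sketch


def save_checkpoint(ckpt_path, offset):
    """Record the number of bytes written so far, replacing the old file atomically."""
    tmp = ckpt_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"offset": offset}, f)
    os.replace(tmp, ckpt_path)


def resume_state(dest_path, ckpt_path):
    """Return (offset, hasher) from which a download can continue."""
    hasher = hashlib.sha256()
    if not (os.path.exists(dest_path) and os.path.exists(ckpt_path)):
        return 0, hasher  # nothing to resume, start from scratch
    with open(ckpt_path) as f:
        offset = json.load(f)["offset"]
    offset = min(offset, os.path.getsize(dest_path))
    with open(dest_path, "r+b") as f:
        f.truncate(offset)  # drop bytes written after the last checkpoint
        f.seek(0)
        remaining = offset
        while remaining > 0:
            block = f.read(min(CHUNK, remaining))
            if not block:
                break
            hasher.update(block)
            remaining -= len(block)
    return offset, hasher
```

A caller would then request the remaining bytes starting at `offset` (for example with an HTTP `Range` header) and keep updating the returned hasher.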

Add automatic retry with backoff on HTTP failure.
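A minimal sketch of retry with exponential backoff follows. It assumes failures surface as `OSError` subclasses (which covers `urllib.error.URLError` and `requests` exceptions); the helper name `retry_with_backoff`, the retry count, and the delay schedule are illustrative assumptions, not the values used in this PR.

```python
import time


def retry_with_backoff(fetch, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure.

    Hypothetical helper: `sleep` is injectable so the delay schedule can be
    checked without actually waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except OSError:
            if attempt == max_retries:
                raise  # out of retries, surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Combined with checkpointing, `fetch` would typically resume from the last saved offset rather than restart the whole file.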

Add graceful handling of Ctrl-C (SIGINT) during download.
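One common way to handle this, shown here only as a sketch, is to replace the default SIGINT handler with one that sets a flag, and let the download loop stop at the next chunk boundary so a checkpoint can be written cleanly. The class `InterruptFlag` and function `copy_until_interrupted` are hypothetical names, not code from this PR.

```python
import signal


class InterruptFlag:
    """Context manager that turns Ctrl-C into a flag instead of KeyboardInterrupt."""

    def __init__(self):
        self.requested = False
        self._previous = None

    def _on_sigint(self, signum, frame):
        self.requested = True  # only record the request; the loop decides when to stop

    def __enter__(self):
        # signal.signal must be called from the main thread.
        self._previous = signal.signal(signal.SIGINT, self._on_sigint)
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._previous is not None:
            signal.signal(signal.SIGINT, self._previous)  # restore old handler
        return False


def copy_until_interrupted(chunks, out, flag):
    """Write chunks to out until they run out or an interrupt was requested."""
    written = 0
    for chunk in chunks:
        if flag.requested:
            break  # stop between chunks so the file is never cut mid-write
        out.write(chunk)
        written += len(chunk)
    return written
```

Typical use would be `with InterruptFlag() as flag: copy_until_interrupted(chunks, f, flag)`, followed by saving a checkpoint if `flag.requested` is set.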

Add tqdm to the sha256sum function for large downloads that still use gdrive.
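For reference, a chunked `sha256sum` with a tqdm progress bar might look roughly like the sketch below. The signature shown here (`chunk_size`, `show_progress`) is an assumption and may not match the repository's function; the guarded import is only there so the sketch still runs where tqdm is not installed.

```python
import hashlib
import os

try:
    from tqdm import tqdm
except ImportError:  # fall back to hashing without a progress bar
    tqdm = None


def sha256sum(path, chunk_size=1 << 20, show_progress=True):
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    hasher = hashlib.sha256()
    bar = None
    if tqdm is not None and show_progress:
        bar = tqdm(total=os.path.getsize(path), unit="B", unit_scale=True, desc="sha256")
    try:
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk_size), b""):
                hasher.update(block)
                if bar is not None:
                    bar.update(len(block))  # advance by bytes hashed
    finally:
        if bar is not None:
            bar.close()
    return hasher.hexdigest()
```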