EleutherAI / pythia

The hub for EleutherAI's work on interpretability and learning dynamics
Apache License 2.0
2.16k stars, 156 forks

Add checksum for data from huggingface #133

Closed. segyges closed this 8 months ago

segyges commented 8 months ago

Adds a scraper that fetches the parity data (which might be extra) and a parity check script that verifies downloads. I had a bad git lfs pull and didn't want to have to debug anything else related to it.

This could use some cleanup so that it assumes less about the folder structure, and the scraper may not need to be included at all.
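For illustration, the verification step could look roughly like the sketch below. This is not the script from the PR. The function names `sha256_of_file` and `find_bad_files` and the shape of the `expected` mapping are hypothetical, and using SHA-256 is an assumption (it matches the object IDs git lfs records, but the PR's script may check something else). Taking the root folder as an argument is one way to avoid baking in a folder structure.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large shards never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_files(root, expected):
    """Return relative paths that are missing or whose digest differs.

    `expected` (hypothetical format) maps a path relative to `root`
    to the hex SHA-256 digest that the download should have.
    """
    bad = []
    for rel_path, want in sorted(expected.items()):
        target = Path(root) / rel_path
        if not target.is_file() or sha256_of_file(target) != want.lower():
            bad.append(rel_path)
    return bad
```

An empty list from `find_bad_files` would mean every listed file is present and matches. Anything returned is a candidate for re-downloading.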

haileyschoelkopf commented 8 months ago

Thanks a bunch for the contribution!