EleutherAI / the-pile

MIT License

Make treemaps #70

Closed by leogao2 3 years ago

leogao2 commented 3 years ago

We should find a way to generate nice treemaps. I think it would be a great way of visualizing how space is allocated in the Pile. Features we'd want include a color-coded two-level hierarchy (i.e., first we split by category, then by dataset), and it should actually look nice and not like someone drew it in Paint.

Something like this, but with words and programmatically generated, would be perfect: [image]

StellaAthena commented 3 years ago

I know how to do this and can do it easily once we have the final sizes.
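
For reference, here is a minimal sketch of one way a color-coded two-level treemap could be generated programmatically. It is not necessarily the approach used for the final figures. It uses a plain slice-and-dice layout (categories split left to right, datasets stacked within each category) and writes SVG with only the standard library; a squarified layout would likely look nicer. The function names, the palette, and every category name, dataset name, and size below are placeholders, since the final sizes aren't available yet.

```python
# Sketch only: two-level treemap with a slice-and-dice layout, rendered as SVG.
# All names and sizes are hypothetical placeholders, not real Pile numbers.
from html import escape


def slice_rects(items, x, y, w, h, horizontal):
    """Split the rectangle (x, y, w, h) into strips proportional to item size.

    items is a list of (label, size); returns a list of (label, x, y, w, h).
    """
    total = sum(size for _, size in items)
    rects, offset = [], 0.0
    for label, size in items:
        frac = size / total
        if horizontal:
            # Strips laid side by side along the x axis.
            rects.append((label, x + offset, y, w * frac, h))
            offset += w * frac
        else:
            # Strips stacked along the y axis.
            rects.append((label, x, y + offset, w, h * frac))
            offset += h * frac
    return rects


def treemap(hierarchy, width, height):
    """Lay out {category: {dataset: size}} as (category, dataset, x, y, w, h)."""
    cats = [(cat, sum(datasets.values())) for cat, datasets in hierarchy.items()]
    out = []
    # Level 1: one vertical column per category.
    for cat, cx, cy, cw, ch in slice_rects(cats, 0, 0, width, height, True):
        # Level 2: datasets stacked inside the category's column.
        datasets = list(hierarchy[cat].items())
        for ds, dx, dy, dw, dh in slice_rects(datasets, cx, cy, cw, ch, False):
            out.append((cat, ds, dx, dy, dw, dh))
    return out


def to_svg(rects, width, height, palette):
    """Render the rectangles as an SVG string, one fill color per category."""
    cats = list(dict.fromkeys(r[0] for r in rects))  # unique, order preserved
    color = {c: palette[i % len(palette)] for i, c in enumerate(cats)}
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for cat, ds, x, y, w, h in rects:
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{w:.1f}" height="{h:.1f}" '
            f'fill="{color[cat]}" stroke="white"/>'
        )
        parts.append(
            f'<text x="{x + 4:.1f}" y="{y + 16:.1f}" font-size="12">'
            f'{escape(ds)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Placeholder hierarchy: category -> dataset -> size (arbitrary units).
sizes = {
    "category_a": {"dataset_1": 40, "dataset_2": 20},
    "category_b": {"dataset_3": 30, "dataset_4": 10},
}
rects = treemap(sizes, 800, 400)
svg = to_svg(rects, 800, 400, ["#4e79a7", "#f28e2b", "#59a14f"])
```

Writing `svg` to a `.svg` file gives something viewable in a browser. Swapping `slice_rects` for a squarified layout would change only the geometry, not the coloring or labeling.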