Clay-foundation / model

The Clay Foundation Model (in development)
https://clay-foundation.github.io/model/
Apache License 2.0

Copy v0.2 training data from S3 into EBS storage #174

Closed yellowcap closed 4 months ago

yellowcap commented 4 months ago

For training, we are copying all data from S3 into an EBS volume. This will have to change for v1, but it is still doable for v0.2.

yellowcap commented 4 months ago

This was quite a challenge, as the downloads are slow for the 10M files and 20TB of data we have. Found a clunky way to speed things up:

https://gist.github.com/yellowcap/c39a10d19d78833b29a2c4828b47b2ea
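
For readers who cannot open the gist, here is a minimal sketch of one general way to speed up a transfer of many small S3 objects: issuing the downloads concurrently from a thread pool. It is not the script from the gist. The function names, the worker count, and any bucket or prefix passed in are hypothetical, and the S3 helpers assume `boto3` is installed and credentials are configured.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def local_path_for(key, dest_root):
    """Map an S3 key to a path under dest_root, preserving the key's layout."""
    return os.path.join(dest_root, *key.split("/"))


def download_all(keys, fetch, dest_root, max_workers=64):
    """Call fetch(key, local_path) for every key, using a thread pool.

    Returns (done, failed): keys that succeeded, and (key, error) pairs.
    """
    done, failed = [], []

    def _one(key):
        path = local_path_for(key, dest_root)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        fetch(key, path)
        return key

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(_one, key): key for key in keys}
        for future in as_completed(futures):
            try:
                done.append(future.result())
            except Exception as err:  # keep going; report failures at the end
                failed.append((futures[future], err))
    return done, failed


def make_s3_fetch(bucket):
    """Build a fetch function backed by boto3 (imported lazily)."""
    import boto3

    client = boto3.client("s3")  # low-level clients can be shared across threads

    def fetch(key, path):
        client.download_file(bucket, key, path)

    return fetch


def list_keys(bucket, prefix):
    """Yield every object key under prefix, following pagination."""
    import boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

With a hypothetical bucket and mount point, usage would look like `download_all(list_keys("my-bucket", "v02/"), make_s3_fetch("my-bucket"), "/mnt/ebs")`. Threads suit this workload because each download spends most of its time waiting on the network; the failed list makes it possible to retry only the keys that did not arrive.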

chuckwondo commented 4 months ago

@yellowcap, you might want to take a look at s5cmd
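
As a rough illustration of what an s5cmd-based copy could look like, the sketch below builds and runs a single `s5cmd cp` invocation from Python. It assumes the `s5cmd` binary is on the PATH; the bucket, prefix, and destination directory are hypothetical placeholders, and the worker count is passed through s5cmd's global `--numworkers` option.

```python
import subprocess


def s5cmd_copy_command(bucket, prefix, dest_dir, workers=256):
    """Build the argument list for copying everything under a prefix."""
    source = f"s3://{bucket}/{prefix.strip('/')}/*"  # wildcard matches all objects under the prefix
    dest = dest_dir.rstrip("/") + "/"  # trailing slash marks a directory target
    return ["s5cmd", "--numworkers", str(workers), "cp", source, dest]


def run_copy(bucket, prefix, dest_dir, workers=256):
    """Run the copy and raise CalledProcessError if s5cmd exits non-zero."""
    subprocess.run(s5cmd_copy_command(bucket, prefix, dest_dir, workers), check=True)
```

Passing the arguments as a list avoids shell expansion of the `*`, so the wildcard reaches s5cmd unchanged.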

yellowcap commented 4 months ago

> @yellowcap, you might want to take a look at s5cmd

Oh, that would have been helpful indeed! Thanks @chuckwondo! Next time I'll use that for sure.

The clunky transfer script worked too, so now we have 16TB of training data on EBS.