Closed by yellowcap 4 months ago
This was quite a challenge, as downloads are slow for the 10M files and 20 TB of data we have. I found a clunky way to speed them up:
https://gist.github.com/yellowcap/c39a10d19d78833b29a2c4828b47b2ea
@yellowcap, you might want to take a look at s5cmd
For training, we are copying all data from S3 into an EBS volume. This will have to change for v1 but is still doable for v0.2.
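For context, one common way to speed up a bulk copy of many small S3 objects onto a local volume is to run the downloads concurrently. The following is a minimal sketch of that idea, not the code from the gist linked above. It assumes boto3 is installed and AWS credentials are configured; `make_s3_fetch` and `download_all` are hypothetical helper names introduced here for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def make_s3_fetch(bucket):
    """Return a fetch(key, dest_path) function backed by boto3."""
    # Assumption: boto3 is installed and credentials are configured.
    # Imported lazily so the rest of the sketch runs without it.
    import boto3

    client = boto3.client("s3")  # low-level clients can be shared across threads

    def fetch(key, dest_path):
        client.download_file(bucket, key, dest_path)

    return fetch


def download_all(keys, fetch, dest_dir, max_workers=32):
    """Download every key concurrently; return (succeeded, failed) lists."""

    def one(key):
        # Mirror the key's prefix structure under dest_dir.
        dest_path = os.path.join(dest_dir, key)
        os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
        fetch(key, dest_path)
        return key

    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(one, key): key for key in keys}
        for future in as_completed(futures):
            key = futures[future]
            try:
                future.result()
                succeeded.append(key)
            except Exception as exc:  # record the failure and keep going
                failed.append((key, exc))
    return succeeded, failed
```

Usage would look something like `download_all(keys, make_s3_fetch("my-bucket"), "/mnt/data")`, where the bucket name and mount point are placeholders. Threads suit this workload because each download mostly waits on the network. For millions of keys it is probably better to submit them in batches rather than all at once, and to retry the entries in `failed`.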