amplab / training-scripts

Scripts to launch the cluster used for Strata

Copy small data sets first. #9

Closed by mengxr 10 years ago

mengxr commented 10 years ago

I got a broken pipe error while copying the data to HDFS because of a poor connection. It would be better to copy the small data sets first, before the largest one (the Wikipedia counts).
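
The ordering proposed here could be sketched roughly as follows. This is only an illustration, not the repository's actual script: the function names, the dataset paths, and the sizes are hypothetical, and the sketch assumes the `hadoop fs -put` command is available on the machine doing the copy.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple


def order_smallest_first(datasets: Iterable[Tuple[str, int]]) -> List[str]:
    """Return dataset paths sorted by size in bytes, smallest first."""
    return [path for path, _size in sorted(datasets, key=lambda d: d[1])]


def put_to_hdfs(local_path: str, hdfs_dir: str = "/") -> None:
    """Copy one local path into HDFS (assumes the hadoop CLI is on PATH)."""
    subprocess.check_call(["hadoop", "fs", "-put", local_path, hdfs_dir])


def copy_all(
    datasets: Iterable[Tuple[str, int]],
    copy_fn: Callable[[str], None] = put_to_hdfs,
) -> List[str]:
    """Copy every dataset, smallest first, and return the paths copied.

    If a copy fails partway (for example with a broken pipe), the exception
    propagates, but the smaller data sets copied before it are already in place.
    """
    copied = []
    for path in order_smallest_first(datasets):
        copy_fn(path)
        copied.append(path)
    return copied
```

With this ordering, a dropped connection during the largest transfer no longer blocks the smaller data sets, so only the big copy needs to be retried.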

pwendell commented 10 years ago

Sure, sounds good. In the past I launched all of the clusters from a driver node on EC2, but it's a good idea to have this!