amplab / training

Training materials for Strata, AMP Camp, etc

Running exercises on Google Compute Engine #235

Open femibyte opened 8 years ago

femibyte commented 8 years ago

Hi, I would like to run the training exercises on a Google Compute Engine cluster, as I don't have an Amazon AWS account. I was able to copy the Wikipedia pagecounts data to Google Compute Engine's equivalent of S3, but I noticed that the data used in the exercises was modified to insert the date stamp as the first field in the input files. Can you point me to the code you used to do this, or show me where I can copy the modified pagecounts data from? I copied the raw data from here: http://dumps.wikimedia.org/other/pagecounts-raw/2009/

Any help you can provide would be much appreciated.
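
For reference, here is a rough sketch of the kind of transformation I have in mind. It is not the code used to build the training data. It assumes that the raw files are named like `pagecounts-YYYYMMDD-HHMMSS.gz`, that the date stamp to insert is the one in the file name, and that fields are separated by single spaces. The function names (`stamp_from_filename`, `prepend_stamp`, `convert_file`) are my own, made up for illustration.

```python
import gzip
import re
from pathlib import Path

# Assumption: raw dump files are named like pagecounts-20090505-000000.gz
_STAMP_RE = re.compile(r"pagecounts-(\d{8}-\d{6})")


def stamp_from_filename(name):
    """Extract the YYYYMMDD-HHMMSS stamp from a pagecounts file name."""
    match = _STAMP_RE.search(name)
    if match is None:
        raise ValueError(f"no date stamp found in file name: {name!r}")
    return match.group(1)


def prepend_stamp(lines, stamp):
    """Yield each non-empty line with the stamp added as the first field."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield f"{stamp} {line}"


def convert_file(src, dst):
    """Read a gzipped raw pagecounts file and write a stamped plain-text copy."""
    stamp = stamp_from_filename(Path(src).name)
    # surrogateescape keeps any bytes that are not valid UTF-8 unchanged
    with gzip.open(src, "rt", encoding="utf-8", errors="surrogateescape") as fin, \
            open(dst, "w", encoding="utf-8", errors="surrogateescape") as fout:
        for out_line in prepend_stamp(fin, stamp):
            fout.write(out_line + "\n")
```

With this, a call such as `convert_file("pagecounts-20090505-000000.gz", "pagecounts-20090505-000000.txt")` would turn a raw line like `en Main_Page 5 100` into `20090505-000000 en Main_Page 5 100`. Whether that matches the layout the exercises expect is exactly what I would like to confirm.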