Open futurechimp opened 10 years ago
Did you see these instructions? http://ampcamp.berkeley.edu/big-data-mini-course/launching-a-bdas-cluster-on-ec2.html
These point you to a script that will launch EC2 instances for you and automatically load the data; those will work even if you're not at an AMPCamp. Are those scripts not working for you?
On Thu, Mar 13, 2014 at 4:35 AM, Dave Hrycyszyn notifications@github.com wrote:
Hi,
I am interested in completing the tutorials, although I'm not at an Ampcamp (so I don't have access to the AMIs you're using there).
Is there anywhere I can download the Wikipedia data set you're using as the basis of the tutorials? I have looked on the Wikipedia public datasets pages but I don't see anything that looks right. A link to the dataset at the very start of the tutorials would be really helpful.
Hey, thanks for the pointer - I didn't realize I'd need to actually use the EC2 setup (I have Spark and Shark running locally and I was in a "run it on my machine" mindset when I asked the question).
I'm sure the scripts run fine (and will try them out to be sure); I was just wondering if that dataset is available publicly anywhere. If not, I'll grab it off the server and pull it down to my local setup.
You can get the wiki stats data from S3 in the bucket s3://ampcamp-data/wikistats_20090505-01
I'm using a local cluster as well; it would be nice to provide a public URL for the dataset.
I too think it would be great to have a public URL for the datasets.
The files are publicly accessible - you can copy them down via a tool like s3cmd (https://github.com/s3tools/s3cmd)
Alternatively - the files in that bucket are numbered part-00096 through part-00167. It is possible to access them at a URL like this:
http://ampcamp-data.s3.amazonaws.com/wikistats_20090505-01/part-00167
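For anyone who wants to script the download over plain HTTP, here is a minimal sketch based only on the URL pattern and part numbering given above. The helper names (`part_urls`, `download_all`) and the local directory name are my own choices for illustration, and I'm assuming the bucket still serves every part at that URL pattern.

```python
import os
import urllib.request

# Base URL taken from the example link above.
BASE_URL = "http://ampcamp-data.s3.amazonaws.com/wikistats_20090505-01"


def part_urls(first=96, last=167):
    """Build the URLs for part-00096 through part-00167 (inclusive)."""
    return [f"{BASE_URL}/part-{n:05d}" for n in range(first, last + 1)]


def download_all(dest_dir="wikistats_20090505-01"):
    """Fetch every part into dest_dir, skipping files that already exist.

    dest_dir is a hypothetical local directory name, not something the
    tutorials require.
    """
    os.makedirs(dest_dir, exist_ok=True)
    for url in part_urls():
        target = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)
```

Calling `download_all()` pulls the 72 files one at a time; `part_urls()` on its own just returns the list, which is handy if you would rather feed it to another download tool.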
Thank you! What about the MovieLens data used in the MLlib section?
Those files are small and so we just included them in the AMI - they are available here: http://files.grouplens.org/datasets/movielens/ml-1m.zip