amplab / training

Training materials for Strata, AMP Camp, etc

No /wiki/pagecounts data #156

Open cganoo opened 10 years ago

cganoo commented 10 years ago

I set up a Spark cluster on AWS EC2 using the ampcamp4 branch of https://github.com/amplab/training.

However, I don't see a "wiki" directory anywhere; what I do see is an "ampcamp-data" directory. It would be great if the exercise documentation could reference this. Additionally, when I try to access the pagecounts data in the ampcamp-data directory using ephemeral-hdfs, I get file-not-found errors:

[root@ip-10-180-218-101 ~]# ephemeral-hdfs/bin/hadoop fs -ls /ampcamp-data/pagecounts/
ls: Cannot access /ampcamp-data/pagecounts/: No such file or directory.
[root@ip-10-180-218-101 ~]# ephemeral-hdfs/bin/hadoop fs -cat /ampcamp-data/pagecounts/part-00148
cat: File does not exist: /ampcamp-data/pagecounts/part-00148

What am I doing wrong?

muvic08 commented 9 years ago

I am having the same issue.

muvic08 commented 9 years ago

@cganoo This is how I managed to debug the issue. First, check the contents of the root of your Hadoop (HDFS) file system, which is separate from your machine's local file system:

ephemeral-hdfs/bin/hadoop fs -ls /

If the wiki or ampcamp-data directory doesn't exist in HDFS, you can create it:

ephemeral-hdfs/bin/hadoop fs -mkdir /wiki

Then you can copy the local /ampcamp-data/pagecounts directory into HDFS (note: I found the pagecounts data at /ampcamp-data/pagecounts on the machine's local file system):

ephemeral-hdfs/bin/hadoop fs -put /ampcamp-data/pagecounts /wiki
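The steps above can be combined into a small script. This is a minimal sketch, not something shipped with amplab/training: the function name stage_pagecounts and the HADOOP, LOCAL_SRC and HDFS_DIR variables are hypothetical, and the default Hadoop path assumes you run it as root on the master, where ephemeral-hdfs sits in root's home directory as in the prompts above. It uses only the standard `hadoop fs` subcommands -test -d, -mkdir, -put and -ls.

```shell
#!/usr/bin/env bash
# Sketch: copy the AMP Camp pagecounts data into ephemeral HDFS so that
# /wiki/pagecounts exists. Hypothetical helper, not part of amplab/training.
# Defaults are assumptions; override HADOOP, LOCAL_SRC or HDFS_DIR as needed.
HADOOP="${HADOOP:-$HOME/ephemeral-hdfs/bin/hadoop}"
LOCAL_SRC="${LOCAL_SRC:-/ampcamp-data/pagecounts}"
HDFS_DIR="${HDFS_DIR:-/wiki}"

stage_pagecounts() {
  # Target path in HDFS, e.g. /wiki/pagecounts.
  local dest
  dest="$HDFS_DIR/$(basename "$LOCAL_SRC")"

  # The data must be on the local disk before it can be uploaded.
  if [ ! -d "$LOCAL_SRC" ]; then
    echo "local directory $LOCAL_SRC not found" >&2
    return 1
  fi

  # Nothing to do if a previous run already uploaded it.
  if "$HADOOP" fs -test -d "$dest"; then
    echo "$dest already exists in HDFS"
    return 0
  fi

  # Create the parent directory in HDFS (e.g. /wiki) if it is missing.
  if ! "$HADOOP" fs -test -d "$HDFS_DIR"; then
    "$HADOOP" fs -mkdir "$HDFS_DIR" || return 1
  fi

  # Upload the whole local directory; it lands at $HDFS_DIR/<basename>.
  "$HADOOP" fs -put "$LOCAL_SRC" "$HDFS_DIR" || return 1

  # List the uploaded part-* files as a final check.
  "$HADOOP" fs -ls "$dest"
}
```

To use it, save it as, say, stage_pagecounts.sh and run `source stage_pagecounts.sh && stage_pagecounts`. Checking for the destination first makes the script safe to rerun, since `-put` would otherwise complain that the target already exists.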

I hope this helps.