Hadoop-Cloud-Configuration / AWS_CLI


run one hadoop sample #4

Open · dragonfly90 opened this issue 8 years ago

dragonfly90 commented 8 years ago

At first, there is a permission problem:

org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x

This can be fixed with the following command, though opening /user with chmod 777 is probably not a good solution (a safer alternative is sketched after the job output below).

[root@docker-ambari AWS_CLI]# hadoop fs -chmod 777 /user
[root@docker-ambari AWS_CLI]# hadoop jar hadoop-examples-1.2.1.jar pi 2 1000000
Number of Maps  = 2
Samples per Map = 1000000
Wrote input for Map #0
Wrote input for Map #1
Starting Job
15/10/16 17:15:34 INFO impl.TimelineClientImpl: Timeline service address: http://amb-client1.service.consul:8188/ws/v1/timeline/
15/10/16 17:15:34 INFO client.RMProxy: Connecting to ResourceManager at amb-client4.service.consul/172.17.0.5:8050
15/10/16 17:15:35 INFO input.FileInputFormat: Total input paths to process : 2
15/10/16 17:15:35 INFO mapreduce.JobSubmitter: number of splits:2
15/10/16 17:15:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1444901994309_0001
15/10/16 17:15:36 INFO impl.YarnClientImpl: Submitted application application_1444901994309_0001
15/10/16 17:15:36 INFO mapreduce.Job: The url to track the job: http://amb-client4.service.consul:8088/proxy/application_1444901994309_0001/
15/10/16 17:15:36 INFO mapreduce.Job: Running job: job_1444901994309_0001
15/10/16 17:15:47 INFO mapreduce.Job: Job job_1444901994309_0001 running in uber mode : false
15/10/16 17:15:47 INFO mapreduce.Job:  map 0% reduce 0%
15/10/16 17:15:59 INFO mapreduce.Job:  map 100% reduce 0%
15/10/16 17:16:08 INFO mapreduce.Job:  map 100% reduce 100%
15/10/16 17:16:09 INFO mapreduce.Job: Job job_1444901994309_0001 completed successfully
15/10/16 17:16:09 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=50
        FILE: Number of bytes written=295716
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=560
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=11
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters 
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=20935
        Total time spent by all reduces in occupied slots (ms)=5803
        Total time spent by all map tasks (ms)=20935
        Total time spent by all reduce tasks (ms)=5803
        Total vcore-seconds taken by all map tasks=20935
        Total vcore-seconds taken by all reduce tasks=5803
        Total megabyte-seconds taken by all map tasks=3558950
        Total megabyte-seconds taken by all reduce tasks=986510
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=36
        Map output materialized bytes=56
        Input split bytes=324
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=56
        Reduce input records=4
        Reduce output records=0
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=361
        CPU time spent (ms)=1520
        Physical memory (bytes) snapshot=466104320
        Virtual memory (bytes) snapshot=2362269696
        Total committed heap usage (bytes)=240263168
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=236
    File Output Format Counters 
        Bytes Written=97
Job Finished in 35.734 seconds
Estimated value of Pi is 3.14150400000000000000
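A less heavy-handed fix, assuming the stock setup where hdfs is the HDFS superuser, is to give root its own HDFS home directory instead of opening /user to everyone:

# run as the HDFS superuser (hdfs on a default install; adjust if yours differs)
sudo -u hdfs hadoop fs -mkdir -p /user/root
sudo -u hdfs hadoop fs -chown root:root /user/root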
dragonfly90 commented 8 years ago

A related WordCount tutorial: http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_1_--_Running_WordCount

jentle commented 8 years ago

Just go to

cd /usr/hdp/2.3.2.0-2950/hadoop-mapreduce/

and run the example pi job below; the two arguments are the number of map tasks and the number of samples per map:

yarn jar hadoop-mapreduce-examples-2.7.1.2.3.2.0-2950.jar pi 16 100000
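Tip: running the jar without a program name makes it print the list of bundled example programs, which is where the list below comes from:

yarn jar hadoop-mapreduce-examples-2.7.1.2.3.2.0-2950.jar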

There are other examples to run (a wordcount run is sketched after the list):

aggregatewordcount: An Aggregate-based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate-based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute the exact digits of pi.
dbcount: An example job that counts the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute the exact bits of pi.
grep: A map/reduce program that counts the matches to a regex in the input.
join: A job that effects a join over sorted, equally partitioned data sets.
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10 GB of random textual data per node.
randomwriter: A map/reduce program that writes 10 GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A Sudoku solver.
teragen: Generate data for the terasort.
terasort: Run the terasort.
teravalidate: Check the results of the terasort.
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
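For instance, a minimal wordcount run could look like this; the input and output paths here are placeholders (relative paths resolve under the user's HDFS home directory), so adjust them to your layout:

# stage some text files as input (paths are illustrative)
hadoop fs -mkdir -p wcinput
hadoop fs -put /etc/hadoop/conf/*.xml wcinput
# run the job and inspect the result
yarn jar hadoop-mapreduce-examples-2.7.1.2.3.2.0-2950.jar wordcount wcinput wcoutput
hadoop fs -cat wcoutput/part-r-00000 | head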