Nithinraot opened this issue 6 years ago
Use Teragen ->

```
time hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar teragen -D dfs.block.size=33554432 -Dmapreduce.job.maps=4 1000000 /user/nithinraot/teragen1/
```
Teragen ->

```
[nithinraot@ip-172-31-6-251 jars]$ time hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar teragen -D dfs.blocksize=33554432 -Dmapreduce.job.maps=4 10000000 /user/nithinraot/teragen/
18/05/15 03:35:05 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-6-251.us-east-2.compute.internal/172.31.6.251:8032
18/05/15 03:35:06 INFO terasort.TeraGen: Generating 10000000 using 4
18/05/15 03:35:06 INFO mapreduce.JobSubmitter: number of splits:4
18/05/15 03:35:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1526333863746_0006
18/05/15 03:35:06 INFO impl.YarnClientImpl: Submitted application application_1526333863746_0006
18/05/15 03:35:06 INFO mapreduce.Job: The url to track the job: http://ip-172-31-6-251.us-east-2.compute.internal:8088/proxy/application_1526333863746_0006/
18/05/15 03:35:06 INFO mapreduce.Job: Running job: job_1526333863746_0006
18/05/15 03:35:12 INFO mapreduce.Job: Job job_1526333863746_0006 running in uber mode : false
18/05/15 03:35:12 INFO mapreduce.Job:  map 0% reduce 0%
18/05/15 03:35:21 INFO mapreduce.Job:  map 25% reduce 0%
18/05/15 03:35:22 INFO mapreduce.Job:  map 100% reduce 0%
18/05/15 03:35:22 INFO mapreduce.Job: Job job_1526333863746_0006 completed successfully
18/05/15 03:35:22 INFO mapreduce.Job: Counters: 31
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=596872
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=337
		HDFS: Number of bytes written=1000000000
		HDFS: Number of read operations=16
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=8
	Job Counters
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=31947
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=31947
		Total vcore-milliseconds taken by all map tasks=31947
		Total megabyte-milliseconds taken by all map tasks=32713728
	Map-Reduce Framework
		Map input records=10000000
		Map output records=10000000
		Input split bytes=337
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=502
		CPU time spent (ms)=25700
		Physical memory (bytes) snapshot=1450917888
		Virtual memory (bytes) snapshot=11176689664
		Total committed heap usage (bytes)=1438121984
	org.apache.hadoop.examples.terasort.TeraGen$Counters
		CHECKSUM=21472776955442690
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=1000000000

real	0m19.456s
user	0m6.967s
sys	0m0.339s
```
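The write throughput implied by this run can be sanity-checked from the counters: TeraGen wrote `HDFS: Number of bytes written=1000000000` (1 GB) in a wall time of 19.456 s. A minimal sketch (the byte count and timing are the ones reported above):

```shell
# Rough HDFS write throughput for the teragen run above:
# 1,000,000,000 bytes written in 19.456 s of wall-clock time.
BYTES=1000000000
REAL_T=19.456
awk -v b="$BYTES" -v t="$REAL_T" \
    'BEGIN { printf "teragen throughput: %.1f MB/s\n", b / (t * 1000000) }'
# -> teragen throughput: 51.4 MB/s
```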
Terasort ->

```
[nithinraot@ip-172-31-6-251 jars]$ time hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar terasort /user/nithinraot/teragen/ /user/nithinraot/terasort/
18/05/15 03:35:46 INFO terasort.TeraSort: starting
18/05/15 03:35:47 INFO input.FileInputFormat: Total input paths to process : 4
Spent 122ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
Computing input splits took 125ms
Sampling 10 splits of 32
Making 8 from 100000 sampled records
Computing parititions took 572ms
Spent 699ms computing partitions.
18/05/15 03:35:47 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-6-251.us-east-2.compute.internal/172.31.6.251:8032
18/05/15 03:35:48 INFO mapreduce.JobSubmitter: number of splits:32
18/05/15 03:35:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1526333863746_0007
18/05/15 03:35:48 INFO impl.YarnClientImpl: Submitted application application_1526333863746_0007
18/05/15 03:35:48 INFO mapreduce.Job: The url to track the job: http://ip-172-31-6-251.us-east-2.compute.internal:8088/proxy/application_1526333863746_0007/
18/05/15 03:35:48 INFO mapreduce.Job: Running job: job_1526333863746_0007
18/05/15 03:35:54 INFO mapreduce.Job: Job job_1526333863746_0007 running in uber mode : false
18/05/15 03:35:54 INFO mapreduce.Job:  map 0% reduce 0%
18/05/15 03:36:00 INFO mapreduce.Job:  map 6% reduce 0%
18/05/15 03:36:01 INFO mapreduce.Job:  map 9% reduce 0%
18/05/15 03:36:05 INFO mapreduce.Job:  map 16% reduce 0%
18/05/15 03:36:06 INFO mapreduce.Job:  map 28% reduce 0%
18/05/15 03:36:07 INFO mapreduce.Job:  map 34% reduce 0%
18/05/15 03:36:08 INFO mapreduce.Job:  map 38% reduce 0%
18/05/15 03:36:10 INFO mapreduce.Job:  map 44% reduce 0%
18/05/15 03:36:14 INFO mapreduce.Job:  map 47% reduce 0%
18/05/15 03:36:15 INFO mapreduce.Job:  map 56% reduce 0%
18/05/15 03:36:16 INFO mapreduce.Job:  map 59% reduce 0%
18/05/15 03:36:17 INFO mapreduce.Job:  map 72% reduce 0%
18/05/15 03:36:20 INFO mapreduce.Job:  map 78% reduce 0%
18/05/15 03:36:23 INFO mapreduce.Job:  map 84% reduce 0%
18/05/15 03:36:24 INFO mapreduce.Job:  map 88% reduce 0%
18/05/15 03:36:27 INFO mapreduce.Job:  map 100% reduce 0%
18/05/15 03:36:30 INFO mapreduce.Job:  map 100% reduce 25%
18/05/15 03:36:34 INFO mapreduce.Job:  map 100% reduce 38%
18/05/15 03:36:35 INFO mapreduce.Job:  map 100% reduce 63%
18/05/15 03:36:37 INFO mapreduce.Job:  map 100% reduce 75%
18/05/15 03:36:38 INFO mapreduce.Job:  map 100% reduce 100%
18/05/15 03:36:38 INFO mapreduce.Job: Job job_1526333863746_0007 completed successfully
18/05/15 03:36:39 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=439890109
		FILE: Number of bytes written=879984724
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1000004992
		HDFS: Number of bytes written=1000000000
		HDFS: Number of read operations=120
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=16
	Job Counters
		Launched map tasks=32
		Launched reduce tasks=8
		Data-local map tasks=30
		Rack-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=228025
		Total time spent by all reduces in occupied slots (ms)=69832
		Total time spent by all map tasks (ms)=228025
		Total time spent by all reduce tasks (ms)=69832
		Total vcore-milliseconds taken by all map tasks=228025
		Total vcore-milliseconds taken by all reduce tasks=69832
		Total megabyte-milliseconds taken by all map tasks=233497600
		Total megabyte-milliseconds taken by all reduce tasks=71507968
	Map-Reduce Framework
		Map input records=10000000
		Map output records=10000000
		Map output bytes=1020000000
		Map output materialized bytes=434059881
		Input split bytes=4992
		Combine input records=0
		Combine output records=0
		Reduce input groups=10000000
		Reduce shuffle bytes=434059881
		Reduce input records=10000000
		Reduce output records=10000000
		Spilled Records=20000000
		Shuffled Maps =256
		Failed Shuffles=0
		Merged Map outputs=256
		GC time elapsed (ms)=6639
		CPU time spent (ms)=159230
		Physical memory (bytes) snapshot=19919757312
		Virtual memory (bytes) snapshot=111997677568
		Total committed heap usage (bytes)=20872953856
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=1000000000
	File Output Format Counters
		Bytes Written=1000000000
18/05/15 03:36:39 INFO terasort.TeraSort: done

real	0m53.992s
user	0m8.958s
sys	0m0.432s
```
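The end-to-end sort rate follows the same arithmetic: 1 GB read and 1 GB written back in 53.992 s of wall time. A minimal sketch using the numbers above (the commented teravalidate invocation is a cluster-side follow-up check shipped in the same examples jar; the output path is an assumption):

```shell
# Rough end-to-end sort rate for the terasort run above:
# 1,000,000,000 bytes sorted in 53.992 s of wall-clock time.
BYTES=1000000000
REAL_T=53.992
awk -v b="$BYTES" -v t="$REAL_T" \
    'BEGIN { printf "terasort throughput: %.1f MB/s\n", b / (t * 1000000) }'
# -> terasort throughput: 18.5 MB/s

# To confirm the output really is globally sorted, run teravalidate on the
# cluster (output path here is hypothetical):
#   hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar teravalidate \
#       /user/nithinraot/terasort/ /user/nithinraot/teravalidate/
```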
Enable snapshots ->
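The snapshot test can be driven entirely from the HDFS CLI: an admin marks a directory snapshottable, then any user with access can create, list, and restore from snapshots under the read-only `.snapshot` path. A sketch against the teragen directory used above (the snapshot name and the restored file name are assumptions; these commands need a running cluster):

```shell
# Allow snapshots on the directory (requires HDFS superuser privileges).
hdfs dfsadmin -allowSnapshot /user/nithinraot/teragen

# Take a snapshot; "before-terasort" is a hypothetical snapshot name.
hdfs dfs -createSnapshot /user/nithinraot/teragen before-terasort

# List snapshottable directories and the snapshots themselves.
hdfs lsSnapshottableDir
hdfs dfs -ls /user/nithinraot/teragen/.snapshot

# Restore a deleted/corrupted file by copying it back out of the snapshot
# (part-m-00000 is an assumed teragen output file name).
hdfs dfs -cp /user/nithinraot/teragen/.snapshot/before-terasort/part-m-00000 \
    /user/nithinraot/teragen/

# Clean up when done.
hdfs dfs -deleteSnapshot /user/nithinraot/teragen before-terasort
```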
Enable NameNode HA configuration ->
Enabled HA from the HDFS service's Actions menu.
Provided the alternate NameNode host and 3 JournalNode hosts.
Added a new user (nithinraot) as "Full Administrator" from CM under the Administration -> Users tab.
Set the password for user nithinraot to cloudera.
Then updated the role back to "Limited Operator" and saved the changes. http://ec2-18-191-90-199.us-east-2.compute.amazonaws.com:7180/cmf/users
Could not test "Replicate data to another cluster", as no second cluster with a public IP was available.
Activities/testing to be done are as below:
- Replicate data to another cluster
- Use teragen and terasort to test throughput
- Test HDFS Snapshots
- Enable NameNode HA configuration