Nithinraot / SEBC


storage labs #2

Open Nithinraot opened 6 years ago

Nithinraot commented 6 years ago

Activity/testing to be done:

  1. Replicate data to another cluster
  2. Use teragen and terasort to test throughput
  3. Test HDFS snapshots
  4. Enable NameNode HA configuration

Nithinraot commented 6 years ago

Use Teragen -> time hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar teragen -D dfs.blocksize=33554432 -Dmapreduce.job.maps=4 1000000 /user/nithinraot/teragen1/
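For context: teragen writes fixed 100-byte rows, so 1,000,000 rows comes to roughly 100 MB, and -D dfs.blocksize=33554432 (32 MB) spreads the data across more blocks than the default block size would, which gives the later terasort more input splits to work with. The resulting block layout can be checked with fsck, e.g.:

  hdfs fsck /user/nithinraot/teragen1/ -files -blocks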

Nithinraot commented 6 years ago

Teragen -> [nithinraot@ip-172-31-6-251 jars]$ time hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar teragen -D dfs.blocksize=33554432 -Dmapreduce.job.maps=4 10000000 /user/nithinraot/teragen/
18/05/15 03:35:05 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-6-251.us-east-2.compute.internal/172.31.6.251:8032
18/05/15 03:35:06 INFO terasort.TeraGen: Generating 10000000 using 4
18/05/15 03:35:06 INFO mapreduce.JobSubmitter: number of splits:4
18/05/15 03:35:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1526333863746_0006
18/05/15 03:35:06 INFO impl.YarnClientImpl: Submitted application application_1526333863746_0006
18/05/15 03:35:06 INFO mapreduce.Job: The url to track the job: http://ip-172-31-6-251.us-east-2.compute.internal:8088/proxy/application_1526333863746_0006/
18/05/15 03:35:06 INFO mapreduce.Job: Running job: job_1526333863746_0006
18/05/15 03:35:12 INFO mapreduce.Job: Job job_1526333863746_0006 running in uber mode : false
18/05/15 03:35:12 INFO mapreduce.Job: map 0% reduce 0%
18/05/15 03:35:21 INFO mapreduce.Job: map 25% reduce 0%
18/05/15 03:35:22 INFO mapreduce.Job: map 100% reduce 0%
18/05/15 03:35:22 INFO mapreduce.Job: Job job_1526333863746_0006 completed successfully
18/05/15 03:35:22 INFO mapreduce.Job: Counters: 31
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=596872
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=337
        HDFS: Number of bytes written=1000000000
        HDFS: Number of read operations=16
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Job Counters
        Launched map tasks=4
        Other local map tasks=4
        Total time spent by all maps in occupied slots (ms)=31947
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=31947
        Total vcore-milliseconds taken by all map tasks=31947
        Total megabyte-milliseconds taken by all map tasks=32713728
    Map-Reduce Framework
        Map input records=10000000
        Map output records=10000000
        Input split bytes=337
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=502
        CPU time spent (ms)=25700
        Physical memory (bytes) snapshot=1450917888
        Virtual memory (bytes) snapshot=11176689664
        Total committed heap usage (bytes)=1438121984
    org.apache.hadoop.examples.terasort.TeraGen$Counters
        CHECKSUM=21472776955442690
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=1000000000

real 0m19.456s
user 0m6.967s
sys 0m0.339s
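The job wrote 1,000,000,000 bytes (HDFS: Number of bytes written, above), so the effective generation throughput is roughly 1 GB / 19.5 s ≈ 51 MB/s across the 4 map tasks; this counts logical bytes only, not the extra copies written for HDFS replication.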

Nithinraot commented 6 years ago

Terasort -> [nithinraot@ip-172-31-6-251 jars]$ time hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar terasort /user/nithinraot/teragen/ /user/nithinraot/terasort/
18/05/15 03:35:46 INFO terasort.TeraSort: starting
18/05/15 03:35:47 INFO input.FileInputFormat: Total input paths to process : 4
Spent 122ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
Computing input splits took 125ms
Sampling 10 splits of 32
Making 8 from 100000 sampled records
Computing parititions took 572ms
Spent 699ms computing partitions.
18/05/15 03:35:47 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-6-251.us-east-2.compute.internal/172.31.6.251:8032
18/05/15 03:35:48 INFO mapreduce.JobSubmitter: number of splits:32
18/05/15 03:35:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1526333863746_0007
18/05/15 03:35:48 INFO impl.YarnClientImpl: Submitted application application_1526333863746_0007
18/05/15 03:35:48 INFO mapreduce.Job: The url to track the job: http://ip-172-31-6-251.us-east-2.compute.internal:8088/proxy/application_1526333863746_0007/
18/05/15 03:35:48 INFO mapreduce.Job: Running job: job_1526333863746_0007
18/05/15 03:35:54 INFO mapreduce.Job: Job job_1526333863746_0007 running in uber mode : false
18/05/15 03:35:54 INFO mapreduce.Job: map 0% reduce 0%
18/05/15 03:36:00 INFO mapreduce.Job: map 6% reduce 0%
18/05/15 03:36:01 INFO mapreduce.Job: map 9% reduce 0%
18/05/15 03:36:05 INFO mapreduce.Job: map 16% reduce 0%
18/05/15 03:36:06 INFO mapreduce.Job: map 28% reduce 0%
18/05/15 03:36:07 INFO mapreduce.Job: map 34% reduce 0%
18/05/15 03:36:08 INFO mapreduce.Job: map 38% reduce 0%
18/05/15 03:36:10 INFO mapreduce.Job: map 44% reduce 0%
18/05/15 03:36:14 INFO mapreduce.Job: map 47% reduce 0%
18/05/15 03:36:15 INFO mapreduce.Job: map 56% reduce 0%
18/05/15 03:36:16 INFO mapreduce.Job: map 59% reduce 0%
18/05/15 03:36:17 INFO mapreduce.Job: map 72% reduce 0%
18/05/15 03:36:20 INFO mapreduce.Job: map 78% reduce 0%
18/05/15 03:36:23 INFO mapreduce.Job: map 84% reduce 0%
18/05/15 03:36:24 INFO mapreduce.Job: map 88% reduce 0%
18/05/15 03:36:27 INFO mapreduce.Job: map 100% reduce 0%
18/05/15 03:36:30 INFO mapreduce.Job: map 100% reduce 25%
18/05/15 03:36:34 INFO mapreduce.Job: map 100% reduce 38%
18/05/15 03:36:35 INFO mapreduce.Job: map 100% reduce 63%
18/05/15 03:36:37 INFO mapreduce.Job: map 100% reduce 75%
18/05/15 03:36:38 INFO mapreduce.Job: map 100% reduce 100%
18/05/15 03:36:38 INFO mapreduce.Job: Job job_1526333863746_0007 completed successfully
18/05/15 03:36:39 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=439890109
        FILE: Number of bytes written=879984724
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1000004992
        HDFS: Number of bytes written=1000000000
        HDFS: Number of read operations=120
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=16
    Job Counters
        Launched map tasks=32
        Launched reduce tasks=8
        Data-local map tasks=30
        Rack-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=228025
        Total time spent by all reduces in occupied slots (ms)=69832
        Total time spent by all map tasks (ms)=228025
        Total time spent by all reduce tasks (ms)=69832
        Total vcore-milliseconds taken by all map tasks=228025
        Total vcore-milliseconds taken by all reduce tasks=69832
        Total megabyte-milliseconds taken by all map tasks=233497600
        Total megabyte-milliseconds taken by all reduce tasks=71507968
    Map-Reduce Framework
        Map input records=10000000
        Map output records=10000000
        Map output bytes=1020000000
        Map output materialized bytes=434059881
        Input split bytes=4992
        Combine input records=0
        Combine output records=0
        Reduce input groups=10000000
        Reduce shuffle bytes=434059881
        Reduce input records=10000000
        Reduce output records=10000000
        Spilled Records=20000000
        Shuffled Maps =256
        Failed Shuffles=0
        Merged Map outputs=256
        GC time elapsed (ms)=6639
        CPU time spent (ms)=159230
        Physical memory (bytes) snapshot=19919757312
        Virtual memory (bytes) snapshot=111997677568
        Total committed heap usage (bytes)=20872953856
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1000000000
    File Output Format Counters
        Bytes Written=1000000000
18/05/15 03:36:39 INFO terasort.TeraSort: done

real 0m53.992s
user 0m8.958s
sys 0m0.432s
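By the same arithmetic, terasort read and rewrote the full 1,000,000,000 bytes in about 54 s of wall-clock time, i.e. roughly 18 MB/s end-to-end including the shuffle (32 map tasks, 8 reducers).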

Nithinraot commented 6 years ago

Enable snapshot ->

  1. hadoop fs -mkdir /user/nithinraot/precious
  2. hadoop fs -put jclouds-compute-1.7.1.jar /user/nithinraot/precious/
  3. Enable snapshots and take a snapshot of the required directory. In this case I took a snapshot of the entire "/" directory; it can also be taken for one specific directory (the equivalent CLI commands are sketched after this list).
  4. Now we can delete the file that was pushed into the HDFS directory -> hadoop fs -rm /user/nithinraot/precious/jclouds-compute-1.7.1.jar
  5. The snapshot copy is available under "/.snapshot/sebc-hdfs-test/user/nithinraot/precious"
  6. hadoop fs -cp -ptopax /.snapshot/sebc-hdfs-test/user/nithinraot/precious/jclouds-compute-1.7.1.jar /user/nithinraot/precious/
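For reference, the same snapshot workflow can be driven entirely from the command line instead of Cloudera Manager. A minimal sketch, assuming "/" is the snapshottable directory and sebc-hdfs-test the snapshot name, as above:

  # mark the directory as snapshottable (needs HDFS superuser)
  hdfs dfsadmin -allowSnapshot /
  # take a named, read-only snapshot; it appears under /.snapshot/sebc-hdfs-test
  hdfs dfs -createSnapshot / sebc-hdfs-test
  # after an accidental delete, restore from the snapshot copy; -ptopax preserves
  # timestamps, ownership, permissions, ACLs and XAttrs
  hadoop fs -cp -ptopax /.snapshot/sebc-hdfs-test/user/nithinraot/precious/jclouds-compute-1.7.1.jar /user/nithinraot/precious/

A snapshot records only metadata and block references, so creating one is quick regardless of how much data the directory holds.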
Nithinraot commented 6 years ago

Enable NameNode HA configuration ->

  1. Enabled HA from the HDFS Actions menu in Cloudera Manager.

  2. Gave the alternate host for the standby NameNode and the 3 JournalNodes (a command-line check of the result is sketched after this list).

  3. Added a new user (nithinraot) as "full admin" in CM under the Administration -> Users tab.

  4. Gave the password as cloudera for user nithinraot.

  5. Then updated the role back to "limited operator" and saved the changes. http://ec2-18-191-90-199.us-east-2.compute.amazonaws.com:7180/cmf/users
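Once HA is enabled, the active/standby split can be confirmed from the command line. A minimal sketch; the NameNode IDs namenode1/namenode2 are placeholders, the real ones come from the getconf queries:

  # discover the nameservice and its NameNode IDs
  hdfs getconf -confKey dfs.nameservices
  hdfs getconf -confKey dfs.ha.namenodes.<nameservice>
  # report which NameNode is active and which is standby
  hdfs haadmin -getServiceState namenode1
  hdfs haadmin -getServiceState namenode2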

Nithinraot commented 6 years ago

Could not test "Replicate data to another cluster" as there was no other cluster with a public IP available to replicate to.
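Had a second cluster been reachable, the replication could have been exercised either through a Cloudera Manager replication schedule or directly with DistCp. A hedged sketch; the remote NameNode address is a placeholder:

  # copy a directory tree into a remote cluster's HDFS
  hadoop distcp /user/nithinraot/precious hdfs://<remote-namenode>:8020/user/nithinraot/precious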