big-data-europe / docker-hadoop

Apache Hadoop docker image

The hdfs environment is configured, how do I test it? #111

Open tangzhiqiangh opened 3 years ago

tangzhiqiangh commented 3 years ago

How do I run commands for several tests I want to perform?

For example:

  1. TestDFSIO — used to test the IO performance of HDFS. It runs a MapReduce job that performs read and write operations concurrently: each map task reads or writes one file, the map output is used to collect statistics about the processed files, and those statistics are accumulated into a summary.

View instructions:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO

TestDFSIO.1.7
Usage: TestDFSIO [genericOptions]
         -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -clean
         [-compression codecClassName]
         [-nrFiles N]
         [-size Size[B|KB|MB|GB|TB]]
         [-resFile resultFileName]
         [-bufferSize Bytes]
```

Test HDFS write performance — write 10 files of 128 MB each to the HDFS cluster:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO \
  -write \
  -nrFiles 10 \
  -size 128MB \
  -resFile /tmp/TestDFSIO_results.log
```

Note: because the job runs on Hadoop as the hdfs user, the local log path does not have to be specified, but the command must be run from a directory the hdfs user can write to; the result log is then generated under that working directory. Otherwise the path has to be specified explicitly.

View Results:

```
cat /tmp/TestDFSIO_results.log

----- TestDFSIO ----- : write
           Date & time: Thu Jun 27 13:46:41 CST 2019
       Number of files: 10
Total MBytes processed: 1280.0
     Throughput mb/sec: 16.125374788984352
Average IO rate mb/sec: 17.224742889404297
 IO rate std deviation: 4.657439940376364
    Test exec time sec: 28.751
```
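The headline numbers can be pulled out of the result log with awk, and the total volume sanity-checked against nrFiles × size. A small sketch — it feeds the log text quoted above through a heredoc; on a real run, point awk at /tmp/TestDFSIO_results.log instead:

```shell
# Sketch: extract the throughput figures from a TestDFSIO result log and
# sanity-check the total data volume (10 files x 128 MB = 1280 MB).
# The heredoc holds the write-test output quoted above; on a real run,
# read /tmp/TestDFSIO_results.log instead.
awk -F': *' '/Throughput|Average IO rate/ {gsub(/^ +/, "", $1); print $1": "$2}' <<'EOF'
----- TestDFSIO ----- : write
           Date & time: Thu Jun 27 13:46:41 CST 2019
       Number of files: 10
Total MBytes processed: 1280.0
     Throughput mb/sec: 16.125374788984352
Average IO rate mb/sec: 17.224742889404297
 IO rate std deviation: 4.657439940376364
    Test exec time sec: 28.751
EOF
echo "expected total: $((10 * 128)) MB"
```

The "Total MBytes processed" line in the log should match that expected total; if it doesn't, some map tasks failed.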

Test HDFS read performance — read the 10 files of 128 MB each back from the HDFS cluster:

```
sudo -u hdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO \
  -read \
  -nrFiles 10 \
  -size 128MB \
  -resFile /tmp/TestDFSIO_results.log
```

Clear the test data:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO -clean

19/06/27 13:57:21 INFO fs.TestDFSIO: TestDFSIO.1.7
19/06/27 13:57:21 INFO fs.TestDFSIO: nrFiles = 1
19/06/27 13:57:21 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
19/06/27 13:57:21 INFO fs.TestDFSIO: bufferSize = 1000000
19/06/27 13:57:21 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
19/06/27 13:57:22 INFO fs.TestDFSIO: Cleaning up test files
```
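In this docker-hadoop setup the hadoop client lives inside the containers, so commands like the ones above have to go through docker exec. A minimal wrapper sketch — the container name `namenode`, the DRY_RUN switch, and the jar path are all assumptions, not something this repo provides:

```shell
#!/bin/sh
# Sketch: run a benchmark inside the Hadoop container of a docker-compose
# deployment. CONTAINER and the dry-run default are assumptions; with
# DRY_RUN=1 the command is only printed, not executed.
CONTAINER="${CONTAINER:-namenode}"
DRY_RUN="${DRY_RUN:-1}"

run_in_hadoop() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "docker exec -it $CONTAINER $*"
    else
        docker exec -it "$CONTAINER" "$@"
    fi
}

# The TestDFSIO write test from above; the jar path is a placeholder --
# locate the real one inside the image first, e.g. with
#   docker exec namenode find /opt -name '*jobclient*tests.jar'
run_in_hadoop hadoop jar /path/to/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO -write -nrFiles 10 -size 128MB
```

Running it with DRY_RUN=0 would execute the benchmark in the container instead of printing the command.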

  2. nnbench — used to test the load on the NameNode. It generates a large number of HDFS-related requests to put pressure on the NameNode, simulating operations such as creating, reading, renaming and deleting files on HDFS.

View instructions:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  nnbench -help

NameNode Benchmark 0.4
Usage: nnbench
Options: -operation
```

The test uses 10 mappers and 5 reducers to create 1000 files:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  nnbench \
  -operation create_write \
  -maps 10 \
  -reduces 5 \
  -blockSize 1 \
  -bytesToWrite 0 \
  -numberOfFiles 1000 \
  -replicationFactorPerFile 3 \
  -readFileAfterOpen true \
  -baseDir /benchmarks/NNBench-hostname
```
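To see how the NameNode behaves as load grows, the map count can be swept across several runs. A sketch that only prints the command lines it would launch — the sweep idea, the placeholder jar path, and the `echo` dry run are additions, not part of the original post:

```shell
# Sketch: sweep nnbench over increasing map counts to observe NameNode
# behaviour under growing load. Prints the commands instead of running them;
# drop the `echo` on a live cluster. JAR is a placeholder path.
JAR=/path/to/hadoop-mapreduce-client-jobclient-tests.jar
for maps in 10 20 40; do
    echo hadoop jar "$JAR" nnbench \
        -operation create_write \
        -maps "$maps" -reduces 5 \
        -blockSize 1 -bytesToWrite 0 \
        -numberOfFiles 1000 \
        -baseDir "/benchmarks/NNBench-$maps"
done
```

Using a distinct -baseDir per run keeps the HDFS output of the sweeps from colliding.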

The results are stored on HDFS.

  3. mrbench — repeatedly executes a small job to check whether small jobs run repeatably and efficiently on the cluster.

View instructions:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  mrbench -help

MRBenchmark.0.0.2
Usage: mrbench
  [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>]
  [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>]
  [-numRuns <number of times to run the job, default is 1>]
  [-maps <number of maps for each run, default is 2>]
  [-reduces <number of reduces for each run, default is 1>]
  [-inputLines <number of input lines to generate, default is 1>]
  [-inputType <type of input to generate, one of ascending (default), descending, random>]
  [-verbose]
```

Test: run a job 50 times:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  mrbench \
  -numRuns 50 \
  -maps 10 \
  -reduces 5 \
  -inputLines 10 \
  -inpu
```

How do I execute these commands?