big-data-europe / docker-hadoop

Apache Hadoop docker image

The hdfs environment is configured, how do I test it? #111

Open tangzhiqiangh opened 3 years ago

tangzhiqiangh commented 3 years ago

How do I run commands for several tests I want to perform?

For example:

  1. TestDFSIO — used to test the IO performance of HDFS. It runs a MapReduce job that performs read and write operations concurrently: each map task reads or writes one file, the map output is used to collect statistics about the processed files, and those statistics are accumulated into a summary.

View instructions:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO

TestDFSIO.1.7
Usage: TestDFSIO [genericOptions]
         -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -clean
         [-compression codecClassName]
         [-nrFiles N]
         [-size Size[B|KB|MB|GB|TB]]
         [-resFile resultFileName]
         [-bufferSize Bytes]
```

Test HDFS write performance — write 10 files of 128 MB each to the HDFS cluster:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO \
  -write \
  -nrFiles 10 \
  -size 128MB \
  -resFile /tmp/TestDFSIO_results.log
```

Note: because the job runs on Hadoop as the hdfs user, the local log path does not have to be specified, but the command must be run from a directory the hdfs user can write to; the result log is then generated under that working directory. Otherwise the path has to be specified explicitly.

View Results:

```
cat /tmp/TestDFSIO_results.log

----- TestDFSIO ----- : write
           Date & time: Thu Jun 27 13:46:41 CST 2019
       Number of files: 10
Total MBytes processed: 1280.0
     Throughput mb/sec: 16.125374788984352
Average IO rate mb/sec: 17.224742889404297
 IO rate std deviation: 4.657439940376364
    Test exec time sec: 28.751
```
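The headline numbers can be pulled out of the result log with awk, and the total volume sanity-checked against nrFiles × size. A small sketch — it feeds the log text quoted above through a heredoc; on a real run, point awk at /tmp/TestDFSIO_results.log instead:

```shell
# Sketch: extract the throughput figures from a TestDFSIO result log and
# sanity-check the total data volume (10 files x 128 MB = 1280 MB).
# The heredoc holds the write-test output quoted above; on a real run,
# read /tmp/TestDFSIO_results.log instead.
awk -F': *' '/Throughput|Average IO rate/ {gsub(/^ +/, "", $1); print $1": "$2}' <<'EOF'
----- TestDFSIO ----- : write
           Date & time: Thu Jun 27 13:46:41 CST 2019
       Number of files: 10
Total MBytes processed: 1280.0
     Throughput mb/sec: 16.125374788984352
Average IO rate mb/sec: 17.224742889404297
 IO rate std deviation: 4.657439940376364
    Test exec time sec: 28.751
EOF
echo "expected total: $((10 * 128)) MB"
```

The "Total MBytes processed" line in the log should match that expected total; if it doesn't, some map tasks failed.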

Test HDFS read performance — read the 10 files of 128 MB each back from the HDFS cluster:

```
sudo -u hdfs hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO \
  -read \
  -nrFiles 10 \
  -size 128MB \
  -resFile /tmp/TestDFSIO_results.log
```

Clear the test data:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  TestDFSIO -clean

19/06/27 13:57:21 INFO fs.TestDFSIO: TestDFSIO.1.7
19/06/27 13:57:21 INFO fs.TestDFSIO: nrFiles = 1
19/06/27 13:57:21 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
19/06/27 13:57:21 INFO fs.TestDFSIO: bufferSize = 1000000
19/06/27 13:57:21 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
19/06/27 13:57:22 INFO fs.TestDFSIO: Cleaning up test files
```
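In this docker-hadoop setup the hadoop client lives inside the containers, so commands like the ones above have to go through docker exec. A minimal wrapper sketch — the container name `namenode`, the DRY_RUN switch, and the jar path are all assumptions, not something this repo provides:

```shell
#!/bin/sh
# Sketch: run a benchmark inside the Hadoop container of a docker-compose
# deployment. CONTAINER and the dry-run default are assumptions; with
# DRY_RUN=1 the command is only printed, not executed.
CONTAINER="${CONTAINER:-namenode}"
DRY_RUN="${DRY_RUN:-1}"

run_in_hadoop() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "docker exec -it $CONTAINER $*"
    else
        docker exec -it "$CONTAINER" "$@"
    fi
}

# The TestDFSIO write test from above; the jar path is a placeholder --
# locate the real one inside the image first, e.g. with
#   docker exec namenode find /opt -name '*jobclient*tests.jar'
run_in_hadoop hadoop jar /path/to/hadoop-mapreduce-client-jobclient-tests.jar \
    TestDFSIO -write -nrFiles 10 -size 128MB
```

Running it with DRY_RUN=0 would execute the benchmark in the container instead of printing the command.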

  2. nnbench — used to test the load on the NameNode. It generates a large number of HDFS-related requests to put pressure on the NameNode, simulating operations such as creating, reading, renaming and deleting files on HDFS.

View instructions:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  nnbench -help

NameNode Benchmark 0.4
Usage: nnbench
Options: -operation
```

The test uses 10 mappers and 5 reducers to create 1000 files:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  nnbench \
  -operation create_write \
  -maps 10 \
  -reduces 5 \
  -blockSize 1 \
  -bytesToWrite 0 \
  -numberOfFiles 1000 \
  -replicationFactorPerFile 3 \
  -readFileAfterOpen true \
  -baseDir /benchmarks/NNBench-hostname
```
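To see how the NameNode behaves as load grows, the map count can be swept across several runs. A sketch that only prints the command lines it would launch — the sweep idea, the placeholder jar path, and the `echo` dry run are additions, not part of the original post:

```shell
# Sketch: sweep nnbench over increasing map counts to observe NameNode
# behaviour under growing load. Prints the commands instead of running them;
# drop the `echo` on a live cluster. JAR is a placeholder path.
JAR=/path/to/hadoop-mapreduce-client-jobclient-tests.jar
for maps in 10 20 40; do
    echo hadoop jar "$JAR" nnbench \
        -operation create_write \
        -maps "$maps" -reduces 5 \
        -blockSize 1 -bytesToWrite 0 \
        -numberOfFiles 1000 \
        -baseDir "/benchmarks/NNBench-$maps"
done
```

Using a distinct -baseDir per run keeps the HDFS output of the sweeps from colliding.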

The results are stored on HDFS.

  3. mrbench — repeatedly executes a small job to check whether small jobs run repeatably and efficiently on the cluster.

View instructions:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  mrbench -help

MRBenchmark.0.0.2
Usage: mrbench
  [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>]
  [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>]
  [-numRuns <number of times to run the job, default is 1>]
  [-maps <number of maps for each run, default is 2>]
  [-reduces <number of reduces for each run, default is 1>]
  [-inputLines <number of input lines to generate, default is 1>]
  [-inputType <type of input to generate, one of ascending (default), descending, random>]
  [-verbose]
```

Test: run a job 50 times:

```
hadoop jar \
  /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar \
  mrbench \
  -numRuns 50 \
  -maps 10 \
  -reduces 5 \
  -inputLines 10 \
  -inpu
```

How do I execute these commands?