Create a split scaling test

The changes in elasticity make it feasible for a small cluster to have a large number of tablets. A test that makes it easy to repeatedly run a split scaling test on a small Accumulo cluster and collect metrics would be useful. This test would not seek to generate large amounts of data. Instead, it would generate many splits from a small amount of data in a short amount of time. This will test Accumulo's ability to handle an increasing number of metadata operations. The test script could use continuous ingest to create the initial data. It could then run a configured number of rounds, reducing the split threshold in each round and waiting for all splits to happen. This test could be added to accumulo-testing.

| Test parameter | Description |
|---|---|
| tables | The number of test tables to create. |
| initial data | The initial amount of data to insert into each table. |
| initial splits | The initial number of splits to create for each table. |
| initial split threshold | The initial split threshold that each table will be configured with. |
| split threshold reduction factor | The factor by which the split threshold is reduced in each test round. For example, if this is set to 10, the split threshold is divided by 10 in each round, so an initial split threshold of 1G becomes 1G/10 = 100M in the first round. |
| test rounds | The number of test rounds to run. |
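A minimal sketch of how a test driver might hold these parameters and derive the split threshold for each round (the initial threshold divided by the reduction factor once per round). `SplitScalingConfig`, `parse_size` and `round_thresholds` are hypothetical names, not part of accumulo-testing. Sizes use decimal units (1G = 10^9 bytes); this is an assumption made so the rounds come out to exactly 100M, 10M and 1M, and `_UNITS` can be switched to powers of 1024 if Accumulo's binary interpretation of size suffixes should be mirrored.

```python
from dataclasses import dataclass
from typing import List

# Assumption: decimal size suffixes (1G = 10**9 bytes) so the rounds divide evenly.
_UNITS = {"K": 10**3, "M": 10**6, "G": 10**9}


def parse_size(text: str) -> int:
    """Parse a size such as '1G' or '100M' into bytes (hypothetical helper)."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(float(text[:-1]) * _UNITS[text[-1]])
    return int(text)


@dataclass(frozen=True)
class SplitScalingConfig:
    tables: int                   # number of test tables to create
    initial_entries: int          # continuous ingest entries per table
    initial_splits: int           # splits to create up front for each table
    initial_split_threshold: int  # bytes
    reduction_factor: int         # divide the threshold by this each round
    rounds: int                   # number of test rounds

    def __post_init__(self) -> None:
        # Sanity checks added for the sketch; not stated in the issue.
        if self.reduction_factor < 2:
            raise ValueError("reduction factor must be at least 2")
        if self.rounds < 1:
            raise ValueError("need at least one round")
        if self.initial_split_threshold // self.reduction_factor**self.rounds < 1:
            raise ValueError("threshold would drop below one byte")

    def round_thresholds(self) -> List[int]:
        """Split threshold in bytes for rounds 1..rounds: T0 // factor**r."""
        return [self.initial_split_threshold // self.reduction_factor**r
                for r in range(1, self.rounds + 1)]
```

With the example configuration from this issue (1G, factor 10, 3 rounds), `round_thresholds()` yields 100M, 10M and 1M expressed in bytes.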

For example, with the following test configuration:

| Test parameter | Value |
|---|---|
| tables | 10 |
| initial data | 10M continuous ingest entries |
| initial splits | 0 |
| initial split threshold | 1G |
| split threshold reduction factor | 10 |
| test rounds | 3 |

The test script would do the following:

1. Create 10 tables with 0 splits.
2. Ingest 10M entries into each table.
3. Wait until the tablets in all tables are below the split threshold.
4. Start round 1 by reducing the split threshold on each table from 1G to 100M.
5. Wait until the tablets in all tables are below the split threshold, then print the time elapsed since step 3 and possibly other information, such as the number of tablets.
6. Start round 2 by reducing the split threshold on each table from 100M to 10M.
7. Wait until the tablets in all tables are below the split threshold, then print the time elapsed since step 5.
8. Start round 3 by reducing the split threshold on each table from 10M to 1M.
9. Wait until the tablets in all tables are below the split threshold, then print the time elapsed since step 7.
10. Finish the test and verify the data in each table to ensure it is correct.
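The steps above could be driven by a loop like the following minimal sketch. It assumes a hypothetical `Cluster` interface (`create_table`, `ingest`, `set_split_threshold`, `max_tablet_size`, `tablet_count`, `verify`) and hypothetical table names; none of these come from Accumulo or accumulo-testing. A real implementation would presumably wrap the Accumulo client (for example, setting the `table.split.threshold` table property) and the continuous ingest tools. `FakeCluster` is an in-memory stand-in that splits instantly, included only so the loop can be tried without a cluster; its timings are meaningless.

```python
import time
from typing import Dict, List, Optional, Protocol, Sequence, Tuple


class Cluster(Protocol):
    """Hypothetical operations the test needs; not an Accumulo API."""
    def create_table(self, name: str, splits: int) -> None: ...
    def ingest(self, name: str, entries: int) -> None: ...
    def set_split_threshold(self, name: str, threshold: int) -> None: ...
    def max_tablet_size(self, name: str) -> int: ...
    def tablet_count(self, name: str) -> int: ...
    def verify(self, name: str) -> bool: ...


def wait_below_threshold(cluster: Cluster, tables: Sequence[str], threshold: int,
                         poll_secs: float = 1.0, timeout_secs: float = 3600.0) -> None:
    """Poll until no tablet in any table exceeds the threshold, or time out."""
    deadline = time.monotonic() + timeout_secs
    while any(cluster.max_tablet_size(t) > threshold for t in tables):
        if time.monotonic() > deadline:
            raise TimeoutError(f"tablets still larger than {threshold} bytes")
        time.sleep(poll_secs)


def run_split_scaling_test(cluster: Cluster, tables: int, entries: int,
                           initial_splits: int, initial_threshold: int,
                           factor: int, rounds: int) -> List[Tuple[int, float, int]]:
    """Run the listed steps; return (threshold, seconds, tablets) per round."""
    names = [f"split_scale_{i}" for i in range(tables)]  # hypothetical names
    for name in names:                       # step 1: create tables
        cluster.create_table(name, initial_splits)
        cluster.set_split_threshold(name, initial_threshold)
    for name in names:                       # step 2: initial ingest
        cluster.ingest(name, entries)
    wait_below_threshold(cluster, names, initial_threshold)  # step 3

    results = []
    threshold = initial_threshold
    for round_no in range(1, rounds + 1):    # steps 4-9 when rounds == 3
        threshold //= factor
        start = time.monotonic()             # timing starts right after the previous wait
        for name in names:
            cluster.set_split_threshold(name, threshold)
        wait_below_threshold(cluster, names, threshold)
        elapsed = time.monotonic() - start
        tablets = sum(cluster.tablet_count(n) for n in names)
        print(f"round {round_no}: threshold={threshold} "
              f"elapsed={elapsed:.2f}s tablets={tablets}")
        results.append((threshold, elapsed, tablets))

    for name in names:                       # step 10: verify the data
        if not cluster.verify(name):
            raise AssertionError(f"verification failed for {name}")
    return results


class FakeCluster:
    """In-memory stand-in that splits instantly; for trying the loop only."""

    def __init__(self, bytes_per_entry: int = 100) -> None:
        self.bytes_per_entry = bytes_per_entry  # assumed average entry size
        self.size: Dict[str, int] = {}
        self.tablets: Dict[str, int] = {}
        self.threshold: Dict[str, Optional[int]] = {}

    def _split(self, name: str) -> None:
        limit = self.threshold[name]
        if limit is not None:  # ceiling division: tablets needed to fit the limit
            needed = -(-self.size[name] // limit)
            self.tablets[name] = max(self.tablets[name], needed)

    def create_table(self, name: str, splits: int) -> None:
        self.size[name], self.tablets[name], self.threshold[name] = 0, splits + 1, None

    def ingest(self, name: str, entries: int) -> None:
        self.size[name] += entries * self.bytes_per_entry
        self._split(name)

    def set_split_threshold(self, name: str, threshold: int) -> None:
        self.threshold[name] = threshold
        self._split(name)

    def max_tablet_size(self, name: str) -> int:
        return -(-self.size[name] // self.tablets[name])  # assumes evenly sized tablets

    def tablet_count(self, name: str) -> int:
        return self.tablets[name]

    def verify(self, name: str) -> bool:
        return True
```

Starting each round's timer when its threshold is lowered corresponds to "time elapsed since step 3/5/7", because each of those waits ends immediately before the next round begins. The timeout in `wait_below_threshold` keeps a round that never converges from hanging the test indefinitely.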

Once we have this script, we can run it to stress the code in elasticity and try to understand how well it is performing and where the bottlenecks are. We can also examine the metrics and logs from test runs and look for improvements there.

apache/accumulo-testing #266