The changes in elasticity make it feasible for a small cluster to have a large number of tablets. A test that makes it easy to repeatedly run a split scaling test on a small Accumulo cluster and grab metrics would be useful. This test would not seek to generate large amounts of data; instead it would generate lots of splits from a small amount of data in a short amount of time. This will exercise Accumulo's ability to handle an increasing volume of metadata operations. The test script could use continuous ingest to create the initial data. It could then run a configured number of rounds, reducing the split threshold for each round and waiting for all splits to happen. This test could be added to accumulo-testing.
| Test parameter | Description |
|---|---|
| tables | The number of test tables to create |
| initial data | The initial amount of data to insert into each table |
| initial splits | The initial number of splits to create for each table |
| initial split threshold | The split threshold that each table is initially configured with |
| split threshold reduction factor | The factor by which the split threshold is reduced for each round of the test. For example, if this is set to 10, it reduces the split threshold by a factor of 10 for each test round: if the initial split threshold was 1G, it would be set to 1G/10=100M for the first test round. |
| test rounds | The number of test rounds to run |
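The per-round threshold schedule implied by the reduction factor could be computed along these lines. This is a minimal sketch; the decimal size suffixes here are a simplification for readability (Accumulo itself parses K/M/G as binary powers, e.g. 1G = 2^30 bytes).

```python
# Decimal multipliers for K/M/G, used here only to keep the example readable.
UNITS = {"K": 10**3, "M": 10**6, "G": 10**9}

def parse_size(size: str) -> int:
    """Parse a size string like '1G' or '100M' into a byte count."""
    if size[-1] in UNITS:
        return int(size[:-1]) * UNITS[size[-1]]
    return int(size)

def format_size(n: int) -> str:
    """Format a byte count back into the shortest exact suffixed form."""
    for suffix, mult in (("G", 10**9), ("M", 10**6), ("K", 10**3)):
        if n >= mult and n % mult == 0:
            return f"{n // mult}{suffix}"
    return str(n)

def round_thresholds(initial: str, factor: int, rounds: int) -> list[str]:
    """Return the split threshold to configure for each test round."""
    n = parse_size(initial)
    thresholds = []
    for _ in range(rounds):
        n //= factor
        thresholds.append(format_size(n))
    return thresholds

print(round_thresholds("1G", 10, 3))  # ['100M', '10M', '1M']
```

With the example config below (initial threshold 1G, factor 10, 3 rounds) this yields the 100M, 10M, 1M schedule used in the walkthrough.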
For example, with the following test config:
| Test parameter | Value |
|---|---|
| tables | 10 |
| initial data | 10M continuous ingest entries |
| initial splits | 0 |
| initial split threshold | 1G |
| split threshold reduction factor | 10 |
| test rounds | 3 |
The test script would do the following:

1. create 10 tables with 0 splits
2. ingest 10M entries into each table
3. wait until all tablets in all tables are below the split threshold
4. start round 1 by reducing the split threshold from 1G to 100M on each table
5. wait until all tablets in all tables are below the split threshold and print the time elapsed since step 3 (maybe also print other info, like the number of tablets)
6. start round 2 by reducing the split threshold from 100M to 10M on each table
7. wait until all tablets in all tables are below the split threshold and print the time elapsed since step 5
8. start round 3 by reducing the split threshold from 10M to 1M on each table
9. wait until all tablets in all tables are below the split threshold and print the time elapsed since step 7
10. finish the test and verify the data in each table to ensure it is correct
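The per-round portion of the steps above could be driven by generating the Accumulo shell commands for each round. This is a hypothetical sketch: the table names and helper functions are illustrative, though `config -t <table> -s <prop>=<value>` is the real Accumulo shell syntax and `table.split.threshold` is the real per-table property.

```python
def set_split_threshold_cmd(table: str, threshold: str) -> str:
    # Accumulo shell command to set a per-table property.
    return f"config -t {table} -s table.split.threshold={threshold}"

def plan_rounds(tables: list[str], thresholds: list[str]):
    """Yield (round number, shell commands) for each test round.

    Between rounds the driver would run the commands (e.g. via
    `accumulo shell -e`) and poll until all tablets in all tables are
    below the new threshold, recording the elapsed time.
    """
    for rnd, threshold in enumerate(thresholds, start=1):
        yield rnd, [set_split_threshold_cmd(t, threshold) for t in tables]

# Hypothetical table names for the 10-table example config.
tables = [f"splittest_{i}" for i in range(10)]
for rnd, cmds in plan_rounds(tables, ["100M", "10M", "1M"]):
    print(f"round {rnd}: {cmds[0]} ... ({len(cmds)} tables)")
```

Keeping the round plan separate from the execution and waiting logic makes it easy to print the schedule up front and to log the elapsed time per round alongside tablet counts.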
Once we have this script we can run it to stress the code in elasticity and try to understand how well it is doing and what the bottlenecks are. We can also examine metrics and logs from running the test and look for improvements in those.