awslabs / dynein

DynamoDB CLI written in Rust.
https://github.com/awslabs/dynein
Apache License 2.0

Create a load testing command #135

Open StoneDot opened 1 year ago

StoneDot commented 1 year ago

Load testing command

Background

A load testing command is useful for understanding DynamoDB behaviors such as throttling, auto scaling, and metrics. It also helps users investigate how an application behaves when throttling occurs.

Proposed design

The implementation decisions are as follows:

Interface

In the initial implementation, load testing is provided by the dy bench run (or dy benchmark run) command with the following options:

Common options such as --table and --region are supported, as in other commands.

We use the bench run subcommand for the initial implementation. Note that this leaves room for future enhancements; for example, dy bench run -s <scenario-file> could run scenario-based tests, and dy bench report <report-file> could show the results of a test.

The workflow

The load testing workflow is outlined as follows:

  1. Based on the --item-variations argument, create a list of primary keys to use in the test. If --skip-item-creation is provided, the Scan API is invoked instead to list existing primary keys. Parallel scans must be used because a sequential scan creates a hot partition.
  2. Based on the --wcu argument, PutItem is invoked with the primary keys from the first step for the duration given by --duration-write. Each item created has an additional string attribute of --size bytes.
  3. Based on the --rcu argument, GetItem is invoked with the primary keys from the first step for the duration given by --duration-read.
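The capacity arithmetic behind steps 2 and 3 can be sketched as follows. This is a minimal, hypothetical Rust sketch, not dynein's implementation, and it issues no API calls. It assumes the provisioned-capacity rules (one WCU per write of up to 1 KB, one RCU per strongly consistent read of up to 4 KB, half that for an eventually consistent read), treats --size as the whole item size even though a real item also counts its key and attribute names, and uses illustrative function names and a made-up key-N key format.

```rust
use std::time::Duration;

/// WCUs one PutItem consumes: one per 1 KB (1024 bytes), rounded up.
fn wcu_per_put(item_size_bytes: u64) -> u64 {
    ((item_size_bytes + 1023) / 1024).max(1)
}

/// RCUs one GetItem consumes: one per 4 KB (rounded up) for a strongly
/// consistent read, half of that for an eventually consistent read.
fn rcu_per_get(item_size_bytes: u64, strongly_consistent: bool) -> f64 {
    let units = ((item_size_bytes + 4095) / 4096).max(1) as f64;
    if strongly_consistent { units } else { units / 2.0 }
}

/// Requests per second needed to consume `target_units` capacity per second.
fn requests_per_second(target_units: f64, units_per_request: f64) -> f64 {
    target_units / units_per_request
}

/// Fixed delay between requests for a single worker issuing `rps` requests per second.
fn pacing_interval(rps: f64) -> Duration {
    Duration::from_secs_f64(1.0 / rps)
}

/// Total number of requests to issue at `rps` over `duration` (rounded down).
fn planned_requests(rps: f64, duration: Duration) -> u64 {
    (rps * duration.as_secs_f64()).floor() as u64
}

/// Hypothetical primary keys for --item-variations: key-0, key-1, ...
fn make_keys(item_variations: usize) -> Vec<String> {
    (0..item_variations).map(|i| format!("key-{i}")).collect()
}

/// Key used by request number `n`, cycling through the list round-robin.
fn key_for(keys: &[String], n: u64) -> &str {
    &keys[(n as usize) % keys.len()]
}

fn main() {
    // Hypothetical values standing in for --size, --wcu, --rcu,
    // --item-variations, and --duration-write.
    let item_size = 3000;
    let puts_per_sec = requests_per_second(100.0, wcu_per_put(item_size) as f64);
    let gets_per_sec = requests_per_second(100.0, rcu_per_get(item_size, true));
    let keys = make_keys(10);
    let total_puts = planned_requests(puts_per_sec, Duration::from_secs(60));

    println!(
        "PutItem: {puts_per_sec:.2}/s, interval {:?}, {total_puts} requests in 60 s",
        pacing_interval(puts_per_sec)
    );
    println!("GetItem: {gets_per_sec:.2}/s, interval {:?}", pacing_interval(gets_per_sec));
    println!("request 12 uses {}", key_for(&keys, 12));
}
```

Pacing at a fixed interval keeps the consumed capacity close to the target on average; a real tool would likely spread the requests across several concurrent workers, since a single client may not sustain high request rates.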
ryota-sakamoto commented 1 year ago

Thank you for creating a proposal for a great feature.

I think other commands generally follow the format dy <verb> or dy <command> <verb>. What other subcommands do you have in mind besides simple?

StoneDot commented 1 year ago


I have some ideas for scenario-based benchmarking, which I expect would be invoked with a dy benchmark scenario command. This follows the same style as dy admin create table. I understand that this style is a little awkward as an English phrase, but I feel that dy benchmark table simply is a little verbose. I am open to good suggestions for the command name.

StoneDot commented 1 year ago

I would also mention the YCSB command style as an option. If we adopted that style in dynein, it would be dy benchmark load to load the data and dy benchmark run to run the workload. The advantage is compatibility with YCSB; the disadvantage is that loading and testing must be run separately. However, I prefer dy benchmark simple.

ryota-sakamoto commented 1 year ago

I think we need to provide a command that shows the results of load testing. I'm not sure yet how scenario-based tests should be run, but I have two ideas for providing both simple and scenario-based tests.

all in one

The idea is that both simple and scenario-based tests run through a single command, and a scenario-based test is selected by specifying a scenario file. I imagine commands like the following; this is just a rough interface.

# simple test
$ dy load run --rcu 100 --wcu 5

# scenario base test
$ dy load run -s <scenario-file>

# show result of load test
$ dy load report <report-file>

split command

The idea is to provide two commands, load and benchmark, so that the role of each command is clear.

# simple test
$ dy load run --rcu 100 --wcu 5
$ dy load report <report-file>
# scenario base test
$ dy benchmark run <scenario-file>
$ dy benchmark report <report-file>
StoneDot commented 1 year ago

I personally find the -s option a clear and effective way to specify scenario-based testing, and it makes sense to split the run and report commands. Thank you for your suggestion. However, I'm a bit concerned that load as a command name might confuse users because it has multiple meanings: users might mix up loading data into a table with putting load on DynamoDB for stress testing.

In my opinion, using the term benchmark (maybe even a shorter version like bench) would be clearer than load. What do you think?

Additionally, I would like to propose the following commands:

# Perform a simple test
$ dy bench run --rcu 100 --wcu 5

# Conduct a scenario-based test (not implemented in the initial phase)
$ dy bench run -s <scenario-file>

# Generate a report for the load test (not implemented in the initial phase)
$ dy bench report <report-file>
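To illustrate how the proposed interface keeps the simple test and the scenario-based test under one run subcommand, here is a minimal std-only Rust sketch of the argument handling. It is hypothetical: dynein's actual CLI parsing is not shown in this issue, BenchCommand and parse_bench are illustrative names, and -s and report appear only to show where they would fit, even though they are not planned for the initial phase.

```rust
/// Hypothetical model of the proposed dy bench subcommands.
#[derive(Debug, PartialEq)]
enum BenchCommand {
    /// dy bench run: a simple test (--rcu/--wcu) or a scenario-based test (-s).
    Run { rcu: Option<u32>, wcu: Option<u32>, scenario: Option<String> },
    /// dy bench report <report-file>
    Report { report_file: String },
}

/// Parses the arguments that follow "dy bench".
fn parse_bench(args: &[&str]) -> Result<BenchCommand, String> {
    match args.split_first() {
        Some((&"run", rest)) => {
            let (mut rcu, mut wcu, mut scenario) = (None, None, None);
            let mut it = rest.iter();
            // Every option in this sketch takes exactly one value.
            while let Some(flag) = it.next() {
                let value = it.next().ok_or_else(|| format!("missing value for {flag}"))?;
                match *flag {
                    "--rcu" => rcu = Some(value.parse::<u32>().map_err(|e| format!("--rcu: {e}"))?),
                    "--wcu" => wcu = Some(value.parse::<u32>().map_err(|e| format!("--wcu: {e}"))?),
                    "-s" => scenario = Some(value.to_string()),
                    other => return Err(format!("unknown option {other}")),
                }
            }
            Ok(BenchCommand::Run { rcu, wcu, scenario })
        }
        // report takes exactly one positional argument.
        Some((&"report", [file])) => Ok(BenchCommand::Report { report_file: file.to_string() }),
        _ => Err("usage: dy bench run [--rcu N] [--wcu N] [-s <scenario-file>] | dy bench report <report-file>".into()),
    }
}

fn main() {
    // Skip the program name and parse the remaining arguments.
    let args: Vec<String> = std::env::args().skip(1).collect();
    let refs: Vec<&str> = args.iter().map(String::as_str).collect();
    match parse_bench(&refs) {
        Ok(cmd) => println!("{cmd:?}"),
        Err(msg) => eprintln!("{msg}"),
    }
}
```

Because -s is just another option of run, adding scenario support later would not change the shape of the command, which is the benefit of the all-in-one design.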

Please let me know what you think about these suggestions and proposed commands.

ryota-sakamoto commented 1 year ago

I agree with you. Using benchmark or bench instead of load is clear and easy to understand.

StoneDot commented 11 months ago

Based on an internal discussion with a Solutions Architect, the following features are preferable.

He wants functionality similar to what the following project provides: https://github.com/aws-samples/dynamodb-consumed-capacity-check-tool