Add a workload that measures scan-after-write consistency

ghost commented 9 years ago

Many NoSQL DBs has an eventual consistency mode for scan-after-writes, some support strong consistency for this scenario and some even support both. Consistency model clearly has an impact on performance as well.

The goal of this workload is to measure how consistent a NoSQL DB is when doing scan-after-write, expressed in terms of percentage of scans that are able to see all records that were just written.

Key application scenarios are Big Data processing scenario where producer nodes uploads a large amount of objects representing intermediate results and consumer nodes use scan to discover these objects before retrieving them. Another important scenario is log file aggregation and distribution - similar to big data scenario where consumers of these log files typically does a scan to find out new files to be collected.

@cmccoy - FYI. I'd be happy to implement this if the proposal looks good to you. @voellm - fyi.

busbey commented 9 years ago

What did you have in mind for coordinating the shared knowledge between writers and readers?

ghost commented 9 years ago

If the writers and readers are running on the same host (process), then they can simply share knowledge via memory.

If the writers and readers are running on different hosts -

Each writer can generate a manifest file of names of objects that have been successfully written to the NoSQL Store, the manifest file itself is to be named deterministically with a predefined prefix followed by the writer's ID (0 - N), and then the writers can either:

1) save the manifest file to the NoSQL store, or 2) save it locally.

Then the reader who tests scan consistency can read those manifest files first to construct the truth. Since the names of the manifest files are deterministically defined, all the reader needs to know to construct the names of the manifest file is the prefix + the number of writers. These can be passed-in to the reader as command line parameters. Then the reader can read the manifest objects from NoSQL Store or read it locally.

ghost commented 9 years ago

@cmccoy and me had a couple of discussions about this and we are in-sync with the proposed workload idea here.

We also discussed some implementation details, e.g., a potential way to simplify reader-writer communication by always having the scanner running on the same node as the writer (so they can communicate via shared memory). The drawback of this simplification is the scanner can only validate the objects written by one particular writer which is quite a bit of deviation from realistic data processing workflow where the scanner node would typically need to list objects written by all writers and then fans out the object names to the next stage of processing.

Our current thought is such simplification would only be necessary if the manifest file based idea turns out to be too complex, but we are not seeing that at the moment.

Anyway, these are just implementation details.

@busbey - do you have any further input on the design and the proposed workload idea here? I'd like to make sure we converge on the design review first before starting the implementation. Thanks!

cmccoy commented 9 years ago

A few things:

Could you specify how users will interact with this workload?
Do you need to track the entire key set written by each node, or would a range suffice?
I wonder if this will be difficult to implement in YCSB - you're simulating a Map/Reduce scenario - maybe just run Hadoop or Spark?

ghost commented 9 years ago

A typical way is: Users will specify number of writers, keys of each writer, and the launch the workload. The writers will write the keys to the DB in parallel and then a scanner will scan and validate all the required keys.
I will track the names of keys that are successfully written (so failures are not included in expectation).
I'd like to make this a generic workload to test list-after-write consistency, I don't see any reason to limit this specifically to just Hadoop or Spark, or to run those applications to test list consistency of a NoSQL DB - that (benchmarking noSQL DB) is what this tool is designed for.

ghost commented 9 years ago

Many thanks to the great inputs and suggestions from @cmccoy -

It turns out that the writers can generate key names deterministically using existing YCSB mechanism, therefore the scanner is capable of generating the expected set of key names deterministically using the same logic as the writers, however the scanner still needs to know the set of failed writes which can be communicated using manifest files as mentioned above.

In the next update, I'm going to post a verbose write-up of the design. Folks, please feel free to voice your comments here! Thanks!

ghost commented 9 years ago

Workflow: We will use the existing YCSB workflow template: Load + Run.

The load step will launch the writers and write keys to the DB concurrently, and the scan step will do the validation that keys from all successful writes during the load step are present in the scan results.

We expect the user to control the wait time (if any) between the scan step and the load step based on their expectation of the consistency level of the underlying DB. We will provide an example to show how to wrap these two steps in a simple bash script.

Keys and communication between writers and the scanner.

a) At the load step, each writer writes their keys simply following existing mechanism in YCSB:

Each writer has its own range of keys.
The keys will be deterministically hashed so they appear random to the DB under test.
There will be an overall prefix for a given test run, all keys will be written under this prefix. (currently YCSB used a hard-coded prefix for all test runs, it's "user", we can improve this behavior by allowing the user to pass a per-run prefix via a command line parameter)

This has one key advantage: Each key is generated deterministically so the scanner can use the same logic to deduce the set of keys it needs to expect for each writer - modulo any failed writes, which we will handle via a "negative manifest" as described below.

b) When a writer finishes its writes:

It records those failed writes to a manifest file (there is no need to do tenacious retry of the writes for this workload here) and writes the file to the NoSQL DB under test.
The information recorded to the file can simply be those failed keys.
This is called "negative manifest file".
The manifest files are deterministically named by the prefix of the test run + the ID of the writer, so the scanner does NOT need to rely on scan to uncover these manifest files.

c) At the run step, the user passes the number of writers, prefix of the test run, and number of keys written to the scanner, which then:

Generate the expected set of keys by running the same code writers used to generate those keys.
Read the negative manifest files from the NoSQL DB which contains list of failed keys (if any)
Remove any failed keys from the expected set of keys generated in the first step, this forms the final expectation.
Performs scan to the NoSQL DB asking for all keys under the prefix of this test run.
Compare the results of the scan and the final expectation.

This is pretty much it. It's actually a quite simple test process, but I wrote it verbosely here so we have the necessary details documented. ;)

busbey commented 8 years ago

Any update here?

ghost commented 8 years ago

It's still on my plate. I plan to work on this in March.

Sorry for the lack of update here, and thanks for the ping!

On Sunday, January 31, 2016, Sean Busbey notifications@github.com wrote:

Any update here?

— Reply to this email directly or view it on GitHub https://github.com/brianfrankcooper/YCSB/issues/416#issuecomment-177706275 .

brianfrankcooper / YCSB

Add a workload that measures scan-after-write consistency #416