explorer14 closed this issue 5 years ago
Thanks for the question Aman.
This client doesn't attempt to solve a particular use case, but there are certainly cases where you want to store either "state" or "cached data" out of process for a finite amount of time without committing to persistent stores.
Redis serves this purpose well, since it's out of process and supports cleaning up its own stale data. One of the things I have implemented here is a sliding expiration, so that continually reading some data will keep that data alive longer. Older pieces of data will eventually drop off the system with no cleanup necessary, as would be required for a persistent store.
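The sliding-expiration idea can be sketched language-agnostically (Python here purely for illustration; the real client delegates this to Redis TTLs rather than an in-memory dict, and the class/parameter names below are made up):

```python
import time

class SlidingCache:
    """Toy sketch of sliding expiration: every read pushes the expiry
    window forward, so frequently-read items stay alive while untouched
    items lapse and drop off with no external cleanup."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable clock, handy for testing
        self._store = {}            # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        now = self.clock()
        if now >= expires_at:
            del self._store[key]    # stale: evict on access
            return None
        # The sliding part: a successful read refreshes the expiry window.
        self._store[key] = (value, now + self.ttl)
        return value
```

With a 10-second window, an item read every 8 seconds lives indefinitely, while one left untouched for 10 seconds is gone on the next read.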
In terms of performance and what is meant by "large" - this is more about throughput than system limits based on available memory - i.e. it doesn't affect the upper limits of your Redis instance, but the throughput of dealing with values concurrently.
Actually, when I performance tested this under normal "test conditions" (e.g. just passing various sizes of values to Redis), the "normal" version of the client performed better (since avoiding chunking/locking etc. is obviously less overhead). However, under more realistic "stress test" workloads the "chunked" version started to perform better - presumably this boils down to the way the command pipeline works within the connection multiplexer, which prefers many small/fast commands over chunky ones that can cause it to block or time out.
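The split/reassemble step at the heart of this is simple to sketch (Python purely for illustration - the `chunk_size` parameter and function names are made up, not the library's actual settings):

```python
def chunk(value: bytes, chunk_size: int) -> list:
    """Split a large payload into fixed-size pieces so each Redis write
    is small and fast, keeping one chunky value from monopolising the
    multiplexer's command pipeline."""
    return [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]

def reassemble(chunks: list) -> bytes:
    """Reading is the reverse: fetch each chunk and concatenate in order."""
    return b"".join(chunks)
```

Each chunk would be stored under its own derived key, so a read becomes several small GETs interleaved with other traffic instead of one large blocking transfer.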
Given 100 parallel workloads of creating, getting, updating and then deleting some data from Redis (with an average payload size of around 60KB), here were the results:
[Using local VM for Redis with 2 vCPU, 2GB RAM]
NORMAL:

|     Mean |    Error |   StdDev |      Min |        Max |   Median |
|---------:|---------:|---------:|---------:|-----------:|---------:|
| 761.2 ms | 55.04 ms | 162.3 ms | 284.2 ms | 1,064.4 ms | 790.9 ms |

CHUNKED:

|     Mean |    Error |   StdDev |      Min |      Max |   Median |
|---------:|---------:|---------:|---------:|---------:|---------:|
| 305.5 ms | 6.091 ms | 14.71 ms | 274.7 ms | 352.6 ms | 304.7 ms |
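For context, the two reported means work out to roughly a 2.5x improvement for the chunked client under this stress workload:

```python
# Ratio of the two mean latencies reported above (stress-test workload).
normal_mean_ms = 761.2
chunked_mean_ms = 305.5
speedup = normal_mean_ms / chunked_mean_ms
print(round(speedup, 1))  # → 2.5
```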
Interesting! Cheers for sharing the insight. I've got some more stuff for you then:
Suggestions:
A quick "how to" in the README.md will help people bootstrap the library more easily in their own code.
Consider adding tests because they are a great way to execute the library code quickly and see how it works without hooking up a real application to it.
Constructing an instance of RedisClient is rather complicated for a developer who is working with the library for the first time. Perhaps consider adding convenience extensions, fluent builders or factories to construct an instance more easily. By the way, I can also make PRs for this as I find some time.
Put any performance benchmarks along with the code in the repo, ideally as tests.
Questions:
How does KeyGen work? Whose responsibility is it to generate keys?
Have you considered targeting .NET Standard 2.0 so it can work with .NET Framework just as well as with .NET Core?
Have you tested this with a multi-instance Redis cluster? If so, did you come across any problems with regards to locking and chunking? Distributed locking is a complex problem, and it's even more complex to build a good lock service. What are your thoughts on this?
What's the inflection point at which you've found chunking to be a significantly better approach than a non-chunked one? In other words, do you think the difference is noticeable enough for the users of such an application?
Sorry, just curious about this stuff, because generally the default libraries do a pretty good job for more or less 95% of cases, which leaves a very niche chunk (pun intended! ;) ) that can benefit from serious performance extensions, e.g. ingestion of very large volumes of big data such as IoT or real-time streaming.
I'll hopefully get around to writing a quick start guide at some point, I was hoping the existing ReadMe was good enough as is.
To simplify it a little further (and also make it easier for Microsoft DI consumers) I've added an extension method for IServiceCollection to automatically wire up the components (whether you're using the chunking version or not).
While I was there I changed the targeting to .NET Standard :)
I should add tests, I suppose - I just never got around to that either, but feel free if you have time. (I used Benchmark.Net to generate the stats, but in a throwaway fashion.)
With regards to KeyGen
For scaled environments this is going to need something like RedLock rather than the in-memory lock which is here by default. Hopefully someone with more time than me can also pick that up!
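To illustrate why an in-memory lock doesn't scale out, here is a minimal sketch of an in-process, per-key lock (Python purely for illustration; this is not the library's actual implementation). Locks held in one process's memory are invisible to every other instance, which is exactly the gap a distributed scheme like RedLock fills:

```python
import threading

class InProcessKeyLocks:
    """Per-key locks held in process memory. Two threads in the same
    process contend correctly on the same key, but a second process (or
    a second app instance behind a load balancer) gets its own, entirely
    independent lock table - so mutual exclusion is not cluster-wide."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the lock table itself
        self._locks = {}                 # key -> threading.Lock

    def lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())
```

Within one process, `lock_for("some:key")` always returns the same lock object, so chunked writes to a key can be serialised; across processes, that guarantee simply doesn't exist.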
A good readme is always welcomed by people looking to play with your SDKs; I will see if I can contribute to it. I like the IServiceCollection extension, very convenient if you are using MS DI. 👍 for migrating it to .NET Std. Distributed locking is an interesting yet complex problem and is better left until there is a need for it (at least for me anyway - something tells me it will take time to get it right and reliable).
As for the tests, I have opened a PR for you to have a look at: https://github.com/craigwardman/ChunkingRedisClient/pull/4. Let me know what you think of it.
hey Craig,
Good effort as usual!
Could you share the performance stats from before and after this change was implemented? In other words, can you quantify what "performing badly" really means? Redis' memory consumption is only really limited by the available system memory, which will of course depend on whether the host is 32-bit or 64-bit. Perhaps it's a design smell that you are having to store very large JSON blobs in a key-value store like Redis as opposed to a dedicated persistent document store like DynamoDB or Cosmos DB. But then again, a little more context with some stats beyond what's on your blog would help me understand this a bit better.
You could also look at ReJSON, a new module for Redis that supports JSON natively (internally it uses a binary format to store the JSON data more efficiently): https://oss.redislabs.com/redisjson/.
Cheers Aman