dotnet / dotNext

Next generation API for .NET
https://dotnet.github.io/dotNext/
MIT License

Optimization #78

mbasij1 closed this issue 2 years ago

mbasij1 commented 3 years ago

For best performance, we use TcpConfiguration and a simple key/value store backed by a plain dictionary, and we do not use a configuration snapshot in PersistentState. We found that the most efficient way to create log entries is to use a fixed-size struct, as in the example. Even so, I get 136 ms average latency, which is far too high for us. A SQL Server instance gives me 40 ms average latency for the same job, and we need 1-3 ms average latency. Can we reach 3 ms with this library, or should we use a different one?

sakno commented 3 years ago

The library offers many tools for optimization, but applying them is the responsibility of the application.

  1. Measure your network latency first. The library can't work faster than the underlying network.
  2. Client-server interaction performance is not covered by the library. More precisely, the library doesn't dictate the protocol for that interaction. TCP here is only the transport between cluster nodes, not between a node and the client. There are plenty of choices for the client side: a custom application-level protocol on top of TCP, gRPC with Protobuf, MessagePack, and so on.
  3. On write requests, it's recommended to force replication immediately using the IRaftCluster.ReplicateAsync method.
  4. On read requests, choose the appropriate consistency model: weak or strong. Weak consistency is faster but doesn't provide linearizability and allows stale reads from follower nodes.
  5. Use a leader lease for linearizable reads, but choose the clock drift bound carefully.
  6. Tune the persistent WAL accordingly: enable in-memory caching of uncommitted log entries and choose the log compaction strategy that performs best in your benchmarks.
  7. Tune your implementation of IRaftLogEntry if you're not using the Interpreter Framework. If you are using it, check serialization/deserialization performance inside each command type that implements the ISerializable<TSelf> interface.
  8. Check your implementation of the state machine. Try to serve reads from an in-memory cache.
  9. As a last resort, you can provide your own implementation of the IPersistentState interface.

The full spectrum of optimization techniques is described in detail here and here.

sakno commented 3 years ago

An accurate picture of performance includes measurements of each step of the interaction:

  1. Underlying network latency, in ms
  2. Overhead of client interaction, in ms
  3. Overhead of state machine, in ms
  4. Overhead of appending log entries to WAL when the replication is forced, in ms
  5. Overhead of replication itself, in ms
  6. Overhead of serialization/deserialization, in ms

With that picture, you can make informed decisions about performance optimizations.
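To illustrate how such a breakdown can be used, here is a minimal Python sketch that sums the per-step measurements and ranks them by their share of the total. The numbers in `measurements_ms` are hypothetical placeholders, not results from this thread, and `latency_budget` is a helper name made up for this example; substitute your own measurements.

```python
# Hypothetical per-step measurements in milliseconds.
# These are placeholders, not real benchmark results.
measurements_ms = {
    "network": 0.5,
    "client interaction": 1.2,
    "state machine": 0.3,
    "WAL append (forced replication)": 6.0,
    "replication": 3.0,
    "serialization": 0.4,
}


def latency_budget(steps):
    """Return (total_ms, rows); rows are (step, ms, share), largest first."""
    total = sum(steps.values())
    rows = sorted(
        ((name, ms, ms / total) for name, ms in steps.items()),
        key=lambda row: row[1],
        reverse=True,
    )
    return total, rows


total, rows = latency_budget(measurements_ms)
print(f"total: {total:.1f} ms")
for name, ms, share in rows:
    print(f"{name:<32} {ms:6.2f} ms  {share:6.1%}")
```

The step at the top of the ranking is the one where optimization effort is most likely to pay off.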

sakno commented 2 years ago

Release 4.2.0 includes many optimizations aimed at improving response time.

Replication of a single 8 KB log entry takes ~10.7 ms with the following settings

When multiple 8 KB log entries are replicated together, it takes ~6.28 ms per entry with the settings mentioned above.

A disk write alone takes about 6 ms, so I don't see any way to bring the latency down to 1 ms.
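The disk-write floor can be checked on your own hardware. The following is a rough Python sketch, not part of the library and not the benchmark used above: it appends 8 KB blocks to a temporary file and calls `fsync` after each one, then reports the median time. The function name `measure_durable_write_ms` and its parameters are made up for this example. Note that the default temporary directory may live on an in-memory filesystem, so pass `directory` to point at the disk that will actually hold the WAL.

```python
import os
import statistics
import tempfile
import time


def measure_durable_write_ms(payload_size=8 * 1024, iterations=50, directory=None):
    """Return the median time in ms to append payload_size bytes and fsync them."""
    payload = os.urandom(payload_size)
    samples = []
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(iterations):
            start = time.perf_counter()
            f.write(payload)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to flush to the device
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


median_ms = measure_durable_write_ms(iterations=20)
print(f"median durable 8 KB write: {median_ms:.2f} ms")
```

If the median on your storage is already above the latency target, no amount of tuning above the disk layer will close the gap.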