Insvald / Slik

Distributed In-process Cache; a gRPC HTTP proxy for containerd
MIT License

[Discussion] Extending functionality of Slik and refactorings #14

Open sakno opened 3 years ago

sakno commented 3 years ago

Hi @Insvald , I would like to join your project. I see that some refactoring could be applied:

Insvald commented 3 years ago

Hi,

I am more than glad to welcome you. Feel free to suggest ideas and improve areas which require additional work.

  1. Somehow I missed this membership discovery. Yes, using this interface looks more appropriate.
  2. Good point. Protobuf?
  3. Yes, I thought about this; it's worth starting to use it.
  4. OK, I think we should start with some benchmarks/tests to measure our progress.
  5. Tests/benchmarks here too.
  6. Agreed, an interesting functionality.

I will add these points to the roadmap, thanks for the input!

sakno commented 3 years ago

About the JSON format for log entries: protobuf is redundant here, because any IRaftLogEntry inherits from IDataTransferObject, where you can serialize and deserialize data using IAsyncBinaryWriter and IAsyncBinaryReader respectively. Both interfaces have been upgraded to support fast synchronous scenarios where possible (using IBufferWriter<byte> and Span<byte>) alongside the traditional async methods.
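For illustration, a log entry carrying a JSON payload could look roughly like this. This is only a sketch: `JsonLogEntry` is a hypothetical type, and the DotNext member names below are written from memory and may differ between library versions.

```csharp
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using DotNext.IO;
using DotNext.Net.Cluster.Consensus.Raft;

// Hypothetical log entry that carries a JSON-serialized command.
// Because IRaftLogEntry inherits IDataTransferObject, the payload is
// written through IAsyncBinaryWriter -- no separate protobuf layer needed.
sealed class JsonLogEntry<T> : IRaftLogEntry
{
    private readonly T command;

    public JsonLogEntry(T command, long term)
    {
        this.command = command;
        Term = term;
        Timestamp = DateTimeOffset.UtcNow;
    }

    public long Term { get; }
    public DateTimeOffset Timestamp { get; }
    public bool IsSnapshot => false;      // a regular entry, not a snapshot
    public bool IsReusable => true;       // payload can be serialized more than once
    public long? Length => null;          // length is not known in advance

    public ValueTask WriteToAsync<TWriter>(TWriter writer, CancellationToken token)
        where TWriter : IAsyncBinaryWriter
    {
        // Serialize the command as JSON and hand the bytes to the writer.
        var bytes = JsonSerializer.SerializeToUtf8Bytes(command);
        return writer.WriteAsync(bytes, null, token);
    }
}
```

On the reading side, the state machine would use IAsyncBinaryReader (or the synchronous Span-based path) to decode the same bytes back.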

sakno commented 3 years ago

Could you please add a develop branch? I don't think it's a good idea to make PRs directly to master.

sakno commented 3 years ago

@Insvald , after a deep analysis of the existing code base I found the root cause of the code complexity. The main problem is routing. A Slik server acts as a proxy when the node that accepted the request is not the leader node: it tries to redirect the request to the leader. From my point of view, this routing should be handled by the cache client, not by the server nodes. The main reason is redundant traffic:

The same story with the response.

I think the problem comes from the chosen architecture. There are two common approaches for this kind of cache:

The first approach allows us to use gRPC or any other duplex protocol for communication between the clients and the grid. However, it should be wrapped in a client library. The library is responsible for caching the location of the leader, retry logic, communication with the leader, receiving updates from the grid, and keeping an LRU cache.
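To make the first approach concrete, the client library could look roughly like this. Everything here is a hedged sketch with hypothetical names (`ICacheTransport`, `CacheClient`) rather than an existing Slik API; the point is only that leader discovery and retries live on the client, so server nodes never proxy traffic.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical transport abstraction over gRPC or any other duplex protocol.
public interface ICacheTransport
{
    // Returns success, or failure plus the actual leader's address when
    // the contacted node is not the leader.
    Task<(bool Success, Uri? Leader)> TrySetAsync(Uri node, string key, byte[] value);
}

// Sketch of a client that caches the leader's location and retries,
// instead of letting follower nodes redirect requests server-side.
public sealed class CacheClient
{
    private readonly ICacheTransport transport;
    private readonly IReadOnlyList<Uri> nodes;
    private Uri? cachedLeader;

    public CacheClient(ICacheTransport transport, IReadOnlyList<Uri> nodes)
    {
        this.transport = transport;
        this.nodes = nodes;
    }

    public async Task SetAsync(string key, byte[] value)
    {
        // Start with the cached leader; fall back to scanning known nodes.
        var candidate = cachedLeader ?? nodes[0];
        for (var attempt = 0; attempt <= nodes.Count; attempt++)
        {
            var (success, leader) = await transport.TrySetAsync(candidate, key, value);
            if (success)
            {
                cachedLeader = candidate; // remember the leader for next time
                return;
            }
            // Not the leader: try the reported leader, or the next known node.
            candidate = leader ?? nodes[attempt % nodes.Count];
        }
        throw new InvalidOperationException("Leader could not be located");
    }
}
```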

The second approach doesn't require any special protocol, and you can use the Messaging infrastructure from DotNext.Net.Cluster for communication between nodes in the cluster.

At the moment, the implementation is trying to behave like a grid while at the same time trying to hide the complexity from the client behind a proxy node.

sakno commented 3 years ago

One more thing: it is possible to combine both approaches. The .NEXT Raft library provides so-called standby nodes. These nodes never become leaders but participate in replication. As a result, the clients can be standby nodes and remain effectively stateless: their persistent WAL can be stored in ramfs or another volatile storage that can be dropped in case of failure, for instance when restarting a pod in Kubernetes or a container in Docker. At the same time, the cache nodes must be stateful and participate in leader election. However, all of this leads to a review of the existing architecture.
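If memory serves, a client joining the cluster as a standby node could be configured through the ASP.NET Core cluster configuration, something like the fragment below. The key names are written from memory of the .NEXT documentation and may differ between versions, so treat this as a sketch and check the docs.

```json
{
  "lowerElectionTimeout": 150,
  "upperElectionTimeout": 300,
  "standby": true,
  "members": [
    "http://cache-node-1:3262",
    "http://cache-node-2:3262"
  ]
}
```

With `standby` enabled, the node replicates the log but never starts an election, which is exactly the behavior a stateless client with a volatile WAL needs.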

Insvald commented 3 years ago

@sakno The idea behind this project was definitely to use the second approach: the client is a node itself. gRPC was added as an optional interface.

I am looking at this project as a basis for lightweight orchestration, hopefully in-process, without any additional standalone services/nodes. In such a scenario writes should be relatively rare events; I am mostly concerned with reads and ease of use for the consumer.

Nevertheless, any ideas are welcome, as at the moment I'm stuck with the containerd driver.

sakno commented 3 years ago

With the second approach we need to choose one of the following:

The last one is possible with the routing middleware shipped with the DotNext.AspNetCore.Cluster library, as described here. AFAIK the gRPC client doesn't support transparent redirection via the 302 HTTP status, while a REST client can do that.
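To illustrate the difference: a plain HttpClient follows a 302 from a follower automatically, while a gRPC channel surfaces it as an error. A minimal sketch (the follower URL and cache endpoint are hypothetical, not Slik's actual routes):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class RedirectDemo
{
    static async Task Main()
    {
        // AllowAutoRedirect is true by default: when a follower answers
        // 302 Found with the leader's address in the Location header,
        // the client transparently re-issues the request to the leader.
        var handler = new HttpClientHandler { AllowAutoRedirect = true };
        using var http = new HttpClient(handler);
        using var response = await http.GetAsync("http://follower:3262/api/cache/some-key");
        Console.WriteLine(response.StatusCode); // status from the leader, not 302
    }
}
```

A gRPC call over the same channel would instead fail on the redirect, which is why the REST surface pairs naturally with the routing middleware.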

Jeevananthan-23 commented 1 year ago

Hi @sakno / @Insvald, I'm really curious about this project; it's a good one for exploring and understanding an implementation of gRPC with Raft consensus. It's sad that this project is no longer active. If @sakno / @Insvald would still like to work on it, I'd appreciate help implementing a Lucene.NET-based search engine like yelp/nrtsearch, which is implemented using gRPC/protobuf, plus some other features for performance.

sakno commented 1 year ago

@Jeevananthan-23 , I don't own this project. However, you can use the .NEXT repo to ask your question.