GoogleCloudPlatform / grpc-gcp-go

gRPC support for Google Cloud Clients.
https://godoc.org/github.com/GoogleCloudPlatform/grpc-gcp-go
Apache License 2.0
26 stars 20 forks source link

A BIND RPC should sometimes force the creation of a new subconn #19

Open olavloite opened 5 years ago

olavloite commented 5 years ago

This issue is submitted as a result of a conversation on https://code-review.googlesource.com/c/gocloud/+/43910 with @WeiranFang.

The performance (especially latency) of Cloud Spanner relies on Spanner sessions being affiliated with a gRPC channel during their lifetimes. All RPC's referencing a session should be executed on the same gRPC channel as the channel that was used to create the session.

The Spanner client library contains a session pool that automatically creates and maintains sessions for the user. This pool currently handles the channel affiliation manually by holding a reference to the gRPC channel that was used to create it in a field in the session struct. This channel is used for all subsequent calls that reference the session. This manual handling could be avoided when using grpc-gcp by defining the create session RPC's as BIND RPC's.

The Spanner client lib will initialize a session pool with a configured number of sessions (default 100) when a user creates a Spanner Client. The initialization of the session pool itself will not cause a load that would normally require more than one gRPC channel, but the created sessions should be evenly distributed over all available channels.

One possible solution for this could be to force each BIND RPC to create a new channel. This would unfortunately still not be 100% compatible with the current session pool implementation for the following reasons:

  1. Pool initialization is done using the BatchCreateSessions RPC of Spanner. Assuming that we define this as a BIND RPC, and the user has configured 4 channels (subconns) and a minimum of 1000 sessions, the pool will execute 4 RPC's each requesting 250 sessions. Spanner does however not guarantee that it will return the number of requested sessions, and the current behavior is that it will return at most 100 sessions per batch. If the pool receives less than the requested number of sessions, it will do a second RPC for the remaining sessions, and repeat this until all the necessary sessions have been created. These subsequent RPC's should be done on the same subconn as the original RPC.

  2. The session pool may need to create additional sessions at a later time, if the initial number of sessions in the pool is not enough. It will currently do so using the CreateSession RPC of Spanner. This should also not create a new subconn, but it should be a BIND RPC. Instead, these calls should also be round-robin distributed over the available subconns.

Please feel free to correct any of my assumptions about the working of grpc-gcp that may be wrong :)

WeiranFang commented 5 years ago

Thanks a lot for the detailed information! I may not consider all the cases thoroughly, but just based on my understanding of the batch create session, it sounds like we would need a "batch bind" support where on the completion of each batch request, it would bind all sessions from the returned session list to the connection that serves this request. And the retry behavior mentioned in (1) won't be affected because the it sounds like the client will record how many sessions it hasn't yet created, and the client will re-call batch create session with this remaining number.

For (2)nd case, if the client calls any future "createSession", it will still call the same "bind" function on request completion. And it won't necessarily create new subconn because in grpcgcp we have this "maxSize" that defines the upper bound of the connection pool size. This will always be a hard limit no matter what type (bind/unbind/bound) of request the client sends out.

So the new BIND logic would be like: pick new connection if the capacity is less than "maxSize", if not, use the least busy one (least concurrent streams). Once the request is completed, bind all the returned sessions (single or multiple, depends on whether is a createSession or batchCreateSession) to this connection.

Again, I may not be thinking the problem thoroughly, so please correct me if I'm wrong. And since this sounds like a language-agnostic requirement, I will draft a spec for the general grpcgcp (multiple languages) to discuss this problem. Very appreciate your input!

olavloite commented 5 years ago

it sounds like we would need a "batch bind" support where on the completion of each batch request, it would bind all sessions from the returned session list to the connection that serves this request.

Agree.

And the retry behavior mentioned in (1) won't be affected because the it sounds like the client will record how many sessions it hasn't yet created, and the client will re-call batch create session with this remaining number.

Agree.

For (2)nd case, if the client calls any future "createSession", it will still call the same "bind" function on request completion. And it won't necessarily create new subconn because in grpcgcp we have this "maxSize" that defines the upper bound of the connection pool size. This will always be a hard limit no matter what type (bind/unbind/bound) of request the client sends out.

Sounds good.

So the new BIND logic would be like: pick new connection if the capacity is less than "maxSize", if not, use the least busy one (least concurrent streams). Once the request is completed, bind all the returned sessions (single or multiple, depends on whether is a createSession or batchCreateSession) to this connection.

Agree. AFAICT this should be enough to support the needs for the client lib and session pool for Spanner.