apache / incubator-pegasus

Apache Pegasus - A horizontally scalable, strongly consistent and high-performance key-value store
https://pegasus.apache.org/
Apache License 2.0

fix(go-client): update config once replica server failed and forward to primary meta server if it was changed #1909

Closed lengyuexuexuan closed 9 months ago

lengyuexuexuan commented 9 months ago

What problem does this PR solve?

#1880

#1856

What is changed and how does it work?

For #1856: when the Go client is writing to a partition and the replica node hosting it core dumps, the request fails after the timeout, but the client does not update its configuration. In this case, the only way to recover is to restart the Go client.

In this PR, the client automatically updates the table configuration when a replica core dumps. Testing showed that the error returned for the replica in this case is "context.DeadlineExceeded".

https://github.com/apache/incubator-pegasus/blob/41141c11c36930a19da727fd25a4876bd56f76a6/go-client/pegasus/table_connector.go#L705-L706

Therefore, when the client encounters this error, it updates the configuration automatically. The failed request itself is not retried: the configuration update is only triggered by the timeout, so a retry issued before the update completes would still fail, and retrying also carries the risk of looping indefinitely. It is better to return the request error directly to the user and let the user try again.
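A minimal sketch of the "refresh on timeout, do not retry" behavior described above. The `tableConnector` type, the `confUpdateCh` channel, and both helper functions are hypothetical stand-ins for illustration, not the actual code in `table_connector.go`; only `context.DeadlineExceeded` is taken from the description.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// tableConnector is a hypothetical stand-in for the client's table
// connector; only what the sketch needs is modeled.
type tableConnector struct {
	// confUpdateCh signals that the table configuration should be
	// re-queried from the meta server (assumed mechanism).
	confUpdateCh chan struct{}
}

// triggerConfUpdate asks for a configuration refresh without blocking.
// If a refresh is already pending, the extra signal is dropped.
func (tc *tableConnector) triggerConfUpdate() {
	select {
	case tc.confUpdateCh <- struct{}{}:
	default:
	}
}

// handleReplicaError schedules a refresh when the replica call timed
// out, then returns the error unchanged so the caller decides whether
// to retry. There is deliberately no retry loop here.
func (tc *tableConnector) handleReplicaError(err error) error {
	if errors.Is(err, context.DeadlineExceeded) {
		tc.triggerConfUpdate()
	}
	return err
}

// callDeadReplica simulates a request to a crashed replica: nothing
// answers, so the context deadline expires.
func callDeadReplica() error {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)
	defer cancel()
	<-ctx.Done()
	return ctx.Err()
}

func main() {
	tc := &tableConnector{confUpdateCh: make(chan struct{}, 1)}
	err := tc.handleReplicaError(callDeadReplica())
	fmt.Println("returned to caller:", err)
	fmt.Println("refresh pending:", len(tc.confUpdateCh) == 1)
}
```

The non-blocking send means repeated timeouts collapse into a single pending refresh, which is one simple way to avoid flooding the meta server while a replica is down.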

For #1880: when the client sends the RPC message "RPC_CM_QUERY_PARTITION_CONFIG_BY_INDEX" to a meta server that is not the primary, the meta server returns a response that forwards the client to the primary meta server.

Based on this behavior, the client can connect to the primary meta server even when the primary is not in the client's configured meta server list.

This PR implements that in the following steps.

  1. Parse the response and check whether its errno is ERR_FORWARD_TO_OTHERS. If so, extract the primary meta server address from it.
    https://github.com/apache/incubator-pegasus/blob/41141c11c36930a19da727fd25a4876bd56f76a6/go-client/session/meta_call.go#L166-L177
  2. Check whether that address is already in the client configuration. If it is, nothing more needs to be done. Otherwise, establish a connection and pull the configuration directly from the primary meta server.
    https://github.com/apache/incubator-pegasus/blob/41141c11c36930a19da727fd25a4876bd56f76a6/go-client/session/meta_call.go#L118-L138
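The two steps above can be sketched roughly as follows. The `queryResponse` and `metaClient` types, their fields, and the example addresses are assumptions made for illustration; the real response is a Thrift structure and the real logic lives in `meta_call.go`. Appending the new address to the list is also an assumption about how the client might remember the primary.

```go
package main

import "fmt"

// errForwardToOthers mirrors the errno name used in the description.
const errForwardToOthers = "ERR_FORWARD_TO_OTHERS"

// queryResponse is a hypothetical, simplified view of the reply to
// RPC_CM_QUERY_PARTITION_CONFIG_BY_INDEX.
type queryResponse struct {
	Errno       string
	ForwardAddr string // primary meta address, set only when forwarding
}

// metaClient is a hypothetical holder of the known meta addresses.
type metaClient struct {
	metaIPAddrs []string
}

// primaryFromResponse implements step 1: it returns the forwarded
// primary address when the errno says the request was forwarded.
func primaryFromResponse(resp queryResponse) (string, bool) {
	if resp.Errno != errForwardToOthers || resp.ForwardAddr == "" {
		return "", false
	}
	return resp.ForwardAddr, true
}

// addPrimaryIfUnknown implements step 2: it skips addresses that are
// already known and reports whether a new connection is needed.
func (c *metaClient) addPrimaryIfUnknown(addr string) bool {
	for _, known := range c.metaIPAddrs {
		if known == addr {
			return false
		}
	}
	c.metaIPAddrs = append(c.metaIPAddrs, addr)
	return true
}

func main() {
	c := &metaClient{metaIPAddrs: []string{"10.0.0.1:34601", "10.0.0.2:34601"}}
	resp := queryResponse{Errno: errForwardToOthers, ForwardAddr: "10.0.0.3:34601"}

	if addr, ok := primaryFromResponse(resp); ok {
		if c.addPrimaryIfUnknown(addr) {
			fmt.Println("connect to new primary and query it:", addr)
		} else {
			fmt.Println("primary already configured:", addr)
		}
	}
}
```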

Note that IP addresses and sessions do not correspond one-to-one, because an IP address may be unavailable. This is why, even when the primary meta server is already in the client configuration, curllead cannot be used as an index into the metaIPAddrs array. https://github.com/apache/incubator-pegasus/blob/41141c11c36930a19da727fd25a4876bd56f76a6/go-client/session/meta_call.go#L123-L128
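One way to read this note, as a sketch: if sessions are only created for addresses that are available, the session slice can be shorter than the address slice, so the same index can refer to different servers in each. The `metaSession` type, `buildSessions`, `hasSession`, and the addresses below are hypothetical; only the `metaIPAddrs` name comes from the description.

```go
package main

import "fmt"

// metaSession is a hypothetical session that remembers its own address.
type metaSession struct {
	addr string
}

// buildSessions creates a session only for addresses that pass the
// reachable check, so the result can be shorter than addrs.
func buildSessions(addrs []string, reachable func(string) bool) []*metaSession {
	var sessions []*metaSession
	for _, a := range addrs {
		if reachable(a) {
			sessions = append(sessions, &metaSession{addr: a})
		}
	}
	return sessions
}

// hasSession compares addresses instead of relying on index positions.
func hasSession(sessions []*metaSession, addr string) bool {
	for _, s := range sessions {
		if s.addr == addr {
			return true
		}
	}
	return false
}

func main() {
	metaIPAddrs := []string{"10.0.0.1:34601", "10.0.0.2:34601", "10.0.0.3:34601"}
	// Pretend the second address is unavailable.
	sessions := buildSessions(metaIPAddrs, func(a string) bool {
		return a != "10.0.0.2:34601"
	})

	lead := 1 // index of the leader within sessions
	fmt.Println("session at lead:", sessions[lead].addr)
	fmt.Println("address at lead:", metaIPAddrs[lead])
	fmt.Println("primary known:", hasSession(sessions, "10.0.0.3:34601"))
}
```

Comparing by address, as `hasSession` does, stays correct regardless of how many addresses were skipped when the sessions were built.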

Tests

acelyc111 commented 9 months ago

@lengyuexuexuan It would be better to separate the bugfix and the refactoring of the generated thrift files into 2 pull requests.

lengyuexuexuan commented 9 months ago

> @lengyuexuexuan It would be better to separate the bugfix and the refactoring of the generated thrift files into 2 pull requests.

ok.