GreptimeTeam / greptimedb

An open-source, cloud-native, unified time series database for metrics, logs and events with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0
4.34k stars 314 forks source link

Tidy up all timeout config places, make them set from some static sources #1935

Closed WenyXu closed 1 year ago

WenyXu commented 1 year ago

What type of enhancement is this?

Configuration, Tech debt reduction

What does the enhancement do?

In #1861, #1901, and #1878, We introduce a lot of hard code timeout configurations.

e.g.,

 timeout(
      // TODO(weny): makes timeout configurable.
          Duration::from_secs(10),
          self.meta_client.submit_ddl_task(request),
      )
      .await
      .context(error::TimeoutSnafu)?
      .context(error::RequestMetaSnafu)

It's better to make it configurable for each request.

i.g.,

let mut alter_request = AlterRequest{ .. };
alter_request.timeout = Some(Duration::from_secs(10));

client.alter(alter_request).await;

Implementation challenges

No response

waynexia commented 1 year ago

maybe a good first issue?

MichaelScofield commented 1 year ago

Actually we need to tidy up all the places that use all sorts of timeouts, and set them from some static config sources.

fengjiachun commented 1 year ago

We seem unable to set a longer timeout for one of the requests individually, as the internal implementation will let the smaller value win.

https://github.com/hyperium/tonic/blob/master/tonic/src/transport/service/grpc_timeout.rs#L49

MichaelScofield commented 1 year ago

After some digging into tonic's code, I found there are two timeouts: tonic's timeout, as an "interceptor", see struct GrpcTimeout in tonic. It's using something like tokio's timeout underneath. Another is GRPC's request timeout defined by GRPC protocol, set in request header. We are not using the latter one (grpc request timeout), so tonic only uses the timeout we set to endpoint. I think we are doing it right.

As to our "timeout seems not working problem", I found that during testing, datanode's grpc client is created in async fn create_datanode_client(datanode_instance: Arc<DatanodeInstance>) -> (String, Client), which is indeed using the default timeout. However, the default timeout is set to 10s yesterday, so I think the problem should be gone.

fengjiachun commented 1 year ago

Looks like this issue has been solved, can we close it? @MichaelScofield @WenyXu

MichaelScofield commented 1 year ago

No, we have to go through our codes to see all the timeouts are or could be properly set from some config files.