containerd / nri

Node Resource Interface
Apache License 2.0

Should the connection be actively closed after triggering the "plugin_request_timeout" timeout? #90

Open zhaodiaoer opened 2 weeks ago

zhaodiaoer commented 2 weeks ago

In the current design, when an external plugin fails to respond to a request within the configured plugin_request_timeout, the corresponding stub function on the adaptation side returns a DeadlineExceeded error. That error is currently treated as a "FatalError", so the adaptation side (such as containerd) actively closes the ttrpc connection. The DeadlineExceeded error is also hidden: the function logs the error and returns a nil error. Once such a timeout occurs, the plugin side can only re-establish the connection (or do some similar work in onClose()?). Combined with the problem mentioned in the issue, if the plugin does not provide an onClose() handler, the plugin quietly exits.

In view of this, I would like to discuss whether we should improve the stub's timeout handling, for example by providing a configurable option that decides whether to keep the ttrpc connection open when a timeout occurs, or by some other, more complete approach. Please also let me know what considerations and background led to the current design. Thank you!
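To make the proposal concrete, here is a minimal, self-contained sketch of what a configurable timeout policy could look like. It is not NRI code: `TimeoutPolicy`, `pluginConn`, `newPluginConn`, and `slowPlugin` are hypothetical names invented for illustration, and the sketch uses the standard library's `context.DeadlineExceeded` as a stand-in for the ttrpc-level DeadlineExceeded error.

```go
package main

import (
	"context"
	"errors"
	"log"
	"time"
)

// TimeoutPolicy is a hypothetical option; NRI does not define it today.
type TimeoutPolicy int

const (
	// CloseOnTimeout mimics the behavior described in this issue.
	CloseOnTimeout TimeoutPolicy = iota
	// KeepOnTimeout is the proposed alternative: keep the connection.
	KeepOnTimeout
)

// pluginConn is a stand-in for the adaptation side's per-plugin connection.
type pluginConn struct {
	policy  TimeoutPolicy
	timeout time.Duration // plays the role of plugin_request_timeout
	closed  bool
}

// newPluginConn builds a connection with a timeout given in milliseconds.
func newPluginConn(policy TimeoutPolicy, timeoutMs int) *pluginConn {
	return &pluginConn{
		policy:  policy,
		timeout: time.Duration(timeoutMs) * time.Millisecond,
	}
}

// call runs one plugin request under the configured timeout and applies
// the timeout policy to the result.
func (p *pluginConn) call(req func(ctx context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), p.timeout)
	defer cancel()

	err := req(ctx)
	if !errors.Is(err, context.DeadlineExceeded) {
		// Success (nil) or a non-timeout error: pass it through unchanged.
		return err
	}
	if p.policy == KeepOnTimeout {
		// Surface the timeout to the caller and leave the connection open.
		return err
	}
	// Treat the timeout as fatal: log it, close the connection, and hide
	// the error from the caller, as the issue describes.
	log.Printf("plugin request timed out, closing connection: %v", err)
	p.closed = true
	return nil
}

// slowPlugin simulates a plugin that never answers before the deadline.
func slowPlugin(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}
```

With `newPluginConn(CloseOnTimeout, 10)`, calling `call(slowPlugin)` returns nil and marks the connection closed. With `KeepOnTimeout`, the same call returns the deadline error and the connection stays open, so the caller can decide whether to retry or skip the plugin.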