Open mohit84 opened 1 month ago
/run regression
/run regression
/run regression
0 test(s) failed
1 test(s) generated core ./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t
1 test(s) needed retry ./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t
1 flaky test(s) marked as success even though they failed ./tests/000-flaky/basic_afr_split-brain-favorite-child-policy.t https://build.gluster.org/job/gh_centos7-regression/3389/
/run regression
1 test(s) failed ./tests/basic/ec/ec-badfd.t
0 test(s) generated core
3 test(s) needed retry ./tests/000-flaky/glusterd-restart-shd-mux.t ./tests/basic/afr/ta-shd.t ./tests/basic/ec/ec-badfd.t https://build.gluster.org/job/gh_centos7-regression/3390/
/run regression
1 test(s) failed ./tests/basic/ec/ec-badfd.t
0 test(s) generated core
1 test(s) needed retry ./tests/basic/ec/ec-badfd.t https://build.gluster.org/job/gh_centos7-regression/3391/
During the first rpc clnt submission we take an rpc reference and register the call_bail function with the timer thread. The timer thread then calls call_bail every 10s. When a client triggers a shutdown request, it tries to call rpc_clnt_connection_cleanup to clean up the rpc connection. rpc_clnt_connection_cleanup cannot clean up the rpc connection successfully because the cleanup_started flag has already been set by the upper xlator. The rpc reference is unref'd only after call_bail is triggered, so if call_bail fires just before the shutdown process starts, the application has to wait for 10s before the rpc connection is cleaned up, and the process becomes slow to exit.
Solution: Unref the rpc object based on the value of the conn->timer/conn->reconnect pointers, as we already do for ping_timer. These pointers are always modified under the critical section, so if a pointer is valid we can assume the corresponding rpc reference is also valid.
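The idea can be illustrated with a small standalone model. This is only a sketch, not the actual patch: `mock_rpc`, `mock_conn`, `mock_conn_cleanup`, `mock_rpc_unref_n` and `mock_shutdown_demo` are hypothetical names, and the real code would also cancel the timers and use the GlusterFS locking and refcount helpers. It assumes each armed timer holds exactly one rpc reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the rpc object and its connection. */
struct mock_rpc {
    int refcount;
};

struct mock_conn {
    pthread_mutex_t lock;
    void *timer;     /* non-NULL while the call_bail timer holds a ref */
    void *reconnect; /* non-NULL while the reconnect timer holds a ref */
    struct mock_rpc *rpc;
};

/*
 * Clear every armed timer pointer under the lock and return how many
 * references the caller should drop. Because the pointers only change
 * inside the critical section, a non-NULL pointer implies a held ref.
 */
int
mock_conn_cleanup(struct mock_conn *conn)
{
    int unref = 0;

    pthread_mutex_lock(&conn->lock);
    if (conn->timer) {
        conn->timer = NULL; /* real code would also cancel the timer */
        unref++;
    }
    if (conn->reconnect) {
        conn->reconnect = NULL;
        unref++;
    }
    pthread_mutex_unlock(&conn->lock);

    return unref;
}

/* Drop n references outside the lock. */
void
mock_rpc_unref_n(struct mock_rpc *rpc, int n)
{
    assert(rpc->refcount >= n);
    rpc->refcount -= n;
}

/* Simulate a shutdown right after both timers were armed. */
int
mock_shutdown_demo(void)
{
    /* One reference for the owner plus one per armed timer. */
    struct mock_rpc rpc = {.refcount = 3};
    struct mock_conn conn = {.timer = &rpc, .reconnect = &rpc, .rpc = &rpc};

    pthread_mutex_init(&conn.lock, NULL);
    mock_rpc_unref_n(&rpc, mock_conn_cleanup(&conn));
    pthread_mutex_destroy(&conn.lock);

    return rpc.refcount; /* only the owner's reference remains */
}
```

In this model the shutdown path releases the timer references immediately instead of waiting for the next call_bail tick, and a second cleanup call is a no-op because the pointers are already NULL.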
Fixes: #4320
credits: Xavi Hernandez xhernandez@redhat.com
Change-Id: Ib947b8bfcbe1b49e1ed05a50a84de6f92afbca13