Closed phudtran closed 2 years ago
Hi Phu,
I tried the same test again. TP1 works fine, but TP2 is missing the eps host. I got the same error in the operator log:
2022-02-17T20:05:15.278456383Z stderr F [2022-02-17 20:05:15,276] luigi-interface [ERROR ] [pid 7] Worker Worker(salt=857259027, workers=1, host=ip-172-30-0-156, username=root, pid=7) failed NetCreate(param=<mizar.common.wf_param.HandlerParam object at 0x7f81d477b250>)
2022-02-17T20:05:15.278485009Z stderr F Traceback (most recent call last):
2022-02-17T20:05:15.278493113Z stderr F File "/usr/local/lib/python3.9/site-packages/luigi/worker.py", line 199, in run
2022-02-17T20:05:15.278499431Z stderr F new_deps = self._run_get_new_deps()
2022-02-17T20:05:15.27850451Z stderr F File "/usr/local/lib/python3.9/site-packages/luigi/worker.py", line 141, in _run_get_new_deps
2022-02-17T20:05:15.278510031Z stderr F task_gen = self.task.run()
2022-02-17T20:05:15.278515828Z stderr F File "/usr/local/lib/python3.9/site-packages/mizar/dp/mizar/workflows/nets/create.py", line 76, in run
2022-02-17T20:05:15.278521153Z stderr F droplet.interfaces = endpoints_opr.init_host_endpoint_interfaces(
2022-02-17T20:05:15.278527297Z stderr F File "/usr/local/lib/python3.9/site-packages/mizar/dp/mizar/operators/endpoints/endpoints_operator.py", line 463, in init_host_endpoint_interfaces
2022-02-17T20:05:15.278533697Z stderr F return InterfaceServiceClient(droplet.main_ip).InitializeInterfaces(interfaces)
2022-02-17T20:05:15.278539027Z stderr F File "/usr/local/lib/python3.9/site-packages/mizar/daemon/interface_service.py", line 319, in InitializeInterfaces
2022-02-17T20:05:15.278575591Z stderr F resp = self.stub.InitializeInterfaces(interfaces_list)
2022-02-17T20:05:15.278583231Z stderr F File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 946, in __call__
2022-02-17T20:05:15.278588463Z stderr F return _end_unary_response_blocking(state, call, False, None)
2022-02-17T20:05:15.278593588Z stderr F File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
2022-02-17T20:05:15.278598731Z stderr F raise _InactiveRpcError(state)
2022-02-17T20:05:15.278603842Z stderr F grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
2022-02-17T20:05:15.278610581Z stderr F status = StatusCode.UNAVAILABLE
2022-02-17T20:05:15.278615824Z stderr F details = "failed to connect to all addresses"
2022-02-17T20:05:15.278621718Z stderr F debug_error_string = "{"created":"@1645128315.275418059","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3134,"referenced_errors":[{"created":"@1645128315.275416392","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}"
Do you see this line after that error? "Daemon not yet ready for droplet some_ip_here" The operator should keep retrying until the daemon is up, and then create the host endpoint. If the host endpoint never comes up, there may be another issue.
This PR adds a retry to host endpoint creation when the subnet comes up. It fixes an issue where the operator tries to create a host endpoint before the daemon is up.
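For readers following the thread, here is a rough sketch of the retry behavior described above. It is not the code from this PR: `DaemonNotReady`, `create_host_endpoint_with_retry`, and the attempt and delay values are all hypothetical, and the exception stands in for the gRPC `StatusCode.UNAVAILABLE` error shown in the log.

```python
import time


class DaemonNotReady(Exception):
    """Hypothetical stand-in for the gRPC UNAVAILABLE error in the log."""


def create_host_endpoint_with_retry(create, droplet_ip, max_attempts=10,
                                    delay_seconds=5.0, sleep=time.sleep):
    """Call create(droplet_ip) until it succeeds or attempts run out.

    `create` is any callable that raises DaemonNotReady while the daemon
    on the droplet is still starting (an assumption for this sketch).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return create(droplet_ip)
        except DaemonNotReady:
            # Same wording as the operator message quoted in the thread.
            print(f"Daemon not yet ready for droplet {droplet_ip}")
            if attempt == max_attempts:
                # Give up and surface the error after the last attempt.
                raise
            sleep(delay_seconds)
```

With this shape, a droplet whose daemon comes up after a few attempts still gets its host endpoint, while a daemon that never starts eventually surfaces the error instead of retrying forever. Whether the real operator caps the number of retries is not stated in this thread.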