eWaterCycle / grpc4bmi

gRPC wrapper for a model with a Basic Model Interface (BMI)
https://grpc4bmi.readthedocs.io
Apache License 2.0

Sometimes BmiClientDocker construction hangs #16

Open: sverhoeven opened this issue 6 years ago

sverhoeven commented 6 years ago

When doing:

from grpc4bmi.bmi_client_docker import BmiClientDocker
model = BmiClientDocker(image='ewatercycle/walrus-grpc4bmi', image_port=55555)

Sometimes the interpreter hangs. When interrupted (Ctrl+C), the stack trace is:

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-6-b9bed7de7b73> in <module>()
      2 model = BmiClientDocker(image='ewatercycle/walrus-grpc4bmi', image_port=55555,
      3                         input_dir="./input",
----> 4                         output_dir="./output")

/usr/local/lib/python3.5/dist-packages/grpc4bmi/bmi_client_docker.py in __init__(self, image, image_port, host, input_dir, output_dir, user)
     40                                                remove=True,
     41                                                detach=True)
---> 42         super(BmiClientDocker, self).__init__(BmiClient.create_grpc_channel(port=port, host=host))
     43 
     44     def __del__(self):

/usr/local/lib/python3.5/dist-packages/grpc4bmi/bmi_grpc_client.py in __init__(self, channel, timeout, stub)
     27             self.stub = bmi_pb2_grpc.BmiServiceStub(c)
     28             future = grpc.channel_ready_future(c)
---> 29             future.result(timeout=timeout)
     30         else:
     31             self.stub = stub

/usr/local/lib/python3.5/dist-packages/grpc/_utilities.py in result(self, timeout)
    132 
    133     def result(self, timeout=None):
--> 134         self._block(timeout)
    135         return None
    136 

/usr/local/lib/python3.5/dist-packages/grpc/_utilities.py in _block(self, timeout)
     78                 else:
     79                     if until is None:
---> 80                         self._condition.wait()
     81                     else:
     82                         remaining = until - time.time()

/usr/lib/python3.5/threading.py in wait(self, timeout)
    291         try:    # restore state no matter what (e.g., KeyboardInterrupt)
    292             if timeout is None:
--> 293                 waiter.acquire()
    294                 gotit = True
    295             else:

KeyboardInterrupt: 
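
The last frames show why the call never returns: `BmiClient.__init__` calls `future.result(timeout=timeout)`, and `_block` takes the `until is None` branch, so `timeout` was `None` and `self._condition.wait()` has no deadline. The sketch below is a stdlib analogue of that situation, not grpc4bmi or grpcio code: a `threading.Event` stands in for the channel-readiness future, and the helper name `wait_ready` is hypothetical.

```python
import threading


def wait_ready(ready: threading.Event, timeout: float) -> None:
    """Block until `ready` is set, but give up after `timeout` seconds.

    Event.wait(timeout) returns False when the timeout expires, so the
    caller gets an exception instead of hanging like a bare wait().
    """
    if not ready.wait(timeout):
        raise TimeoutError(f"not ready after {timeout} s")


if __name__ == "__main__":
    never_set = threading.Event()  # simulates a server that never comes up
    try:
        wait_ready(never_set, timeout=0.2)
    except TimeoutError as exc:
        print("gave up:", exc)
```

In grpcio the equivalent is passing a finite value, e.g. `grpc.channel_ready_future(channel).result(timeout=10)`, which raises `grpc.FutureTimeoutError` instead of blocking forever. Note that `BmiClientDocker` (line 42 of the trace) calls the parent constructor without a `timeout`.
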
sverhoeven commented 6 years ago

After adding a time.sleep(1) after the container is started, I was able to connect without hangs for 20+ attempts, whereas without the sleep it hung after about 5 attempts.

I think grpc.channel_ready_future is called before the server inside the container is up. Waiting a bit gives the container enough time to start.

A sleep is a bit hacky. I propose we wait until the container is running, or even healthy using a Docker healthcheck.
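
A bounded polling loop is one way to wait for "running" or "healthy" without a fixed sleep. The sketch below is generic and stdlib-only; the helper name `wait_until` and the 30 s default are assumptions. With the Docker SDK, the predicate could call `container.reload()` and then check `container.status == 'running'`, or `container.attrs['State']['Health']['Status'] == 'healthy'` when the image defines a `HEALTHCHECK`.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 30.0,
               interval: float = 0.1) -> None:
    """Poll `predicate` every `interval` seconds until it returns True.

    Raises TimeoutError if it is still False after `timeout` seconds,
    so a container that never becomes ready cannot hang the caller.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Hypothetical Docker usage (needs the `docker` package and a running daemon):
#
#     def is_healthy():
#         container.reload()
#         return container.attrs["State"].get("Health", {}).get("Status") == "healthy"
#
#     wait_until(is_healthy, timeout=60)

if __name__ == "__main__":
    calls = {"n": 0}

    def ready_on_third_call() -> bool:
        calls["n"] += 1
        return calls["n"] >= 3

    wait_until(ready_on_third_call, timeout=1.0, interval=0.01)
    print("ready after", calls["n"], "polls")  # prints: ready after 3 polls
```

Unlike the unbounded `while` loop, this fails loudly with a `TimeoutError` if the readiness condition is never reached.
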

goord commented 6 years ago

Yeah, I had a sleep there originally, but I removed it to speed up the test suite.

sverhoeven commented 6 years ago

Instead of a sleep, I used:

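        # Poll until Docker reports that the container has left the 'created' state.
        # Compare strings with ==; `is` tests object identity, not equality.
        self.container.reload()
        while self.container.status == 'created':
            time.sleep(0.1)
            self.container.reload()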

After the while loop the status is 'running'.

The hangs still occur, though.

sverhoeven commented 6 years ago

When grpc.channel_ready_future is removed, I get

_Rendezvous: <_Rendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "Socket closed"
    debug_error_string = "{"created":"@1536920082.125454765","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1099,"grpc_message":"Socket closed","grpc_status":14}"

when calling the initialize method.
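
Because the first RPC can fail with `UNAVAILABLE` while the server inside the container is still starting, another option is to retry that call with exponential backoff and an overall deadline. The sketch below is stdlib-only and generic; `retry_call`, the `is_transient` hook and the delay values are assumptions, not grpc4bmi API. For a real gRPC call, `is_transient` could check `isinstance(exc, grpc.RpcError) and exc.code() == grpc.StatusCode.UNAVAILABLE`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_call(fn: Callable[[], T],
               is_transient: Callable[[Exception], bool],
               deadline: float = 30.0,
               first_delay: float = 0.1,
               max_delay: float = 2.0) -> T:
    """Call `fn`, retrying with exponential backoff while errors are transient.

    Non-transient errors are re-raised immediately; a transient error that
    would persist past `deadline` seconds is re-raised as well.
    """
    stop_at = time.monotonic() + deadline
    delay = first_delay
    while True:
        try:
            return fn()
        except Exception as exc:
            if not is_transient(exc) or time.monotonic() + delay > stop_at:
                raise
            time.sleep(delay)
            delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_initialize() -> str:
        # Hypothetical stand-in for model.initialize(...) while the server starts.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ConnectionError("Socket closed")
        return "initialized"

    result = retry_call(flaky_initialize,
                        is_transient=lambda e: isinstance(e, ConnectionError),
                        deadline=5.0, first_delay=0.01)
    print(result, "after", attempts["n"], "attempts")  # initialized after 3 attempts
```

Retrying only on a transient status keeps genuine errors (e.g. a bad config file passed to initialize) from being masked by the retry loop.
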