Show error responses from grpc server

TribuneX commented 4 years ago

I used this lib to access a device via gNMI. Unfortunately, all error messages from the devices are ignored and cannot be accessed, since only an assertion is raised: https://github.com/danielgtaylor/python-betterproto/blob/cbd34370803a537743f6cdcadb9e5893781b137d/src/betterproto/grpc/grpclib_client.py#L79

Is there a plan for improving the error handling for error responses from grpc servers?

boukeversteegh commented 4 years ago

Sounds like an obvious and useful change. We are very limited by capacity however, so I cannot predict when someone would have time to do it.

Do you have any suggestions for what a solution could look like?

abn commented 4 years ago

Had a quick whack at reproducing it using the test suite by raising a ValueError on the server side. It seems that it got handled as expected by grpclib.

___________________________ test_simple_service_call ___________________________

    @pytest.mark.asyncio
    async def test_simple_service_call():
        async with ChannelFor([ThingService()]) as channel:
>           await _test_client(ThingServiceClient(channel))

grpc/test_grpclib_client.py:38: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
grpc/test_grpclib_client.py:19: in _test_client
    response = await client.do_thing(name=name)
output_betterproto/service/service/__init__.py:43: in do_thing
    return await self._unary_unary(
../src/betterproto/grpc/grpclib_client.py:78: in _unary_unary
    response = await stream.recv_message()
../.venv/lib64/python3.8/site-packages/grpclib/client.py:408: in recv_message
    await self.recv_initial_metadata()
../.venv/lib64/python3.8/site-packages/grpclib/client.py:376: in recv_initial_metadata
    self._raise_for_grpc_status(headers_map)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <grpclib.client.Stream object at 0x7fddc8e22400>
headers_map = {':status': '200', 'content-type': 'application/grpc+proto', 'grpc-message': 'Internal Server Error', 'grpc-status': '2'}

    def _raise_for_grpc_status(self, headers_map: Dict[str, str]) -> None:
        grpc_status = headers_map.get('grpc-status')
        if grpc_status is None:
            raise GRPCError(Status.UNKNOWN, 'Missing grpc-status header')
        try:
            status = Status(int(grpc_status))
        except ValueError:
            raise GRPCError(Status.UNKNOWN, ('Invalid grpc-status: {!r}'
                                             .format(grpc_status)))
        else:
            if status is not Status.OK:
                message = headers_map.get('grpc-message')
                if message is not None:
                    message = decode_grpc_message(message)
                details = None
                if self._status_details_codec is not None:
                    details_bin = headers_map.get(_STATUS_DETAILS_KEY)
                    if details_bin is not None:
                        details = self._status_details_codec.decode(
                            status, message,
                            decode_bin_value(details_bin.encode('ascii'))
                        )
>               raise GRPCError(status, message, details)
E               grpclib.exceptions.GRPCError: (<Status.UNKNOWN: 2>, 'Internal Server Error', None)

../.venv/lib64/python3.8/site-packages/grpclib/client.py:337: GRPCError

nvxf commented 4 years ago

I dug into the issue and found the bug and also a possible soution.

The device in question responds using multiple headers. The first set of headers indicate that the request was fine with a status code of 200, and some grpc specific headers. The second set of headers will indicate that there was a error in the request and the grpc-status with the corresponding grpc-message.

https://github.com/danielgtaylor/python-betterproto/blob/cbd34370803a537743f6cdcadb9e5893781b137d/src/betterproto/grpc/grpclib_client.py#L124

This will call recv_message() in grpclib.client which will in turn call recv_initial_metadata() in grpclib.client which is only using the first set of headers. Those headers indicate that the request itself was fine, but the reply does not contain any data, which will ultimately fail recv_message() in grpclib.stream which tries to read some metadata from the response body which is missing and None is returned, which is why the assert failes.

After reading the grpclib.client I realized that the additional metadata / headers are only used if recv_trailing_metadata() is called. But that method would only be called by recv_message() if there was a grpc-status header in the first set of headers.

I was able to fix the issue by calling both recv_initial_metadata() and recv_trailing_metadata() before the actual call to recv_message().

    async def _unary_unary(
        self,
        route: str,
        request: "IProtoMessage",
        response_type: Type[T],
        *,
        timeout: Optional[float] = None,
        deadline: Optional["Deadline"] = None,
        metadata: Optional[_MetadataLike] = None,
    ) -> T:
        """Make a unary request and return the response."""
        async with self.channel.request(
            route,
            grpclib.const.Cardinality.UNARY_UNARY,
            type(request),
            response_type,
            **self.__resolve_request_kwargs(timeout, deadline, metadata),
        ) as stream:
            await stream.send_message(request, end=True)
+           await stream.recv_initial_metadata()
+           await stream.recv_trailing_metadata()
            response = await stream.recv_message()
            assert response is not None
            return response

I did test both of the failed requests we had, and both cases are now raising a grpclib.exceptions.GRPCError as opossed to failing the assertion. I also tested a working request, which still works fine.

I am unsure if this is the correct solution. I think this might also be a bug in the device we are talking to, because the recv_trailing_metadata() would be called if the initial headers would include a grpc-status property.

boukeversteegh commented 4 years ago

~~Great job debugging that @abn 😄~~

Great job debugging that @nvxf 😄

I am unsure if this is the correct solution. I think this might also be a bug in the device we are talking to, because the recv_trailing_metadata() would be called if the initial headers would include a grpc-status property.

Is the error handling now also different/improved for the initial test that you did (with the ValueError in the backend)?

abn commented 4 years ago

Great job debugging that @abn 😄

Debugging credits go to @nvxf :)

abn commented 4 years ago

Is the error handling now also different/improved for the initial test that you did (with the ValueError in the backend)?

I think the issue is slightly different; from what @nvxf mentioned, it would seem that using stream.recv_message() and the server responds with a "trailers-only response" without setting the grpc-status header the message is not correctly retrieved and our assertion fails. Not sure if this is a grpclib issue or if we need to handle the scenario on betterproto side.

The solution proposed above will work since those are idempotent calls. However, this might only be working around the root cause.

boukeversteegh commented 4 years ago

Great job debugging that @abn 😄

Debugging credits go to @nvxf :)

oops! my bad! thanks @nvxf for looking into this!

abn commented 4 years ago

@boukeversteegh so, it seems we were asserting for response data a bit too eagerly without giving the trailing metadata handling a chance to take place. I have proposed a fix in #127.

boukeversteegh commented 4 years ago

Fixed by @abn in #127

danielgtaylor / python-betterproto

Show error responses from grpc server #123