Closed TribuneX closed 4 years ago
Sounds like an obvious and useful change. We are very limited by capacity however, so I cannot predict when someone would have time to do it.
Do you have any suggestions for what a solution could look like?
Had a quick whack at reproducing it using the test suite by raising a ValueError
on the server side. It seems that it got handled as expected by grpclib.
___________________________ test_simple_service_call ___________________________
@pytest.mark.asyncio
async def test_simple_service_call():
async with ChannelFor([ThingService()]) as channel:
> await _test_client(ThingServiceClient(channel))
grpc/test_grpclib_client.py:38:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
grpc/test_grpclib_client.py:19: in _test_client
response = await client.do_thing(name=name)
output_betterproto/service/service/__init__.py:43: in do_thing
return await self._unary_unary(
../src/betterproto/grpc/grpclib_client.py:78: in _unary_unary
response = await stream.recv_message()
../.venv/lib64/python3.8/site-packages/grpclib/client.py:408: in recv_message
await self.recv_initial_metadata()
../.venv/lib64/python3.8/site-packages/grpclib/client.py:376: in recv_initial_metadata
self._raise_for_grpc_status(headers_map)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <grpclib.client.Stream object at 0x7fddc8e22400>
headers_map = {':status': '200', 'content-type': 'application/grpc+proto', 'grpc-message': 'Internal Server Error', 'grpc-status': '2'}
def _raise_for_grpc_status(self, headers_map: Dict[str, str]) -> None:
grpc_status = headers_map.get('grpc-status')
if grpc_status is None:
raise GRPCError(Status.UNKNOWN, 'Missing grpc-status header')
try:
status = Status(int(grpc_status))
except ValueError:
raise GRPCError(Status.UNKNOWN, ('Invalid grpc-status: {!r}'
.format(grpc_status)))
else:
if status is not Status.OK:
message = headers_map.get('grpc-message')
if message is not None:
message = decode_grpc_message(message)
details = None
if self._status_details_codec is not None:
details_bin = headers_map.get(_STATUS_DETAILS_KEY)
if details_bin is not None:
details = self._status_details_codec.decode(
status, message,
decode_bin_value(details_bin.encode('ascii'))
)
> raise GRPCError(status, message, details)
E grpclib.exceptions.GRPCError: (<Status.UNKNOWN: 2>, 'Internal Server Error', None)
../.venv/lib64/python3.8/site-packages/grpclib/client.py:337: GRPCError
I dug into the issue and found the bug and also a possible soution.
The device in question responds using multiple headers. The first set of headers indicate that the request was fine with a status code of 200, and some grpc specific headers. The second set of headers will indicate that there was a error in the request and the grpc-status with the corresponding grpc-message.
This will call recv_message()
in grpclib.client
which will in turn call
recv_initial_metadata()
in grpclib.client
which is only using the first set
of headers. Those headers indicate that the request itself was fine, but the
reply does not contain any data, which will ultimately fail recv_message()
in
grpclib.stream
which tries to read some metadata from the response body which
is missing and None
is returned, which is why the assert failes.
After reading the grpclib.client
I realized that the additional metadata /
headers are only used if recv_trailing_metadata()
is called.
But that method would only be called by recv_message()
if there was a
grpc-status
header in the first set of headers.
I was able to fix the issue by calling both recv_initial_metadata()
and
recv_trailing_metadata()
before the actual call to recv_message()
.
async def _unary_unary(
self,
route: str,
request: "IProtoMessage",
response_type: Type[T],
*,
timeout: Optional[float] = None,
deadline: Optional["Deadline"] = None,
metadata: Optional[_MetadataLike] = None,
) -> T:
"""Make a unary request and return the response."""
async with self.channel.request(
route,
grpclib.const.Cardinality.UNARY_UNARY,
type(request),
response_type,
**self.__resolve_request_kwargs(timeout, deadline, metadata),
) as stream:
await stream.send_message(request, end=True)
+ await stream.recv_initial_metadata()
+ await stream.recv_trailing_metadata()
response = await stream.recv_message()
assert response is not None
return response
I did test both of the failed requests we had, and both cases are now raising a
grpclib.exceptions.GRPCError
as opossed to failing the assertion. I also
tested a working request, which still works fine.
I am unsure if this is the correct solution. I think this might also be
a bug in the device we are talking to, because the recv_trailing_metadata()
would be called if the initial headers would include a grpc-status
property.
Great job debugging that @abn 😄
Great job debugging that @nvxf 😄
I am unsure if this is the correct solution. I think this might also be a bug in the device we are talking to, because the recv_trailing_metadata() would be called if the initial headers would include a grpc-status property.
Is the error handling now also different/improved for the initial test that you did (with the ValueError in the backend)?
Great job debugging that @abn 😄
Debugging credits go to @nvxf :)
Is the error handling now also different/improved for the initial test that you did (with the ValueError in the backend)?
I think the issue is slightly different; from what @nvxf mentioned, it would seem that using stream.recv_message()
and the server responds with a "trailers-only response" without setting the grpc-status
header the message is not correctly retrieved and our assertion fails. Not sure if this is a grpclib
issue or if we need to handle the scenario on betterproto
side.
The solution proposed above will work since those are idempotent calls. However, this might only be working around the root cause.
Great job debugging that @abn 😄
Debugging credits go to @nvxf :)
oops! my bad! thanks @nvxf for looking into this!
@boukeversteegh so, it seems we were asserting for response data a bit too eagerly without giving the trailing metadata handling a chance to take place. I have proposed a fix in #127.
Fixed by @abn in #127
I used this lib to access a device via gNMI. Unfortunately, all error messages from the devices are ignored and cannot be accessed, since only an assertion is raised: https://github.com/danielgtaylor/python-betterproto/blob/cbd34370803a537743f6cdcadb9e5893781b137d/src/betterproto/grpc/grpclib_client.py#L79
Is there a plan for improving the error handling for error responses from grpc servers?