caikit / caikit-nlp

TGIS unary generation validation error not being propagated #385

Closed evaline-ju closed 3 months ago

evaline-ju commented 3 months ago

Describe the bug

Validation errors from TGIS are surfacing with the message "The underlying TCP connection is closed" even when a connection error is not the cause of the failure, for example when an invalid combination of arguments is provided.

Sample Code

Call a caikit-nlp gRPC server's text generation endpoint with a request that triggers a validation error, e.g.

grpcurl -plaintext -H "mm-model-id: [model-id]" -d '{
  "stopSequences": [],
  "text": "sad",
  "tokenLogprobs": true
}' localhost:8085 caikit.runtime.Nlp.NlpService.TextGenerationTaskPredict
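
For context, this request asks for extra per-token detail (tokenLogprobs) without also requesting the generated or input tokens themselves, which is presumably what trips TGIS's validation ("must request input and/or generated tokens to request extra token detail").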

Expected behavior

As in the streaming text generation case, a gRPC error with status InvalidArgument and message "must request input and/or generated tokens to request extra token detail".

Observed behavior

A gRPC error with status InvalidArgument but with the message "The underlying TCP connection is closed".

Additional context

Pod logs also show JSON serialization errors:

--- Logging error ---
Traceback (most recent call last):
  File "/app/overrides_packages/caikit_nlp/toolkit/text_generation/tgis_utils.py", line 475, in unary_generate
    batch_response = self.tgis_client.Generate(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/caikit/.venv/lib64/python3.11/site-packages/grpc/_channel.py", line 1181, in __call__
    return _end_unary_response_blocking(state, call, False, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/caikit/.venv/lib64/python3.11/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "must request input and/or generated tokens to request extra token detail"
    debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-08-16T01:04:12.060816269+00:00", grpc_status:3, grpc_message:"must request input and/or generated tokens to request extra token detail"}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.11/logging/__init__.py", line 1110, in emit
    msg = self.format(record)
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/logging/__init__.py", line 953, in format
    return fmt.format(record)
           ^^^^^^^^^^^^^^^^^^
  File "/caikit/.venv/lib/python3.11/site-packages/alog/alog.py", line 198, in format
    return json.dumps(log_record, sort_keys=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
          ^^^^^^^^^^^
  File "/usr/lib64/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type method is not JSON serializable
Call stack:
  File "/usr/lib64/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.11/concurrent/futures/thread.py", line 83, in _worker
    work_item.run()
  File "/usr/lib64/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/caikit/.venv/lib64/python3.11/site-packages/grpc/_server.py", line 793, in _unary_response_in_pool
    response, proceed = _call_behavior(
  File "/caikit/.venv/lib64/python3.11/site-packages/grpc/_server.py", line 610, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/app/training_packages/py_grpc_prometheus/prometheus_server_interceptor.py", line 67, in new_behavior
    response_or_iterator = behavior(request_or_iterator, servicer_context)
  File "/app/overrides_packages/caikit/runtime/interceptors/caikit_runtime_server_wrapper.py", line 159, in safe_rpc_call
    return rpc(request, context, caikit_rpc=caikit_rpc)
  File "/app/overrides_packages/caikit/runtime/servicers/global_predict_servicer.py", line 209, in Predict
    response = self.predict_model(
  File "/app/overrides_packages/caikit/runtime/servicers/global_predict_servicer.py", line 330, in predict_model
    response = model_run_fn(**kwargs)
  File "/app/overrides_packages/caikit_nlp/modules/text_generation/text_generation_tgis.py", line 255, in run
    return self.tgis_generation_client.unary_generate(
  File "/app/overrides_packages/caikit_nlp/toolkit/text_generation/tgis_utils.py", line 479, in unary_generate
    log.error("<NLP30829218E>", err.details)
  File "/caikit/.venv/lib/python3.11/site-packages/alog/alog.py", line 450, in <lambda>
    lambda self, arg_one, *args, **kwargs: _log_with_code_method_override(
  File "/caikit/.venv/lib/python3.11/site-packages/alog/alog.py", line 431, in _log_with_code_method_override
    self.log(
Message: {'log_code': '<NLP30829218E>', 'message': <bound method _InactiveRpcError.details of <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "must request input and/or generated tokens to request extra token detail"
    debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-08-16T01:04:12.060816269+00:00", grpc_status:3, grpc_message:"must request input and/or generated tokens to request extra token detail"}"
>>, 'args': None}
Arguments: ()
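
The traceback suggests a likely contributing cause: in tgis_utils.py unary_generate, the error handler logs err.details, the bound method of the grpc error, rather than the string it returns, so the alog JSON formatter fails with "Object of type method is not JSON serializable". Losing that detail is at least consistent with the misleading connection-closed message the client sees. Below is a minimal sketch of that pattern and the call-the-method alternative; the function and stub names are illustrative, not the actual caikit-nlp code.

import logging
import grpc

log = logging.getLogger(__name__)

def call_generate(stub, request):
    """Hypothetical unary call to a TGIS Generate stub."""
    try:
        return stub.Generate(request)
    except grpc.RpcError as err:
        # Buggy pattern from the log above: the *bound method* is passed in,
        # which the alog JSON formatter cannot serialize:
        #   log.error("<NLP30829218E>", err.details)
        #
        # Calling the methods yields serializable values instead:
        log.error("TGIS Generate failed: %s - %s", err.code(), err.details())
        # Re-raise so the original INVALID_ARGUMENT status and message can be
        # propagated to the caller instead of a generic connection error.
        raise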