bug: timeout configuration not working #4789

Open · BangDaeng opened this issue 1 month ago

BangDaeng commented 1 month ago

Describe the bug

The timeout settings for the api_server and the runners are not taking effect in BentoML. I'm using BentoML version 1.0.20.post11. The default configuration is as follows:

version: 1
api_server:
  workers: ~ # cpu_count() will be used when null
  timeout: 60
  backlog: 2048
  # the maximum number of connections that will be made to any given runner server at once
  max_runner_connections: 16
  metrics:
    enabled: true
    namespace: bentoml_api_server
    duration:
      # https://github.com/prometheus/client_python/blob/f17a8361ad3ed5bc47f193ac03b00911120a8d81/prometheus_client/metrics.py#L544
      buckets: [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0]
      min: ~
      max: ~
      factor: ~
  logging:
    access:
      enabled: true
      request_content_length: true
      request_content_type: true
      response_content_length: true
      response_content_type: true
      format:
        trace_id: 032x
        span_id: 016x
  ssl:
    enabled: false
    certfile: ~
    keyfile: ~
    keyfile_password: ~
    ca_certs: ~
    version: 17 # ssl.PROTOCOL_TLS_SERVER
    cert_reqs: 0 # ssl.CERT_NONE
    ciphers: TLSv1 # default ciphers
  http:
    host: 0.0.0.0
    port: 3000
    cors:
      enabled: false
      access_control_allow_origins: ~
      access_control_allow_credentials: ~
      access_control_allow_methods: ~
      access_control_allow_headers: ~
      access_control_allow_origin_regex: ~
      access_control_max_age: ~
      access_control_expose_headers: ~
    response:
      trace_id: false
  grpc:
    host: 0.0.0.0
    port: 3000
    max_concurrent_streams: ~
    maximum_concurrent_rpcs: ~
    max_message_length: -1
    reflection:
      enabled: false
    channelz:
      enabled: false
    metrics:
      host: 0.0.0.0
      port: 3001
  runner_probe: # configure whether the API server's health check endpoints (readyz, livez, healthz) also check the runners
    enabled: true
    timeout: 1
    period: 10
runners:
  resources: ~
  workers_per_resource: 1
  timeout: 300
  batching:
    enabled: true
    max_batch_size: 100
    max_latency_ms: 10000
  logging:
    access:
      enabled: true
      request_content_length: true
      request_content_type: true
      response_content_length: true
      response_content_type: true
  metrics:
    enabled: true
    namespace: bentoml_runner
tracing:
  exporter_type: ~
  sample_rate: ~
  excluded_urls: ~
  timeout: ~
  max_tag_value_length: ~
  zipkin:
    endpoint: ~
    local_node_ipv4: ~
    local_node_ipv6: ~
    local_node_port: ~
  jaeger:
    protocol: thrift
    collector_endpoint: ~
    thrift:
      agent_host_name: ~
      agent_port: ~
      udp_split_oversized_batches: ~
    grpc:
      insecure: ~
  otlp:
    protocol: ~
    endpoint: ~
    compression: ~
    http:
      certificate_file: ~
      headers: ~
    grpc:
      headers: ~
      insecure: ~
monitoring:
  enabled: true
  type: default
  options:
    log_config_file: ~
    log_path: monitoring

I modified the timeout of the api_server and the timeout of the runners, but the settings are not applied. When I write service.py as below, start it with bentoml serve, and test it, the timeout is not enforced.

I tried changing the default configuration in place, and I also wrote a configuration.yaml with the overrides.
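Roughly, the override file looks like this (the values shown here are illustrative placeholders, not my exact numbers):

version: 1
api_server:
  timeout: 600
runners:
  timeout: 600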

I also checked that the values were applied correctly by printing the config settings, but the timeout is still not enforced.
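The check looks roughly like this; note that it goes through BentoML's internal container API, so the exact attribute paths may differ between versions:

from bentoml._internal.configuration.containers import BentoMLContainer

# print the effective values after BentoML merges the defaults with my overrides
print(BentoMLContainer.config.api_server.timeout())
print(BentoMLContainer.config.runners.timeout())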

service.py (trimmed to the relevant endpoint; svc, samples, BUILD_NAME, and scappy_runner are defined earlier in the file):

import time

from bentoml.io import JSON, NumpyNdarray

@svc.api(
    input=JSON.from_sample(samples),
    output=NumpyNdarray(),
    route=BUILD_NAME,
)
async def predict(sentences):
    # deliberately sleep longer than the configured api_server timeout (60s)
    # and runner timeout (300s)
    time.sleep(350)
    output = await scappy_runner.predict.async_run(sentences)
    return output
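This is how I start the server and test the endpoint. The BENTOML_CONFIG environment variable points BentoML at the override file; service:svc, the route, and the payload are placeholders for my actual module path, BUILD_NAME route, and sample input:

# start the server with the override file applied
BENTOML_CONFIG=./configuration.yaml bentoml serve service:svc

# in another shell, time a request; with api_server timeout 60 I expect the
# request to be aborted after ~60s, but it runs the full 350s instead
time curl -X POST http://localhost:3000/<route> \
    -H 'Content-Type: application/json' \
    -d '{"text": "example"}'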

Please help me.

To reproduce

No response

Expected behavior

No response

Environment

bentoml: 1.0.20.post11
python: 3.10.12

frostming commented 1 month ago

Can you try on the latest version of BentoML?
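For example:

pip install -U bentoml

and then serve the service again to check whether the timeout is enforced.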