GoogleCloudPlatform / gke-networking-recipes

Apache License 2.0

response based routing in MCS #113

Closed rajeshdhanda closed 1 year ago

rajeshdhanda commented 1 year ago

Service A: Inference Server
Service B: predictor host in gpu-pool with minReplicas: 0, scales up on request from the inference server

Service A internally calls Service B, but sometimes Service B is down, resulting in a 500: Internal Server Error in the final response. I need to route the request to a different regional cluster based on the response from Service A.

Thanks

[Attached image: Untitled-2022-10-06-1300]

boredabdel commented 1 year ago

Can you please explain what the ask is here?

rajeshdhanda commented 1 year ago

Hi @boredabdel, Service A and Service B are services in different node pools in a GKE cluster. Service A is the primary service in default-pool and is always up. Service B runs on a GPU node (a node in gpu-pool that autoscales between 0 and 3). Whenever a request is sent to Service B, it scales up (a GPU node is created, then a pod replica) and processes the request. Sometimes, because GPU resources are unavailable, the node pool is not able to scale up. Service A is a KServe InferenceService that calls Service B:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: Service_A 
spec:
  predictor:
    containers:
      - name: kfserving-container
        image: gcr_path_here
        args:
          - --predictor_host=host_from_service_B

The request from the client is sent to the primary Service A. Service A internally calls Service B, but sometimes Service B is not able to scale up (GPU resource unavailability), resulting in a 500: Internal Server Error in the final response from Service A.

I need to route the request to a different regional cluster based on the response from Service A.

rajeshdhanda commented 1 year ago

Hi @boredabdel, basically, the multi-cluster Service (MCS) is connected to Service A, which is always up, so MCS keeps sending requests to Service A. What I am looking for: if Service B is not able to scale up, MCS should somehow be able to detect it based on the response (status code) from Service A in region_1 and forward the request to Service A in region_2.
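For illustration only, the behaviour asked for above (retry in region_2 when Service A in region_1 answers with a 5xx) can be sketched on the client side, outside of MCS. This is a minimal sketch under stated assumptions, not something MCS or KServe provides: the function names `send_with_regional_fallback` and `http_post` and the endpoint URLs are hypothetical, and it assumes the inference request is safe to send twice.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: sends a payload to one regional endpoint and
# returns (status_code, body).
Sender = Callable[[str, bytes], Tuple[int, bytes]]


def http_post(endpoint: str, payload: bytes) -> Tuple[int, bytes]:
    """POST the payload with the standard library and return (status, body)."""
    req = urllib.request.Request(endpoint, data=payload, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as HTTPError; surface them as a status code.
        return err.code, err.read()


def send_with_regional_fallback(
    endpoints: Sequence[str],
    payload: bytes,
    send: Sender = http_post,
) -> Tuple[str, int, bytes]:
    """Try each regional Service A endpoint in order.

    Returns (endpoint, status, body) for the first non-5xx response, or the
    last response seen if every region returned 5xx.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    last: Optional[Tuple[str, int, bytes]] = None
    for endpoint in endpoints:
        try:
            status, body = send(endpoint, payload)
        except OSError as exc:
            # Connection errors and timeouts are treated like a 503 from that region.
            status, body = 503, str(exc).encode()
        last = (endpoint, status, body)
        if status < 500:
            return last
    assert last is not None
    return last
```

A caller would pass the regional endpoints in priority order, for example `send_with_regional_fallback(["http://service-a.region-1.example", "http://service-a.region-2.example"], payload)`, where both URLs are placeholders. The `send` parameter is injectable so the fallback logic can be exercised without a network.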

boredabdel commented 1 year ago

MCS doesn't have the capability to detect service availability and re-route traffic accordingly. Besides, the purpose of this repo is not to solve specific issues, but rather to provide people with guidance on how to expose apps on GKE using the various networking APIs. So I will close this. Please open a new issue if something in the repo is broken.