Closed rajeshdhanda closed 1 year ago
Can you please explain what the ask is here?
Hi @boredabdel
Service A and Service B are services in different node pools in a GKE cluster.
Service A is the primary service in default-pool and is always up.
Service B runs on a GPU node (a node in gpu-pool with autoscaling 0-3). Whenever a request is sent to Service B, it scales up (a GPU node is created, then a pod replica) and processes the request.
Sometimes, due to GPU resource unavailability, it is not able to scale up.
Service A is a KServe InferenceService that calls Service B:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: Service_A
spec:
  predictor:
    containers:
      - name: kfserving-container
        image: gcr_path_here
        args:
          - --predictor_host=host_from_service_B
A request from the client is sent to the primary Service A.
Service A internally calls Service B, but sometimes Service B is not able to scale up (GPU resource unavailability), resulting in 500: Internal Server Error in the final response from Service A.
I need to route the request to a different regional cluster based on the response from Service A.
Hi @boredabdel
Basically, the Multi Cluster Service (MCS) is connected to Service A, which is always up, and MCS keeps sending requests to Service A.
What I am looking for:
If Service B is not able to scale up, MCS should somehow be able to detect it based on the response (status code) from Service A in region_1 and forward the request to Service A in region_2.
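To make the desired behavior concrete, here is a minimal sketch of the same failover done on the calling side rather than in MCS. It is only an illustration under assumptions: the function names `http_post` and `send_with_failover` and the endpoint URLs in the usage note are hypothetical, and treating any 5xx status (or an unreachable endpoint) as "fail over to the next region" is a simplification of the rule described above.

```python
import urllib.error
import urllib.request
from typing import Callable, Sequence, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, response body)


def http_post(url: str, payload: bytes, timeout: float = 30.0) -> Response:
    """POST a JSON payload and return (status, body), even for HTTP error statuses."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses arrive as exceptions; turn them back into a status.
        return error.code, error.read()
    except OSError:
        # Assumption: an unreachable endpoint or a timeout is treated like a 503.
        return 503, b""


def send_with_failover(
    endpoints: Sequence[str],
    payload: bytes,
    send: Callable[[str, bytes], Response] = http_post,
) -> Tuple[str, int, bytes]:
    """Try each regional Service A endpoint in order; move on when one returns 5xx.

    Returns (endpoint_used, status, body). If every region fails, the last
    region's response is returned so the caller still sees the error.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    for url in endpoints:
        status, body = send(url, payload)
        if status < 500:
            return url, status, body
    return url, status, body
```

With hypothetical addresses, a call would look like `send_with_failover(["http://service-a.region-1.example", "http://service-a.region-2.example"], b'{"instances": []}')`. Note that retrying in a second region repeats the inference request, so this only fits requests that are safe to send twice.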
MCS doesn't have the capability to detect service availability and re-route traffic accordingly. Besides, the purpose of this repo is not to solve specific issues, but rather to provide people with guidance on how to expose apps on GKE using the various networking APIs. So I will close this. Please open a new issue if something in the repo is broken.
Service A: inference server.
Service B: predictor host in gpu-pool with minReplicas: 0, scales up on request from the inference server.
Service A internally calls Service B, but sometimes Service B is down, resulting in 500: Internal Server Error in the final response. Need to route the request to a different regional cluster based on the response from Service A.
Thanks