akash-harijan closed this issue 3 years ago.
@AkashDharani a couple of factors could be at play here:
> Such performance decrease is expected on aws environment?
No, it isn't. As long as you have the same compute capacity on both your local machine and on your AWS cluster, there should not be any drop in performance.
> Locally: 30 FPS, AWS: 10 FPS
When you say locally, I presume you're referring to your local deployment, which got deployed with `cortex deploy -e local`, right?
Could you provide us with more information regarding your AWS/local setup? It would be great if we could see your `cortex.yaml` API config.
Also, is this ticket related to https://github.com/cortexlabs/cortex/issues/1426#issuecomment-713432792?
@RobertLucian
No, #1426 was related to GPU RAM, whereas this issue is about performance.
I am attaching my model's `cortex.yaml` file as well.
```yaml
name: detection
kind: RealtimeAPI
predictor:
  type: tensorflow
  path: predict-2.py
  model_path: s3://cortex-api-sight/detection
  signature_key: serving_default
  processes_per_replica: 1
  threads_per_process: 4
  config:
    classes: None
    input_shape: [None, None]
    input_key: image_arrays:0
    output_key: detections:0
compute:
  cpu: 3
  gpu: 1
  mem: 8G
networking:
  endpoint: detection
  local_port: 9999
  api_gateway: public
```
I also deployed one of the examples provided by the Cortex devs, license-plate-reader via `cortex_lite.yaml`, on both the local and AWS environments.
Local FPS: 10-11, AWS FPS: 5-6. The FPS almost halved.
@AkashDharani this should not happen unless the GPUs are different. What GPUs do you have on both your local machine and on the AWS instance?
Even among GPUs with the same compute capability of 7.5, a T4 will have significantly different performance compared to an MX450 or an RTX 2080 Ti.
Locally I am using an RTX 2080 Ti, and on AWS it is a Tesla T4.
@AkashDharani right, so that's it. The RTX 2080 Ti is a lot faster than a Tesla T4:
Here's where you can see that: https://www.techpowerup.com/gpu-specs/geforce-rtx-2080-ti.c3305 https://www.techpowerup.com/gpu-specs/tesla-t4.c3316
So getting 10-11 predictions/s on the RTX and 5-6 predictions/s on the T4 sounds just about right. If you need more performance, I suggest you either increase the `max_replicas` field in your `cortex.yaml` or go for a more powerful GPU on AWS (like the V100). Keep in mind, though, that the best bang for the buck is the T4, and spot instances are also worth considering.
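For a rough sanity check of the gap, one can compare the peak FP32 figures that TechPowerUp publishes for the two cards (approximate numbers, and peak FLOPS is only a proxy; real inference throughput also depends on memory bandwidth, clocks, and the model):

```python
# Peak FP32 figures from the TechPowerUp pages linked above (approximate).
rtx_2080_ti_tflops = 13.4  # GeForce RTX 2080 Ti
tesla_t4_tflops = 8.1      # Tesla T4

compute_ratio = rtx_2080_ti_tflops / tesla_t4_tflops
print(f"raw FP32 ratio: {compute_ratio:.2f}x")  # ~1.65x

# Observed FPS ratio reported in this thread: 10-11 FPS vs 5-6 FPS.
observed_ratio = 10.5 / 5.5
print(f"observed FPS ratio: {observed_ratio:.2f}x")  # ~1.91x
```

The observed gap being somewhat larger than the raw-compute ratio is unsurprising, since factors like memory bandwidth and power limits also favor the RTX 2080 Ti.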
Deployed the same model on a g4dn.xlarge (Tesla T4) and got around 25 FPS.
- Locally (RTX 2080 Ti): 30 FPS (used Cortex to deploy the model on the local machine, without a cluster)
- AWS cluster (Tesla T4): 10 FPS
- AWS EC2 instance (Tesla T4): 25 FPS (used Cortex to deploy the model on an EC2 instance, without a cluster)
Deployed the license-plate-reader model via `cortex_lite.yaml` on a g4dn.xlarge (Tesla T4) and got around 8-9 FPS.
- Locally (RTX 2080 Ti): 10-11 FPS (used Cortex to deploy the model on the local machine, without a cluster)
- AWS cluster (Tesla T4): 5-6 FPS
- AWS EC2 instance (Tesla T4): 8-9 FPS (used Cortex to deploy the model on an EC2 instance, without a cluster)
@AkashDharani That difference from 5-6 to 8-9 predictions/s is probably caused by network delays and the concurrency level you have set in each case. Since a request takes more time due to the network round-trip delay, you will have to increase the concurrency level (i.e., make more concurrent requests per second) when the Cortex cluster is used.
When using the Cortex cluster, you should probably increase `predictor.threads_per_process` in your API config and then make more concurrent requests.
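A minimal sketch of measuring throughput at different client-side concurrency levels, with the round trip simulated by `time.sleep` (the latency constant is an assumption; replace `fake_predict` with a real request against your endpoint to benchmark the actual API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

REQUEST_LATENCY_S = 0.05  # stand-in for one API round trip (assumed value)

def fake_predict(_):
    # Replace this sleep with an actual HTTP request to the deployed endpoint.
    time.sleep(REQUEST_LATENCY_S)

def throughput(concurrency, n_requests=20):
    """Return measured predictions/s for a given number of in-flight requests."""
    start = time.time()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(fake_predict, range(n_requests)))
    return n_requests / (time.time() - start)

for c in (1, 4, 8):
    print(f"concurrency={c}: ~{throughput(c):.1f} predictions/s")
```

With a fixed per-request latency, throughput scales roughly linearly with concurrency until the server side (replicas, processes, threads) saturates, which is exactly why raising both the client concurrency and the server thread count matters.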
I also assumed that the requests for "AWS cluster (Tesla T4): 5-6 FPS" and "AWS EC2 instance (Tesla T4): 8-9 FPS" were made from your local machine.
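The arithmetic behind that assumption can be sketched with Little's law (steady-state throughput ≈ concurrency / per-request latency); the latency figures below are illustrative, not measured:

```python
# Little's law for a closed loop of requests: throughput = concurrency / latency.
def predictions_per_sec(concurrency, latency_s):
    return concurrency / latency_s

local_latency = 0.10   # s per request against a local deployment (illustrative)
remote_latency = 0.18  # s per request including network round trip (illustrative)

print(f"{predictions_per_sec(1, local_latency):.1f}")   # 10.0
print(f"{predictions_per_sec(1, remote_latency):.1f}")  # 5.6
print(f"{predictions_per_sec(2, remote_latency):.1f}")  # 11.1
```

In other words, the extra round-trip latency alone can roughly halve the throughput of a single sequential client, and issuing more requests in flight recovers it without any server-side change.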
Thanks Robert for your guidance. By the way, I am also looking into the NVIDIA Triton Inference Server; it seems quite promising in comparison with TensorFlow Serving. You guys can look into it as well.
@AkashDharani I'll go ahead and close this issue; feel free to reach out if you have additional questions.
The performance of a TensorFlow model in the AWS environment is almost three times slower than in the local environment.
Locally there isn't any cluster or any of those cluster layers; just a simple model is deployed with TensorFlow Serving. In the AWS environment, the whole cluster stack is attached on top of the TensorFlow Serving layer. This can decrease performance, but not by this much. Is such a performance decrease expected in an AWS environment?
Locally: 30 FPS, AWS: 10 FPS