cortexlabs / cortex

Production infrastructure for machine learning at scale
https://cortexlabs.com/
Apache License 2.0
8.02k stars 606 forks source link

Performance in cluster is lower than without cluster #1479

Closed akash-harijan closed 3 years ago

akash-harijan commented 4 years ago

The performance of a TensorFlow model in the AWS environment is almost three times slower compared with the local environment.

Locally there is no cluster and none of those cluster layers, just a simple model deployed with tensorflow-serving, but in the AWS environment the whole set of cluster layers is attached on top of the tensorflow-serving layer. This can decrease performance, but not by this much. Is such a performance decrease expected in the AWS environment?

Locally: 30 FPS
AWS: 10 FPS

RobertLucian commented 4 years ago

@AkashDharani a couple of factors could be at play here:

  1. Low single-thread/multi-thread performance of the instance type you've selected compared to your local setup.
  2. Maybe you're using the GPU locally and for AWS clusters you only have CPU.
  3. Low network bandwidth (and/or high latency) between your machine and the AWS cluster.
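One way to separate factor 3 (network) from the other two is to time the request round trip on the client and compare it against the server-side inference time. A minimal sketch, assuming any zero-arg callable that performs one prediction request (the real call would be e.g. a `requests.post` to the deployed API):

```python
import time
import statistics

def measure_fps(send_request, n=50):
    """Time n sequential requests and return (fps, mean_latency_s).

    send_request is any zero-arg callable that performs one
    prediction round trip (e.g. an HTTP POST to the API).
    """
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        send_request()
        latencies.append(time.perf_counter() - start)
    fps = n / sum(latencies)
    return fps, statistics.mean(latencies)

# Stand-in for the real API call: a 100 ms round trip caps
# sequential throughput at ~10 FPS regardless of GPU speed.
fps, latency = measure_fps(lambda: time.sleep(0.1), n=10)
```

If the mean latency is much larger than the model's inference time, the bottleneck is the network path rather than the GPU.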

Is such a performance decrease expected in the AWS environment?

No, it isn't. As long as you have the same compute capacity on both your local machine and your AWS cluster, there should not be any drop in performance.

Locally: 30 FPS
AWS: 10 FPS

When you say locally, I presume you're referring to your local deployment which got deployed with cortex deploy -e local, right?


Could you provide us with more information regarding your AWS/local setup? It would be great if we could see your cortex.yaml API config.

Also, is this ticket related to https://github.com/cortexlabs/cortex/issues/1426#issuecomment-713432792?

akash-harijan commented 4 years ago

@RobertLucian

  1. Well, I used the same cortex.yaml file for both the local environment and the AWS environment, so the threads are the same in both cases.
  2. Both GPUs have compute capability 7.5, and I have made sure the GPU on the instance is utilized by keeping an eye on its utilization via nvidia-smi.
  3. As the instances are deployed on AWS, both the server (model) and the client (where I am hitting the API) are in the same region, so the network should not be an issue.

And #1426 was related to GPU RAM, while this one is related to performance.

I am attaching my model cortex.yaml file as well.


akash-harijan commented 4 years ago

I also deployed one of the examples provided by the Cortex devs, license-plate-reader via cortex_lite.yaml, in both the local environment and the AWS environment.

Local FPS: 10-11
AWS FPS: 5-6

Almost half the FPS was lost.

RobertLucian commented 4 years ago

@AkashDharani this should not happen unless the GPUs are different. What GPUs do you have on both your local machine and on the AWS instance?

The following is a list of GPUs having a compute capability of 7.5. A T4 will have a significantly different performance compared to an MX450 or an RTX 2080 Ti.

[Screenshot: table of GPUs with compute capability 7.5]
akash-harijan commented 4 years ago

Locally I am using an RTX 2080 Ti and on AWS it is a Tesla T4.

RobertLucian commented 4 years ago

@AkashDharani right, so that's it. The RTX 2080 Ti is a lot faster than a Tesla T4:

  1. On FP16 TFLOPS, the RTX 2080 Ti is 2.4 times faster than the T4.
  2. On FP32 TFLOPS, the RTX 2080 Ti is ~1.7 times faster than the T4.
  3. On FP64 TFLOPS, the RTX 2080 Ti is ~1.7 times faster than the T4.

Here's where you can see that: https://www.techpowerup.com/gpu-specs/geforce-rtx-2080-ti.c3305 https://www.techpowerup.com/gpu-specs/tesla-t4.c3316
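Sanity-checking the observed numbers against the FP32 ratio above (rough arithmetic, assuming the workload is FP32-bound):

```python
# Rough expectation for T4 throughput given the RTX 2080 Ti numbers,
# assuming an FP32-bound model (ratio ~1.7 from the specs above).
rtx_fps = 10.5          # midpoint of the observed 10-11 FPS
fp32_ratio = 1.7        # RTX 2080 Ti vs Tesla T4 FP32 TFLOPS
expected_t4_fps = rtx_fps / fp32_ratio
print(round(expected_t4_fps, 1))  # ~6.2, in line with the observed 5-6 FPS
```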

So getting 10-11 predictions/s on the RTX and 5-6 predictions/s on the T4 sounds just about right. If you need more performance, I suggest you either increase your max_replicas field to something bigger (in your cortex.yaml) or go for a more powerful GPU on AWS (like the V100), but keep in mind that the best bang for the buck is the T4. Spot instances are also worth considering.

akash-harijan commented 4 years ago

Deployed the same model on a g4dn.xlarge (Tesla T4) and got around 25 FPS.

Locally (RTX 2080 Ti): 30 FPS (used cortex to deploy the model on the local machine without a cluster)
AWS Cluster (Tesla T4): 10 FPS
AWS EC2 instance (Tesla T4): 25 FPS (used cortex to deploy the model on an EC2 instance without a cluster)
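Comparing the three numbers for the same model isolates the cluster overhead from the GPU difference (simple arithmetic on the figures above):

```python
local_fps = 30     # RTX 2080 Ti, no cluster
ec2_fps = 25       # Tesla T4, bare EC2, no cluster
cluster_fps = 10   # Tesla T4, full Cortex cluster

gpu_slowdown = local_fps / ec2_fps        # 1.2x: GPU + instance difference
cluster_slowdown = ec2_fps / cluster_fps  # 2.5x: cluster layers / network path
```

So for this model, most of the gap comes from the cluster path rather than the T4 itself.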

akash-harijan commented 4 years ago

Deployed the license-plate-reader model via cortex_lite.yaml on a g4dn.xlarge (Tesla T4) and got around 8-9 FPS.

Locally (RTX 2080 Ti): 10-11 FPS (used cortex to deploy the model on the local machine without a cluster)
AWS Cluster (Tesla T4): 5-6 FPS
AWS EC2 instance (Tesla T4): 8-9 FPS (used cortex to deploy the model on an EC2 instance without a cluster)

RobertLucian commented 4 years ago

@AkashDharani That difference from 5-6 to 8-9 predictions/s is probably caused by network delays and the concurrency level you have set in both cases. Since each request takes more time due to the round-trip delays introduced by the network, you will have to increase the concurrency level (i.e. more concurrent requests in flight) when the Cortex cluster is used.

When using the Cortex cluster, you should probably increase the predictor.threads_per_replica in your API config and then make more concurrent requests.

I also assumed that the requests for "AWS Cluster (Tesla T4): 5-6 FPS" and "AWS EC2 instance (Tesla T4): 8-9 FPS" were made from your local machine.
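The effect of client-side concurrency on a latency-bound workload can be sketched with a toy model, where time.sleep stands in for the network round trip (on the real cluster, predictor.threads_per_replica would be raised correspondingly on the server side):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def round_trip():
    # Stand-in for one prediction request over the network.
    time.sleep(0.05)  # 50 ms simulated round-trip delay

def throughput(workers, n=20):
    """Send n requests with the given concurrency; return requests/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda _: round_trip(), range(n)))
    return n / (time.perf_counter() - start)

sequential = throughput(workers=1)  # capped near 1/latency = ~20 req/s
concurrent = throughput(workers=4)  # roughly 4x higher with 4 in flight
```

With a single in-flight request, throughput is capped at 1/latency no matter how fast the GPU is; overlapping requests hides the round-trip delay.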

akash-harijan commented 3 years ago

Thanks Robert for your guidance. By the way, I am also looking into the NVIDIA Triton Inference Server; it seems quite promising in comparison with TensorFlow Serving. You guys could look into it as well.

deliahu commented 3 years ago

@AkashDharani I'll go ahead and close this issue, feel free to reach out if you have additional questions