Triton Backend - Githubissues

dbasbabasi commented 2 years ago

Hi, it works on the trt backend, and I am trying to run it on the Triton backend. I changed the docker parameter in the deploy_trt file, but it fails on warmup with the Triton backend. Do I need to change any other config?

SthPhoenix commented 2 years ago

Hi! The Triton backend should work, but for now it's up to you to run a separate Triton Server container and provide its URL in the deploy_trt.sh config.

Also, there is currently a known issue with inference of the SCRFD model on the Triton backend: Triton provides outputs as non-writable numpy arrays, but the new optimized SCRFD post-processing modifies the net output arrays in place to avoid excessive creation of numpy arrays. For now it can be fixed by replacing lines 332-334 of scrfd.py with:

score_blob = np.copy(net_outs[idx][0])
bbox_blob = np.copy(net_outs[idx + self.fmc][0])
kpss_blob = np.copy(net_outs[idx + self.fmc * 2][0])
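To illustrate why the copy helps, here is a minimal sketch that does not use Triton at all. It assumes a read-only numpy array as a stand-in for a Triton output, and `scale_in_place` is a hypothetical placeholder for in-place post-processing, not the actual SCRFD code.

```python
import numpy as np

# Stand-in for a Triton output: a read-only array (assumption for illustration).
net_out = np.arange(6, dtype=np.float32).reshape(1, 6)
net_out.setflags(write=False)


def scale_in_place(blob):
    """Hypothetical post-processing step that modifies its input in place."""
    blob *= 2.0
    return blob


# A view of a read-only array is read-only too, so in-place math raises ValueError.
try:
    scale_in_place(net_out[0])
    in_place_failed = False
except ValueError:
    in_place_failed = True

# np.copy returns a fresh, writable array, so the same step now succeeds.
score_blob = np.copy(net_out[0])
scaled = scale_in_place(score_blob)
```

The original `net_out` stays untouched and read-only; only the copy is modified.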

dbasbabasi commented 2 years ago

Thank you for your quick response.

Actually, I have some experience with Triton, but there is a problem with getting the metadata while loading the model, and the docker container stops automatically. I tried to debug it but couldn't fix it. I used the following models and config:

max_size=640,640 det_model=retinaface_r50_v1 rec_model=arcface_r100_v1

Docker logs:

SthPhoenix commented 2 years ago

Have you changed localhost to the actual Triton server IP and gRPC port? In docker, localhost is the container itself, not the host machine.
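One quick way to test this from inside the client container is a plain TCP reachability check. The sketch below uses only the Python standard library; `can_connect` is a hypothetical helper, and the address in the usage note is a placeholder. It only shows that the port accepts connections, not that Triton is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For example, `can_connect("192.168.1.10", 8001)` would probe Triton's default gRPC port on a placeholder host IP, while `can_connect("localhost", 8001)` run inside the IFR container probes the container itself.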

dbasbabasi commented 2 years ago

deploy_trt.txt

Here is my deploy_trt file. Yeah, I tried with the host IP, and I also tried with localhost and opening the port in the docker run command.

SthPhoenix commented 2 years ago

You shouldn't bind Triton ports inside the insightface-rest container; that should cause exceptions when starting either the Triton server or the IFR container.

dbasbabasi commented 2 years ago

Yeah, I got it. I deleted the ports, ran the inference docker, and after that ran deploy_trt. It looks like the detection model was loaded, and I can see the model output list during the load, but I got another error for ArcFace. I am checking it. Thank you so much for your help.

SthPhoenix commented 2 years ago

IFR uses shared GPU memory to communicate with the Triton server, so it may not work if Triton is on a different host.

dbasbabasi commented 2 years ago

Yeah, it works on the same machine. I could send a face detection request to Triton, but when I tried to load the face recognition model, it returned a CUDA shared memory error.

Also, I needed to change the face detection request dimensions to fix it.

SthPhoenix commented 2 years ago

I have just checked: everything seems to be working using the fix from https://github.com/SthPhoenix/InsightFace-REST/issues/60#issuecomment-972911459. I followed these steps:

Run deploy_trt.sh with rec_batch_size = 32 and det_batch_size = 10.
Wait until the TRT engines are built.
Stop the IFR container.
Copy the engines to the Triton server models folder under the following paths: {triton_models}/scrfd_10g_gnkps/1/model.plan, {triton_models}/glintr100/1/model.plan
Run the Triton server and ensure it has actually started.
Edit deploy_trt.sh, changing det_batch_size to 1 and INFERENCE_BACKEND to triton, and providing a valid triton_uri (your host machine's local IP address).
Run deploy_trt.sh again.
Now your IFR container should be using the Triton inference server.
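The engine-copy step above can be sketched in Python as follows. `install_engine` is a hypothetical helper, and the source engine file names in the usage note are placeholders, since the thread does not say where deploy_trt.sh writes the built engines.

```python
import shutil
from pathlib import Path


def install_engine(engine_path, triton_models, model_name, version="1"):
    """Copy a built engine to {triton_models}/{model_name}/{version}/model.plan."""
    target_dir = Path(triton_models) / model_name / version
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "model.plan"
    shutil.copyfile(engine_path, target)
    return target
```

A call such as `install_engine("scrfd.plan", "/path/to/triton_models", "scrfd_10g_gnkps")` would produce the first path listed in the steps, and the same call with `"glintr100"` would produce the second.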

Note that you should provide valid model configs to make use of dynamic batching.
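As a rough sketch of what such a config might contain, the helper below renders a minimal Triton `config.pbtxt` with dynamic batching enabled. `render_config` and the queue delay value are illustrative assumptions. A complete config usually also declares the input and output tensors, which are omitted here because the thread does not give their names.

```python
def render_config(name: str, max_batch_size: int, queue_delay_us: int = 100) -> str:
    """Render a minimal config.pbtxt for a TensorRT plan with dynamic batching."""
    return (
        f'name: "{name}"\n'
        'platform: "tensorrt_plan"\n'
        f"max_batch_size: {max_batch_size}\n"
        "dynamic_batching {\n"
        f"  max_queue_delay_microseconds: {queue_delay_us}\n"
        "}\n"
    )
```

For instance, `render_config("glintr100", 32)` matches the rec_batch_size used when the engine was built; the text would be saved as `config.pbtxt` in the model's folder, next to the `1/` version directory.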

Also keep in mind that creating shared memory regions uses additional GPU memory (about 110-150 MB per worker), so ensure you have enough free GPU RAM.
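A back-of-the-envelope estimate of that overhead, taking the 110-150 MB per worker figure above at face value (the helper name is hypothetical):

```python
def shared_memory_overhead_mb(workers: int, per_worker_mb=(110, 150)):
    """Return the (low, high) estimate of extra GPU memory in MB."""
    low, high = per_worker_mb
    return workers * low, workers * high
```

With 4 workers this gives roughly 440 to 600 MB of GPU memory on top of what the models themselves need.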

dbasbabasi commented 2 years ago

Thank you so much. I used ONNX models for Triton, and it now works for RetinaFace and ArcFace. Do you plan to add age/gender support for Triton?

SthPhoenix commented 2 years ago

The gender/age model is temporarily not supported, since the g/a model requires different face crop preprocessing than the current glintr100 recognition models.

dbasbabasi commented 2 years ago

I used the RetinaFace ResNet model for face detection. I will try to run the g/a model. Thank you so much for your help. If you have a recommendation for g/a, I would be really glad; otherwise I will close this issue.

SthPhoenix commented 2 years ago

You could implement it, but you'll have to make copies of the face crop numpy arrays at the recognition step; otherwise the g/a estimations will be totally wrong, because recognition and g/a estimation require different preprocessing. Copying numpy arrays will hurt overall performance, though I haven't tested how much yet.
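A minimal sketch of the problem, with assumed preprocessing: both functions below are hypothetical placeholders (an in-place normalization for recognition, raw pixel values for g/a) and are not taken from the repository.

```python
import numpy as np


def prep_recognition(crop):
    """Hypothetical recognition preprocessing that normalizes in place."""
    crop -= 127.5
    crop /= 127.5
    return crop


def prep_ga(crop):
    """Hypothetical g/a preprocessing that expects raw 0..255 pixel values."""
    return crop


# Without a copy, the g/a step sees data already normalized for recognition.
shared = np.full((112, 112, 3), 255.0, dtype=np.float32)
prep_recognition(shared)
ga_wrong = prep_ga(shared)

# With a copy, each model gets the input it expects.
crop = np.full((112, 112, 3), 255.0, dtype=np.float32)
rec_input = prep_recognition(crop.copy())
ga_input = prep_ga(crop)
```

The extra `crop.copy()` per face is the performance cost mentioned above.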

dbasbabasi commented 2 years ago

Thank you. I used my own model for that, as ONNX, and wrote a new client for these models. The results look good. Your repo is awesome. Thank you so much for your help!

SthPhoenix commented 2 years ago

Nice to hear that! Did you use a publicly available model for g/a, or did you train your own?

dbasbabasi commented 2 years ago

I used my own trained models. I converted them to ONNX and wrote a new client for age/gender, emotion and mask detection. After the face crop, I pass the cropped face to inference. I see RetinaFace had a pretrained mask model, but it looks unavailable right now.

SthPhoenix commented 2 years ago

Sorry for the late reply, I finally got some free time )

Do you have separate models for GA, emotion and mask detection working on 112x112 face crops? That's interesting, since all the pretrained models for these tasks that I have seen expect a different input shape. Could you point out where I could find the training code or models, if you used public repos?

dbasbabasi commented 2 years ago

Hey, yeah, the GA model is separate. It is not a public repo, so I can't share it. Our models work with RetinaFace. I have no idea about public GA and mask models.

SthPhoenix / InsightFace-REST

Triton Backend #60