bpickrel opened this issue 1 year ago
@TedThemistokleous @bpickrel I went through your guide, but when I try to run tritonserver I get the following error:
"tritonserver": executable file not found in $PATH: unknown.
It seems like the binary is not in the onnxruntime backend directory.
Also, if I start the same container without the tritonserver command, I get the following message:
bash: /opt/conda/envs/py_3.10/lib/libtinfo.so.6: no version information available (required by bash)
Did either of you encounter these issues?
The conda "no version information available" message you saw has to do with an environment variable I set in the Triton command line:
-e LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/conda/envs/py_3.10/lib:/opt/tritonserver/backends/onnxruntime
which was itself a workaround, but I don't exactly recall the issue. If you change the ...py_3.10... portion of that to ...py_3.06..., the message goes away. I also don't know if it is an actual problem or just an informational message.
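If you want to confirm what is being picked up, a quick check inside the container is to see which libtinfo bash actually resolves (the conda path below matches this setup; adjust if yours differs):

# check which libtinfo.so.6 bash is loading inside the container
ldd /bin/bash | grep libtinfo
# if it resolves to /opt/conda/envs/py_3.10/lib/libtinfo.so.6, the conda copy is
# shadowing the system one via LD_LIBRARY_PATH, which is what triggers the warning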
@bpickrel what you mentioned fixes the message, but it looks like it's not the main issue. The binary is still missing from the docker image.
https://github.com/TedThemistokleous/server/blob/add_migraphx_rocm_hooks_v2.39.0/src/CMakeLists.txt#L53
This needs to be changed to GIT_REPOSITORY https://github.com/TedThemistokleous/backend.git
My guess is that this org separation was not present when it was first built, and now it points to a non-existent branch during the build.
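For anyone hitting the same thing before a fixed branch lands, a one-off local patch before building might look like this (the line number and the sed pattern are assumptions; check the file first):

# inspect the backend repo reference in the server sources
grep -n "GIT_REPOSITORY" src/CMakeLists.txt
# point it at the fork that actually has the branch (adjust the line number to what grep reports)
sed -i '53s|GIT_REPOSITORY .*|GIT_REPOSITORY https://github.com/TedThemistokleous/backend.git|' src/CMakeLists.txt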
I saw some issues regarding missing dependencies (e.g. libssh2) that need to be resolved when running the server, but at least the build issue for the "core" binary seems to be resolved.
The libssh2 issue is solved when using ...py_3.10..., so we can keep that. The "no version information available" message doesn't interfere with the tritonserver start.
After forcing the GPU kind in onnxruntime_backend, disabling the auto-config skip (not sure if this is needed; from the code it looked like this skip blocks the loading of the MIGraphX provider), and extending the config.pbtxt file for densenet with this:
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [ {
      name : "migraphx"
      parameters { key: "precision_mode" value: "FP32" }
    }]
  }
}
I've got the following error:
E0522 13:36:49.042809 1879 model_lifecycle.cc:621] failed to load 'densenet_onnx' version 1: Internal: onnx runtime error 6: /workspace/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1209 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_migraphx.so with error: libmigraphx_c.so.3: cannot open shared object file: No such file or directory
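That error is the dynamic loader failing to find MIGraphX's own library rather than the ORT provider itself; a quick check inside the server container (the ROCm install path is an assumption):

# is libmigraphx_c.so.3 visible to the dynamic loader at all?
ldconfig -p | grep migraphx
# if not, locate it under the ROCm install and add that directory to LD_LIBRARY_PATH
find /opt/rocm* -name 'libmigraphx_c.so*' 2>/dev/null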
Don't know; I haven't tried that. Ted is out for a while.
We managed to get past the missing lib issues. I had to update the onnxruntime_backend to copy the MIGraphX .so files over to the tritonserver container: https://github.com/gyulaz-htec/onnxruntime_backend/commit/c108443dfb5c73d0b2f49b36aec2d21f191911d5#diff-ad5e480d1a7be6ef5d700f428d3f7da2559e36ee630e6a848a959dcdc3753832R424
I tried to force the MIGraphX provider with GPU in the code but it still falls back to CPU.
Unfortunately tritonserver automatically extends the config file and sets the instance_group kind to KIND_CPU:
"instance_group": [
{
"name": "densenet_onnx",
"kind": "KIND_CPU",
"count": 2,
"gpus": [],
...
}
],
We should see KIND_GPU there, but the core API is not allowing it. We suspect this part of the core API must be updated to fix this issue:
https://github.com/triton-inference-server/core/blob/bbcd7816997046821f9d1a22e418acb84ca5364b/src/model_config_utils.cc#L1626-L1630
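A quick way to see the config Triton actually ended up with after auto-complete is the model config endpoint (default HTTP port assumed):

# dump the auto-completed config and look at the instance_group kind
curl -s localhost:8000/v2/models/densenet_onnx/config | python3 -m json.tool | grep -A8 '"instance_group"'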
You would probably have to hipify the CUDA core codebase, or at least such routines as GetSupportedGPUs(), and set the appropriate build repo tag to use the changed version when building. I tried running hipify-perl on the entire code base but did not follow up to investigate why it didn't instantly make the GPU work. I can report that running hipify-perl on the entire code base is EXTREMELY slow (~12 hours), so you should probably either run it overnight or take the time to write a script to hipify multiple files in separate threads or processes (see the sketch below).
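For reference, a rough sketch of parallelizing that step (assumes hipify-perl is on PATH and that the usual C++/CUDA extensions cover the files you care about):

# run hipify-perl in place across the tree, one process per core
find . -type f \( -name '*.cc' -o -name '*.h' -o -name '*.cu' -o -name '*.cuh' \) -print0 \
  | xargs -0 -P "$(nproc)" -n 1 hipify-perl -inplace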
We've managed to run densenet_onnx with the MIGraphX provider on the GPU.
The code is available on my fork: https://github.com/gyulaz-htec/server/tree/add_migraphx_rocm_hooks_v2.39.0
Steps to run tritonserver:
git clone git@github.com:gyulaz-htec/server.git
cd server
git switch add_migraphx_rocm_hooks_v2.39.0
# fetch densenet_onnx model
./docs/examples/fetch_models.sh
# building the tritonserver docker image
python3 build.py --no-container-pull --enable-logging --enable-stats --enable-tracing --enable-rocm --enable-metrics --verbose --endpoint=grpc --image='gpu-base,rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2' --ort_organization=https://github.com/gyulaz-htec --ort_branch=add_migraphx_rocm_onnxrt_eps --endpoint=http --backend=onnxruntime:main --library-paths=../onnxruntime_backend/
# starting tritonserver inside docker image
docker run --name gyulas_container --device=/dev/kfd --device=/dev/dri -it -e LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/conda/envs/py_3.10/lib:/opt/tritonserver/backends/onnxruntime:/opt/rocm-6.0.2/lib --rm --net=host -v /home/htec/gyulaz/triton/server/docs/examples/model_repository/:/models tritonserver tritonserver --model-repository=/models/ --exit-on-error=false
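Once the server is up, a couple of optional sanity checks (HTTP on the default port 8000 is assumed, which works here because of --net=host):

# 200 means the server is ready
curl -s -o /dev/null -w '%{http_code}\n' localhost:8000/v2/health/ready
# model metadata is only returned if densenet_onnx loaded successfully
curl -s localhost:8000/v2/models/densenet_onnx | python3 -m json.tool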
For testing I've used the ImageNet2012 500-image dataset from the nvcr.io/nvidia/pytorch:24.02-py3 image:
# start docker
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:24.04-py3-sdk /bin/bash
# download imagenet dataset
mkdir images
cd images
wget https://www.dropbox.com/s/57s11df6pts3z69/ILSVRC2012_img_val_500.tar
tar -xvf ./ILSVRC2012_img_val_500.tar
# remove the .tar because image_client can't parse it
rm ILSVRC2012_img_val_500.tar
# run imagenet client
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/
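If you only need throughput/latency numbers rather than classifications, the same SDK container also ships perf_analyzer (the flags below are the standard ones; adjust concurrency to taste):

# synthetic load against densenet_onnx over grpc
perf_analyzer -m densenet_onnx -u localhost:8001 -i grpc --concurrency-range 1:4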
We also already test with the ILSVRC2012 dataset for resnet50 as part of our examples in the ONNX Runtime inference examples repo found here:
You can reuse the dataset to get numbers for this so we can compare to existing runs using this larger dataset.
For Bert I've got a fork with an example that can be used for fp16/int8 runs. Avoid mixed precision for this.
This will let us compare Bert quickly, and I believe it can also evaluate performance output if you don't use the no_eval flag.
The code is available on my fork: https://github.com/gyulaz-htec/server/tree/add_migraphx_rocm_hooks_v2.39.0
git clone git@github.com:gyulaz-htec/server.git
cd server
git switch add_migraphx_rocm_hooks_v2.39.0
# fetch models and setup folder structure
./docs/examples/fetch_models.sh
# The ResNet50 ONNX model is only available from the ONNX model zoo. You have to download it manually with git-lfs
# from https://github.com/onnx/models/blob/main/validated/vision/classification/resnet/model/resnet50-v2-7.onnx
# and copy it under docs/examples/model_repository/resnet50_onnx/1
git clone https://github.com/onnx/models
cd models
git lfs pull --include="/validated/vision/classification/resnet/model/resnet50-v2-7.onnx" --exclude=""
cp validated/vision/classification/resnet/model/resnet50-v2-7.onnx /path/to/triton/server/docs/examples/model_repository/resnet50_onnx/1
# building the tritonserver docker image from the triton-server folder
python3 build.py --no-container-pull --enable-logging --enable-stats --enable-tracing --enable-rocm --enable-metrics --verbose --endpoint=grpc --image='gpu-base,rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2' --ort_organization=https://github.com/gyulaz-htec --ort_branch=add_migraphx_rocm_onnxrt_eps --endpoint=http --backend=onnxruntime:main --library-paths=../onnxruntime_backend/
# starting tritonserver inside docker image
docker run --name gyulas_container --device=/dev/kfd --device=/dev/dri -it -e LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/conda/envs/py_3.10/lib:/opt/tritonserver/backends/onnxruntime:/opt/rocm-6.0.2/lib --rm --net=host -v /home/htec/gyulaz/triton/server/docs/examples/model_repository/:/models tritonserver tritonserver --model-repository=/models/ --exit-on-error=false
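Since --exit-on-error=false keeps the server running even when a model fails to load, it's worth confirming the resnet50 model is actually available before running the client:

# returns 200 only when this specific model is loaded and ready
curl -s -o /dev/null -w '%{http_code}\n' localhost:8000/v2/models/resnet50_onnx/ready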
Client code is available on my fork: https://github.com/gyulaz-htec/client/blob/migraphx_resnet50/src/python/examples/resnet50_image_client.py
# Download and extract ILSVRC2012 validation dataset
mkdir ILSVRC2012 && cd ILSVRC2012
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
mkdir -p cal && tar -xvf ILSVRC2012_img_val.tar -C ./cal
# Download 'synset_words.txt'
wget https://raw.githubusercontent.com/HoldenCaulfieldRye/caffe/master/data/ilsvrc12/synset_words.txt
# Get development kit files 'ILSVRC2012_validation_ground_truth.txt' and 'meta.mat'.
mkdir devkit && cd devkit
wget https://raw.githubusercontent.com/miraclewkf/MobileNetV2-PyTorch/master/ImageNet/ILSVRC2012_devkit_t12/data/ILSVRC2012_validation_ground_truth.txt
wget https://github.com/miraclewkf/MobileNetV2-PyTorch/raw/master/ImageNet/ILSVRC2012_devkit_t12/data/meta.mat
# start docker image
docker run -it --rm --net=host -v /path/to/ILSVRC2012/:/workspace/ILSVRC2012 nvcr.io/nvidia/tritonserver:23.09-py3-sdk /bin/bash
# get resnet50 client code and move it to the proper path
wget https://raw.githubusercontent.com/gyulaz-htec/client/migraphx_resnet50/src/python/examples/resnet50_image_client.py
mv resnet50_image_client.py client/src/python/examples
# start the resnet50 client in grpc async mode
python3 client/src/python/examples/resnet50_image_client.py -m resnet50_onnx -c 1 ./ILSVRC2012 -b 20 --async -c 5 -u localhost:8001 -i grpc
Current triton-server results with ResNet50 compared to the end-to-end ORT (MIGraphX) example.
The comparison was done using the ImageNet2012 50k image dataset. Precision: FP32.
Note that the server and client are running on the same machine, so response delay will be larger in a real-life application.
| mode | ORT (MGX provider) sync | Triton http(sync) | Triton http(async) | Triton grpc(sync) | Triton grpc(stream) | Triton grpc(async) |
|---|---|---|---|---|---|---|
| Inference duration, 50k images (s) | 42.52 | 156.38 | 65.89 | 159.14 | 48.33 | 44.00 |
| Average delay (ms) | 17.008 | 62.552 | 26.354 | 63.65 | 19.332 | 17.60 |
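As a sanity check on the units: with the batch size of 20 used in the client command above, 50k images is 2,500 requests, and 2,500 × 17.008 ms ≈ 42.52 s, which matches the first column, so the duration row is in seconds.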
Where is the parameter execution_accelerators used?
Can this be done by leveraging the onnxruntime work we already have as a backend? As a preliminary step, learn to add a CUDA backend, then change it to MIGraphX/ROCm.
See https://github.com/triton-inference-server/onnxruntime_backend and https://github.com/triton-inference-server/onnxruntime_backend#onnx-runtime-with-tensorrt-optimization
Documentation for building the backend is in the server docs under "Development Build of Backend or Repository Agent".