NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

conditional execution error while dali model is loaded by tritonserver #4750

Open · DequanZhu opened this issue 1 year ago

DequanZhu commented 1 year ago

I'm building a face keypoint detection system in which the model is served by tritonserver. I use DALI for some pre-processing steps of the face keypoint model. Triton accepts the original image and the face bounding boxes (the output of a face detection model), then crops the face ROI from the original image. Following #4735, I must use conditional execution to crop multiple faces from each original image. My GPU info is in the attached screenshot.

My tritonserver container version is 21.11, and I reinstalled DALI with pip install https://developer.download.nvidia.com/compute/redist/nightly/nvidia-dali-nightly-cuda110/nvidia_dali_nightly_cuda110-1.25.0.dev20230327-7720921-py3-none-manylinux2014_x86_64.whl

Below is my code (dali.py):

import nvidia.dali as dali
import nvidia.dali.types as types
from nvidia.dali.pipeline.experimental import pipeline_def

BATCH_SIZE = 8
INPUT_SIZE = 192
FILL_VALUE = 0

def parse_args():
    import argparse
    parser = argparse.ArgumentParser(description="Serialize the pipeline and save it to a file")
    parser.add_argument('file_path', type=str, help='The path where to save the serialized pipeline')
    return parser.parse_args()

def get_warp_matrix(bboxes_size, bboxes_center, bboxes_index):
    bbox_size = bboxes_size[bboxes_index]
    bbox_center = bboxes_center[bboxes_index]
    bbox_start = bbox_center - bbox_size / 2.0
    bbox_end = bbox_center + bbox_size / 2.0
    mt = dali.fn.transforms.crop(from_start=bbox_start, from_end=bbox_end,
                                 to_start=[0, 0], to_end=[INPUT_SIZE, INPUT_SIZE])
    return mt

def crop_images(mt, images):
    cropped_images = dali.fn.warp_affine(images, size=[INPUT_SIZE, INPUT_SIZE], matrix=mt,
                                         fill_value=FILL_VALUE, inverse_map=False)
    cropped_images = dali.fn.color_space_conversion(cropped_images, image_type=types.RGB, output_type=types.BGR)
    return cropped_images

@pipeline_def(batch_size=BATCH_SIZE, num_threads=1, device_id=0, enable_conditionals=True)
def simple_pipeline():
    image = dali.fn.external_source(device="cpu", name="DALI_INPUT_0")
    face_bboxes = dali.fn.external_source(device="cpu", name="DALI_INPUT_1")
    bboxes_h = face_bboxes[:, 3] - face_bboxes[:, 1]
    bboxes_h = bboxes_h[:, dali.newaxis]
    bboxes_w = face_bboxes[:, 2] - face_bboxes[:, 0]
    bboxes_w = bboxes_w[:, dali.newaxis]
    bboxes_size = dali.fn.cat(bboxes_h, bboxes_w, axis=1)
    bboxes_size = dali.fn.cast(bboxes_size, dtype=types.FLOAT)
    center_x = (face_bboxes[:, 0] + face_bboxes[:, 2]) / 2
    center_x = center_x[:, dali.newaxis]
    center_y = (face_bboxes[:, 1] + face_bboxes[:, 3]) / 2
    center_y = center_y[:, dali.newaxis]
    center = dali.fn.cat(center_x, center_y, axis=1)
    center = dali.fn.cast(center, dtype=types.FLOAT)
    scale = INPUT_SIZE / (dali.fn.reductions.max(bboxes_size, axes=[1]) * 1.5)
    scale = scale[:, dali.newaxis]
    bboxes_size = dali.fn.cast(bboxes_size, dtype=types.FLOAT) * scale
    face_bounding_num = dali.fn.shapes(face_bboxes)[0]

    # one face bounding box
    if face_bounding_num == 1:
        mt = get_warp_matrix(bboxes_size, center, 0)
        cropped_image = crop_images(mt, image)
        out = dali.fn.reshape(cropped_image, src_dims=[-1, 0, 1, 2])

    # two face bounding boxes
    elif face_bounding_num == 2:
        mt_1 = get_warp_matrix(bboxes_size, center, 0)
        cropped_image_1 = crop_images(mt_1, image)
        mt_2 = get_warp_matrix(bboxes_size, center, 1)
        cropped_image_2 = crop_images(mt_2, image)
        out = dali.fn.stack(cropped_image_1, cropped_image_2)

    # three face bounding boxes
    elif face_bounding_num == 3:
        mt_1 = get_warp_matrix(bboxes_size, center, 0)
        cropped_image_1 = crop_images(mt_1, image)
        mt_2 = get_warp_matrix(bboxes_size, center, 1)
        cropped_image_2 = crop_images(mt_2, image)
        mt_3 = get_warp_matrix(bboxes_size, center, 2)
        cropped_image_3 = crop_images(mt_3, image)
        out = dali.fn.stack(cropped_image_1, cropped_image_2, cropped_image_3)

    # more than three face bounding boxes
    else:
        mt_1 = get_warp_matrix(bboxes_size, center, 0)
        cropped_image_1 = crop_images(mt_1, image)
        mt_2 = get_warp_matrix(bboxes_size, center, 1)
        cropped_image_2 = crop_images(mt_2, image)
        mt_3 = get_warp_matrix(bboxes_size, center, 2)
        cropped_image_3 = crop_images(mt_3, image)
        mt_4 = get_warp_matrix(bboxes_size, center, 3)
        cropped_image_4 = crop_images(mt_4, image)
        out = dali.fn.stack(cropped_image_1, cropped_image_2, cropped_image_3, cropped_image_4)
    out = dali.fn.transpose(out, perm=[0, 3, 1, 2])
    return out

def main(filename):
    simple_pipeline().serialize(filename=filename)

if __name__ == '__main__':
    args = parse_args()
    main(args.file_path)

I serialized this to a format that can be loaded by Triton with python3 dali.py model.dali. This step produced no error, but when I load it with Triton, it fails; the detailed log is in the attached screenshot. The error message is:

Unknown: DALI Backend error: [/opt/dali/dali/pipeline/operator/op_spec.h:87] Assert on "schema_ != nullptr" failed: No schema found for operator "_conditional__Split"

It seems to have something to do with conditional execution. Can anyone tell me how to correct it?
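For completeness, a minimal way to sanity-check the pipeline locally before serializing could look like the snippet below (a sketch only; the dummy image size and bounding-box values are assumptions, and this only validates the pipeline graph itself, not the DALI backend inside Triton):

import numpy as np

# Build the pipeline defined above, feed one batch of dummy data through the
# two external_source inputs, then run a single iteration.
pipe = simple_pipeline()
pipe.build()
dummy_images = [np.zeros((480, 640, 3), dtype=np.uint8)] * BATCH_SIZE
dummy_bboxes = [np.array([[50.0, 60.0, 150.0, 180.0]], dtype=np.float32)] * BATCH_SIZE
pipe.feed_input("DALI_INPUT_0", dummy_images)
pipe.feed_input("DALI_INPUT_1", dummy_bboxes)
outputs = pipe.run()
cropped = outputs[0]                   # one sample per input image
print(np.array(cropped[0]).shape)      # expected e.g. (1, 3, 192, 192) for a single detected face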

klecki commented 1 year ago

Hi @DequanZhu, as far as I am aware, it is not enough to install DALI within the container; you also need to rebuild the DALI Backend for Triton. The 21.11 image is quite old, and the DALI that it was built with doesn't support conditional execution.
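For example, checking the wheel from Python only tells you what pip installed; it says nothing about the DALI version the backend was compiled against (a small illustrative check, not an official diagnostic):

import nvidia.dali as dali

# Version of the Python wheel visible to this interpreter. The DALI Backend
# shipped with Triton links against its own DALI build, which may be older.
print(dali.__version__)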

You can either wait for the newest Triton image release, 23.03, which will contain DALI 1.23 with the initial support for conditional execution, or follow the instructions on how to build a new Triton image with the latest DALI and a compatible DALI Backend: see the "using fresh DALI release" section.

DequanZhu commented 1 year ago

Hi @klecki, thank you for your reply. According to "using fresh DALI release", I must rebuild dali_backend, but its CMakeLists.txt uses FetchContent_Declare to download some dependencies, as declared at lines 53-73:

include(FetchContent)

FetchContent_Declare(
  repo-common
  GIT_REPOSITORY https://github.com/triton-inference-server/common.git
  GIT_TAG ${TRITON_COMMON_REPO_TAG}
  GIT_SHALLOW ON
)
FetchContent_Declare(
  repo-core
  GIT_REPOSITORY https://github.com/triton-inference-server/core.git
  GIT_TAG ${TRITON_CORE_REPO_TAG}
  GIT_SHALLOW ON
)
FetchContent_Declare(
  repo-backend
  GIT_REPOSITORY https://github.com/triton-inference-server/backend.git
  GIT_TAG ${TRITON_BACKEND_REPO_TAG}
  GIT_SHALLOW ON
)
FetchContent_MakeAvailable(repo-common repo-core repo-backend)

My computer is in an offline environment and I'm not familiar with CMake. It seems I should download the dependencies locally, so I downloaded all of them:

git clone --branch r21.11 https://github.com/triton-inference-server/dali_backend.git &&
cd dali_backend &&
git clone --branch r21.05 https://github.com/triton-inference-server/common.git &&
git clone --branch r21.05 https://github.com/triton-inference-server/core.git &&
git clone --branch r21.05 https://github.com/triton-inference-server/backend.git

then replaced lines 70-90 of dali_backend/CMakeLists.txt with:

add_subdirectory(common)
add_subdirectory(core)
add_subdirectory(backend)

then:

mkdir build &&
cd build &&
cmake .. -D TRITON_SKIP_DALI_DOWNLOAD=ON

but some errors occur, because the CMakeLists.txt in the backend submodule also uses FetchContent_Declare, at lines 47-61:

include(FetchContent)

FetchContent_Declare(
  repo-common
  GIT_REPOSITORY https://github.com/triton-inference-server/common.git
  GIT_TAG ${TRITON_COMMON_REPO_TAG}
  GIT_SHALLOW ON
)
FetchContent_Declare(
  repo-core
  GIT_REPOSITORY https://github.com/triton-inference-server/core.git
  GIT_TAG ${TRITON_CORE_REPO_TAG}
  GIT_SHALLOW ON
)
FetchContent_MakeAvailable(repo-common repo-core)

It also depends on the core and common submodules, so I just copied the core and common directories from the parent directory into the backend directory, and replaced lines 47-61 of dali_backend/backend/CMakeLists.txt with:

add_subdirectory(common)
add_subdirectory(core)

Some errors occur; the detailed log is in the attached screenshot. I have very little knowledge of CMake. Can you tell me how to modify the CMakeLists.txt files to build dali_backend in an offline environment?

klecki commented 1 year ago

@DequanZhu Triton 23.03 was released today; maybe you can try it, as it has DALI 1.23 with the initial support for conditionals, instead of rebuilding the backend on your own: https://github.com/triton-inference-server/server/releases/tag/v2.32.0

DequanZhu commented 1 year ago

@klecki Hi, I successfully ran the Triton 23.03 container, but found out that the 23.03 tritonserver depends on CUDA 12.0. My host machine's CUDA version is 11.0, and I have no permission to update the CUDA version on the host. Is there any way I can run the container on a CUDA 11.0 machine, or is it possible to update the CUDA version just inside the container?

JanuszL commented 1 year ago

Hi @DequanZhu,

You can try to update the DALI version inside the existing TRITON container (it should keep API compatibility between DALI and the TRITON DALI backend). Regarding changing the CUDA version inside the container, I don't think it is possible; the only thing you can do is use CUDA forward compatibility, but it is limited to data-center-grade GPUs, which may not be applicable for you.

DequanZhu commented 1 year ago

Hi @JanuszL, thanks for the reply. I noticed that conditional execution was introduced in DALI 1.23, and it is available for CUDA 11. Is there any tritonserver container version (based on CUDA 11) in which I can use conditional execution by just installing DALI with pip, e.g. pip install https://developer.download.nvidia.com/compute/redist/nightly/nvidia-dali-nightly-cuda110/nvidia_dali_nightly_cuda110-1.25.0.dev20230327-7720921-py3-none-manylinux2014_x86_64.whl, instead of building dali_backend from source? My computer is in an offline environment, and building from source requires pulling some resources from the network.

JanuszL commented 1 year ago

~Please use the last TRITON server image that uses CUDA 11 (22.12), and inside it run /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python -m pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-cuda110 followed by cp -r ./backends/dali/conda/envs/dalienv/lib/python3.8/site-packages/nvidia/dali ./backends/dali. It should update the DALI version. If the machine is offline, you can download the DALI wheel manually from https://github.com/NVIDIA/DALI/releases/tag/v1.24.0 and install it using /opt/tritonserver/backends/dali/conda/envs/dalienv/bin/python -m pip install THE_WHEEL_FILE_LOCATION. Keep in mind that certain DALI TRITON BACKEND functionalities depend on TRITON, and updating DALI itself may not enable them.~

Update: it seems that I misunderstood how DALI is deployed inside the TRITON image. To update DALI inside it, you also need to rebuild the backend. The best would be to follow https://github.com/triton-inference-server/dali_backend/tree/main/docker.

DequanZhu commented 1 year ago

Hi @JanuszL, I want to build dali_backend from source following the "Bare metal" instructions. I would like to know which tritonserver container version, dali_backend version, and DALI Python wheel version I should use for the build. I used the tritonserver 21.11 container, dali_backend r21.11 and nvidia-dali-nightly-cuda110 1.25.0.dev20230323, but when running make some errors occurred:

root@009d4c0685e5:/workspace/dali_backend/build# cmake ..  -D TRITON_SKIP_DALI_DOWNLOAD=ON
-- Build configuration: Release
-- RapidJSON found. Headers: /usr/local/lib/cmake/RapidJSON/../../../include
-- RapidJSON found. Headers: 
-- Using CUDA 11.5
-- DALI includes dir: /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include
-- DALI libs dir: /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali
-- DALI libs: dalidali_coredali_kernelsdali_operators
-- Configuring done
-- Generating done
-- Build files have been written to: /workspace/dali_backend/build
root@009d4c0685e5:/workspace/dali_backend/build# make -j32
-- Build configuration: Release
-- RapidJSON found. Headers: /usr/local/lib/cmake/RapidJSON/../../../include
-- RapidJSON found. Headers: 
-- Using CUDA 11.5
-- DALI includes dir: /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include
-- DALI libs dir: /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali
-- DALI libs: dalidali_coredali_kernelsdali_operators
-- Configuring done
-- Generating done
-- Build files have been written to: /workspace/dali_backend/build
[  3%] Acquiring DALI release
[  9%] Built target triton-common-error
[ 12%] Linking CXX static library libkernel-library-new.a
[ 18%] Built target triton-common-table-printer
[ 27%] Built target triton-common-async-work-queue
[ 33%] Built target triton-core-serverstub
[ 36%] Built target kernel-library-new
[ 57%] Built target triton-backend-utils
Looking in indexes: http://central.jaf.xxxxxx.cn/artifactory/api/pypi/group-pypi/simple
Collecting nvidia-dali-nightly-cuda110==1.25.0.dev20230323
  Downloading http://xxxxxx.kxecs.itc.xxxxxx.cn:9020/la08_32_general.10/package/nvidia/nvidia_dali_nightly_cuda110-1.25.0.dev20230323-7682147-py3-none-manylinux2014_x86_64.whl (485.8 MB)
     |████████████████████████████████| 485.8 MB 4.0 MB/s             
Collecting gast<=0.4.0,>=0.2.1
  Using cached http://central.jaf.xxxxxx.cn/artifactory/api/pypi/group-pypi/packages/packages/b6/48/583c032b79ae5b3daa02225a675aeb673e58d2cb698e78510feceb11958c/gast-0.4.0-py3-none-any.whl (9.8 kB)
Collecting astunparse>=1.6.0
  Using cached http://central.jaf.xxxxxx.cn/artifactory/api/pypi/group-pypi/packages/packages/2b/03/13dde6512ad7b4557eb792fbcf0c653af6076b81e5941d36ec61f7ce6028/astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting wheel<1.0,>=0.23.0
  Downloading http://central.jaf.xxxxxx.cn/artifactory/api/pypi/group-pypi/packages/packages/61/86/cc8d1ff2ca31a312a25a708c891cf9facbad4eae493b3872638db6785eb5/wheel-0.40.0-py3-none-any.whl (64 kB)
     |████████████████████████████████| 64 kB 1.9 MB/s             
Collecting six<2.0,>=1.6.1
  Downloading http://central.jaf.xxxxxx.cn/artifactory/api/pypi/group-pypi/packages/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl (11 kB)
Saved ./nvidia_dali_nightly_cuda110-1.25.0.dev20230323-7682147-py3-none-manylinux2014_x86_64.whl
Saved ./astunparse-1.6.3-py2.py3-none-any.whl
Saved ./gast-0.4.0-py3-none-any.whl
Saved ./six-1.16.0-py2.py3-none-any.whl
Saved ./wheel-0.40.0-py3-none-any.whl
Successfully downloaded nvidia-dali-nightly-cuda110 astunparse gast six wheel
WARNING: You are using pip version 21.3.1; however, version 23.0.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
Scanning dependencies of target dali_executor
[ 60%] Building CXX object src/dali_executor/CMakeFiles/dali_executor.dir/dali_executor.cc.o
[ 63%] Building CXX object src/dali_executor/CMakeFiles/dali_executor.dir/io_buffer.cc.o
[ 66%] Building CXX object src/dali_executor/CMakeFiles/dali_executor.dir/dali_pipeline.cc.o
In file included from /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/core/mm/memory_resource.h:21,
                 from /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/core/mm/default_resources.h:21,
                 from /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/core/mm/memory.h:20,
                 from /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/core/dev_buffer.h:24,
                 from /workspace/dali_backend/src/dali_executor/utils/dali.h:28,
                 from /workspace/dali_backend/src/dali_executor/io_descriptor.h:26,
                 from /workspace/dali_backend/src/dali_executor/io_buffer.h:26,
                 from /workspace/dali_backend/src/dali_executor/io_buffer.cc:23:
/workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/cuda/memory_resource: In member function 'bool cuda::__4::basic_resource_view<_ResourcePointer, _Properties>::operator==(const cuda::__4::basic_resource_view<_OtherPointer, _OtherProperties ...>&) const':
/workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/cuda/memory_resource:818:13: error: typedef 'using __view1_t = class cuda::__4::basic_resource_view<_ResourcePointer, _Properties>' locally defined but not used [-Werror=unused-local-typedefs]
  818 |       using __view1_t = basic_resource_view;
      |             ^~~~~~~~~
/workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/cuda/memory_resource:819:13: error: typedef 'using __view2_t = class cuda::__4::basic_resource_view<_OtherPointer, _OtherProperties ...>' locally defined but not used [-Werror=unused-local-typedefs]
  819 |       using __view2_t = basic_resource_view<_Ptr2, _Props2...>;
      |             ^~~~~~~~~
In file included from /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/core/mm/memory_resource.h:21,
                 from /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/core/mm/default_resources.h:21,
                 from /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/core/mm/memory.h:20,
                 from /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/core/dev_buffer.h:24,
                 from /workspace/dali_backend/src/dali_executor/utils/dali.h:28,
                 from /workspace/dali_backend/src/dali_executor/io_descriptor.h:26,
                 from /workspace/dali_backend/src/dali_executor/dali_pipeline.h:30,
                 from /workspace/dali_backend/src/dali_executor/dali_pipeline.cc:23:
/workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/cuda/memory_resource: In member function 'bool cuda::__4::basic_resource_view<_ResourcePointer, _Properties>::operator==(const cuda::__4::basic_resource_view<_OtherPointer, _OtherProperties ...>&) const':
/workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/cuda/memory_resource:818:13: error: typedef 'using __view1_t = class cuda::__4::basic_resource_view<_ResourcePointer, _Properties>' locally defined but not used [-Werror=unused-local-typedefs]
  818 |       using __view1_t = basic_resource_view;
      |             ^~~~~~~~~
/workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/cuda/memory_resource:819:13: error: typedef 'using __view2_t = class cuda::__4::basic_resource_view<_OtherPointer, _OtherProperties ...>' locally defined but not used [-Werror=unused-local-typedefs]
  819 |       using __view2_t = basic_resource_view<_Ptr2, _Props2...>;
      |             ^~~~~~~~~
In file included from /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/core/mm/memory_resource.h:21,
                 from /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/core/mm/default_resources.h:21,
                 from /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/core/mm/memory.h:20,
                 from /workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/core/dev_buffer.h:24,
                 from /workspace/dali_backend/src/dali_executor/utils/dali.h:28,
                 from /workspace/dali_backend/src/dali_executor/io_descriptor.h:26,
                 from /workspace/dali_backend/src/dali_executor/dali_pipeline.h:30,
                 from /workspace/dali_backend/src/dali_executor/dali_executor.h:30,
                 from /workspace/dali_backend/src/dali_executor/dali_executor.cc:23:
/workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/cuda/memory_resource: In member function 'bool cuda::__4::basic_resource_view<_ResourcePointer, _Properties>::operator==(const cuda::__4::basic_resource_view<_OtherPointer, _OtherProperties ...>&) const':
/workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/cuda/memory_resource:818:13: error: typedef 'using __view1_t = class cuda::__4::basic_resource_view<_ResourcePointer, _Properties>' locally defined but not used [-Werror=unused-local-typedefs]
  818 |       using __view1_t = basic_resource_view;
      |             ^~~~~~~~~
/workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/cuda/memory_resource:819:13: error: typedef 'using __view2_t = class cuda::__4::basic_resource_view<_OtherPointer, _OtherProperties ...>' locally defined but not used [-Werror=unused-local-typedefs]
  819 |       using __view2_t = basic_resource_view<_Ptr2, _Props2...>;
      |             ^~~~~~~~~
In file included from /workspace/dali_backend/src/dali_executor/dali_pipeline.cc:23:
/workspace/dali_backend/src/dali_executor/dali_pipeline.h: In member function 'void triton::backend::dali::DaliPipeline::ReleasePipeline()':
/workspace/dali_backend/src/dali_executor/dali_pipeline.h:187:17: error: request for member 'pipe' in '((triton::backend::dali::DaliPipeline*)this)->triton::backend::dali::DaliPipeline::handle_', which is of pointer type 'daliPipelineHandle' {aka 'DALIPipeline*'} (maybe you meant to use '->' ?)
  187 |     if (handle_.pipe && handle_.ws) {
      |                 ^~~~
/workspace/dali_backend/src/dali_executor/dali_pipeline.h:187:33: error: request for member 'ws' in '((triton::backend::dali::DaliPipeline*)this)->triton::backend::dali::DaliPipeline::handle_', which is of pointer type 'daliPipelineHandle' {aka 'DALIPipeline*'} (maybe you meant to use '->' ?)
  187 |     if (handle_.pipe && handle_.ws) {
      |                                 ^~
In file included from /workspace/dali_backend/src/dali_executor/dali_executor.h:30,
                 from /workspace/dali_backend/src/dali_executor/dali_executor.cc:23:
/workspace/dali_backend/src/dali_executor/dali_pipeline.h: In member function 'void triton::backend::dali::DaliPipeline::ReleasePipeline()':
/workspace/dali_backend/src/dali_executor/dali_pipeline.h:187:17: error: request for member 'pipe' in '((triton::backend::dali::DaliPipeline*)this)->triton::backend::dali::DaliPipeline::handle_', which is of pointer type 'daliPipelineHandle' {aka 'DALIPipeline*'} (maybe you meant to use '->' ?)
  187 |     if (handle_.pipe && handle_.ws) {
      |                 ^~~~
/workspace/dali_backend/src/dali_executor/dali_pipeline.h:187:33: error: request for member 'ws' in '((triton::backend::dali::DaliPipeline*)this)->triton::backend::dali::DaliPipeline::handle_', which is of pointer type 'daliPipelineHandle' {aka 'DALIPipeline*'} (maybe you meant to use '->' ?)
  187 |     if (handle_.pipe && handle_.ws) {
      |                                 ^~
In file included from /workspace/dali_backend/src/dali_executor/dali_executor.cc:23:
/workspace/dali_backend/src/dali_executor/dali_executor.h: In constructor 'triton::backend::dali::DaliExecutor::DaliExecutor(triton::backend::dali::DaliPipeline)':
/workspace/dali_backend/src/dali_executor/dali_executor.h:47:96: error: no matching function for call to 'dali::ThreadPool::ThreadPool(int, int, bool)'
   47 |       pipeline_(std::move(pipeline)), thread_pool_(GetNumThreads(), pipeline_.DeviceId(), false) {}
      |                                                                                                ^
In file included from /workspace/dali_backend/src/dali_executor/utils/dali.h:37,
                 from /workspace/dali_backend/src/dali_executor/io_descriptor.h:26,
                 from /workspace/dali_backend/src/dali_executor/dali_pipeline.h:30,
                 from /workspace/dali_backend/src/dali_executor/dali_executor.h:30,
                 from /workspace/dali_backend/src/dali_executor/dali_executor.cc:23:
/workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/pipeline/util/thread_pool.h:39:14: note: candidate: 'dali::ThreadPool::ThreadPool(int, int, bool, const string&)'
   39 |   DLL_PUBLIC ThreadPool(int num_thread, int device_id, bool set_affinity, const std::string& name)
      |              ^~~~~~~~~~
/workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/pipeline/util/thread_pool.h:39:14: note:   candidate expects 4 arguments, 3 provided
/workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/pipeline/util/thread_pool.h:37:14: note: candidate: 'dali::ThreadPool::ThreadPool(int, int, bool, const char*)'
   37 |   DLL_PUBLIC ThreadPool(int num_thread, int device_id, bool set_affinity, const char* name);
      |              ^~~~~~~~~~
/workspace/dali_backend/build/src/dali_executor/dali/nvidia/dali/include/dali/pipeline/util/thread_pool.h:37:14: note:   candidate expects 4 arguments, 3 provided
cc1plus: all warnings being treated as errors
make[2]: *** [src/dali_executor/CMakeFiles/dali_executor.dir/build.make:101: src/dali_executor/CMakeFiles/dali_executor.dir/dali_pipeline.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
cc1plus: all warnings being treated as errors
make[2]: *** [src/dali_executor/CMakeFiles/dali_executor.dir/build.make:88: src/dali_executor/CMakeFiles/dali_executor.dir/dali_executor.cc.o] Error 1
cc1plus: all warnings being treated as errors
make[2]: *** [src/dali_executor/CMakeFiles/dali_executor.dir/build.make:114: src/dali_executor/CMakeFiles/dali_executor.dir/io_buffer.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:446: src/dali_executor/CMakeFiles/dali_executor.dir/all] Error 2
make: *** [Makefile:149: all] Error 2

Can you share an environment based on CUDA 11 that has been verified to build successfully, so that I can use conditional execution?

JanuszL commented 1 year ago

Hi @DequanZhu,

Can you try applying https://github.com/triton-inference-server/dali_backend/pull/180 and check again (there was a small change in the DALI C/C++ API recently).