ROCm / rocAL

The AMD rocAL is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a processing graph programmable by the user.
https://rocm.docs.amd.com/projects/rocAL/en/develop/
MIT License

[Issue]: Tests - Python API: Tests failures & Optimization #220

Open kiritigowda opened 1 week ago

kiritigowda commented 1 week ago

Problem Description

Operating System

ALL

CPU

ALL

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.2.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

kiritigowda commented 1 week ago

@SundarRajan28 @swetha097 please look at this issue

LakshmiKumar23 commented 1 week ago

@SundarRajan28 @swetha097 The test fails because it tries to get the dataset from the source tree: https://github.com/ROCm/rocAL/blob/cc619f581abb2b40d5b18cdae2550b79ba757e00/tests/python_api/decoder.py#L11C1-L11C11 Please use the installed dataset at /opt/rocm/share/rocal/test/data/images/AMD-tinyDataSet/ instead.

SundarRajan28 commented 1 week ago

@kiritigowda @LakshmiKumar23 I changed the image_dir path to /opt/rocm/share/rocal/test/data/images/AMD-tinyDataSet/. The example now runs to completion:

root@x1000c4s5b1n0:/data/numpy_dev/rocAL/tests# python python_api/decoder.py
Optional arguments: <cpu/gpu image_folder>
OK: loaded 125 kernels from libvx_rpp.so
Pipeline has been created succesfully
info: Import CuPy failed. Falling back to CPU!
/data/numpy_dev/rocAL/tests/python_api/decoder.py:33: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
  plt.show()
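
A minimal sketch of how a test script could pick its image folder, so the packaged dataset is the default and a folder given on the command line still wins. The helper name `resolve_image_dir` is hypothetical and not part of rocAL; the argument order is assumed from the `<cpu/gpu image_folder>` hint printed by the script.

```python
import os

# Path suggested in this thread for the packaged sample images.
INSTALLED_DATASET = "/opt/rocm/share/rocal/test/data/images/AMD-tinyDataSet/"


def resolve_image_dir(argv, default=INSTALLED_DATASET):
    """Return the image folder from argv[2] if given, else the default.

    Assumption: argv[1] is the device (cpu/gpu) and argv[2] the image folder,
    matching the "<cpu/gpu image_folder>" hint printed by the script.
    """
    candidate = argv[2] if len(argv) > 2 else default
    if not os.path.isdir(candidate):
        # Fail early with a clear message instead of failing inside the pipeline.
        raise FileNotFoundError(f"image folder not found: {candidate}")
    return candidate
```

A script would call it as `image_dir = resolve_image_dir(sys.argv)` after importing `sys`.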

kiritigowda commented 6 days ago

@SundarRajan28 - with mainline build 14935+ you will be able to test rocAL PyBind with the package install. Below are the steps to test with Docker:

apt update
apt-get install rocal rocal-dev rocal-test
git clone -b 3.0.2 https://github.com/libjpeg-turbo/libjpeg-turbo.git
mkdir tj-build && cd tj-build
cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=RELEASE -DENABLE_STATIC=FALSE -DCMAKE_INSTALL_DEFAULT_LIBDIR=lib -DWITH_JPEG8=TRUE ../libjpeg-turbo/
make -j8
sudo make install
cd ..
mkdir rocal-pybind-test && cd rocal-pybind-test
cmake /opt/rocm/share/rocal/test/pybind
ctest -VV

Output

ctest -VV
UpdateCTestConfiguration  from :/root/rocal-pybind-test/DartConfiguration.tcl
Parse Config file:/root/rocal-pybind-test/DartConfiguration.tcl
UpdateCTestConfiguration  from :/root/rocal-pybind-test/DartConfiguration.tcl
Parse Config file:/root/rocal-pybind-test/DartConfiguration.tcl
Test project /root/rocal-pybind-test
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 1
    Start 1: rocal_pybind_test_decoders

1: Test command: /usr/bin/python3.10 "/opt/rocm/share/rocal/test/pybind/decoders_test.py"
1: Working Directory: /root/rocal-pybind-test
1: Environment variables: 
1:  PYTHONPATH=/opt/rocm/lib:$PYTHONPATH
1: Test timeout computed to be: 1500
1: rocAL PyBind Decoders
1: ('audio', <function audio at 0x79394c752d40>)
1: ('image', <function image at 0x7939a80b7a30>)
1: ('image_random_crop', <function image_random_crop at 0x7939a80b7b50>)
1: ('image_raw', <function image_raw at 0x7939a80b7be0>)
1: ('image_slice', <function image_slice at 0x79394c752cb0>)
1/6 Test #1: rocal_pybind_test_decoders .......   Passed    6.90 sec
test 2
    Start 2: rocal_pybind_test_functions

2: Test command: /usr/bin/python3.10 "/opt/rocm/share/rocal/test/pybind/functions_test.py"
2: Working Directory: /root/rocal-pybind-test
2: Environment variables: 
2:  PYTHONPATH=/opt/rocm/lib:$PYTHONPATH
2: Test timeout computed to be: 1500
2: rocAL PyBind Functions
2: ('blend', <function blend at 0x75d2895f3b50>)
2: ('blur', <function blur at 0x75d24db63f40>)
2: ('box_encoder', <function box_encoder at 0x75d24db68dc0>)
2: ('box_iou_matcher', <function box_iou_matcher at 0x75d24db69090>)
2: ('brightness', <function brightness at 0x75d24db63d90>)
2: ('brightness_fixed', <function brightness_fixed at 0x75d24db63e20>)
2: ('center_crop', <function center_crop at 0x75d24db68a60>)
2: ('color_temp', <function color_temp at 0x75d24db68e50>)
2: ('color_twist', <function color_twist at 0x75d24db68b80>)
2: ('contrast', <function contrast at 0x75d24db68040>)
2: ('copy', <function copy at 0x75d24db68f70>)
2: ('crop', <function crop at 0x75d24db68af0>)
2: ('crop_mirror_normalize', <function crop_mirror_normalize at 0x75d24db689d0>)
2: ('exposure', <function exposure at 0x75d24db63be0>)
2: ('external_source', <function external_source at 0x75d24db69120>)
2: ('fish_eye', <function fish_eye at 0x75d24db63c70>)
2: ('flip', <function flip at 0x75d24db680d0>)
2: ('fog', <function fog at 0x75d24db63d00>)
2: ('gamma_correction', <function gamma_correction at 0x75d24db68160>)
2: ('hue', <function hue at 0x75d24db681f0>)
2: ('jitter', <function jitter at 0x75d24db68280>)
2: ('lens_correction', <function lens_correction at 0x75d24db63eb0>)
2: ('mel_filter_bank', <function mel_filter_bank at 0x75d24db696c0>)
2: ('nonsilent_region', <function nonsilent_region at 0x75d24db69510>)
2: ('nop', <function nop at 0x75d24db68ee0>)
2: ('normalize', <function normalize at 0x75d24db69630>)
2: ('one_hot', <function one_hot at 0x75d24db68d30>)
2: ('pixelate', <function pixelate at 0x75d24db68310>)
2: ('preemphasis_filter', <function preemphasis_filter at 0x75d24db691b0>)
2: ('rain', <function rain at 0x75d24db683a0>)
2: ('random_bbox_crop', <function random_bbox_crop at 0x75d24db68ca0>)
2: ('random_crop', <function random_crop at 0x75d24db68670>)
2: ('resample', <function resample at 0x75d24db69360>)
2: ('resize', <function resize at 0x75d24db68430>)
2: ('resize_crop', <function resize_crop at 0x75d24db68550>)
2: ('resize_crop_mirror', <function resize_crop_mirror at 0x75d24db684c0>)
2: ('resize_mirror_normalize', <function resize_mirror_normalize at 0x75d24db685e0>)
2: ('rotate', <function rotate at 0x75d24db68700>)
2: ('saturation', <function saturation at 0x75d24db68790>)
2: ('slice', <function slice at 0x75d24db695a0>)
2: ('snow', <function snow at 0x75d24db63b50>)
2: ('snp_noise', <function snp_noise at 0x75d24db69000>)
2: ('spectrogram', <function spectrogram at 0x75d24db69240>)
2: ('ssd_random_crop', <function ssd_random_crop at 0x75d24db68820>)
2: ('tensor_add_tensor_float', <function tensor_add_tensor_float at 0x75d24db693f0>)
2: ('tensor_mul_scalar_float', <function tensor_mul_scalar_float at 0x75d24db69480>)
2: ('to_decibels', <function to_decibels at 0x75d24db692d0>)
2: ('uniform', <function uniform at 0x75d24db68c10>)
2: ('vignette', <function vignette at 0x75d24db68940>)
2: ('warp_affine', <function warp_affine at 0x75d24db688b0>)
2/6 Test #2: rocal_pybind_test_functions ......   Passed    0.45 sec
test 3
    Start 3: rocal_pybind_test_pipelines

3: Test command: /usr/bin/python3.10 "/opt/rocm/share/rocal/test/pybind/pipelines_test.py"
3: Working Directory: /root/rocal-pybind-test
3: Environment variables: 
3:  PYTHONPATH=/opt/rocm/lib:$PYTHONPATH
3: Test timeout computed to be: 1500
3: rocAL PyBind Pipelines
3: ('__enter__', <function Pipeline.__enter__ at 0x77c100d55630>)
3: ('__exit__', <function Pipeline.__exit__ at 0x77c100d556c0>)
3: ('__init__', <function Pipeline.__init__ at 0x77c100d551b0>)
3: ('build', <function Pipeline.build at 0x77c100d55240>)
3: ('copyToExternalTensor', <function Pipeline.copyToExternalTensor at 0x77c100d55480>)
3: ('copy_encoded_boxes_and_lables', <function Pipeline.copy_encoded_boxes_and_lables at 0x77c100d55fc0>)
3: ('define_graph', <function Pipeline.define_graph at 0x77c100d55360>)
3: ('get_bounding_box_cords', <function Pipeline.get_bounding_box_cords at 0x77c100d55d80>)
3: ('get_bounding_box_count', <function Pipeline.get_bounding_box_count at 0x77c100d55c60>)
3: ('get_bounding_box_labels', <function Pipeline.get_bounding_box_labels at 0x77c100d55cf0>)
3: ('get_encoded_boxes_and_lables', <function Pipeline.get_encoded_boxes_and_lables at 0x77c100d56050>)
3: ('get_handle', <function Pipeline.get_handle at 0x77c100d553f0>)
3: ('get_image_id', <function Pipeline.get_image_id at 0x77c100d55bd0>)
3: ('get_image_labels', <function Pipeline.get_image_labels at 0x77c100d55f30>)
3: ('get_image_name', <function Pipeline.get_image_name at 0x77c100d55b40>)
3: ('get_image_name_length', <function Pipeline.get_image_name_length at 0x77c100d56200>)
3: ('get_img_sizes', <function Pipeline.get_img_sizes at 0x77c100d560e0>)
3: ('get_last_batch_padded_size', <function Pipeline.get_last_batch_padded_size at 0x77c100d56680>)
3: ('get_mask_coordinates', <function Pipeline.get_mask_coordinates at 0x77c100d55ea0>)
3: ('get_mask_count', <function Pipeline.get_mask_count at 0x77c100d55e10>)
3: ('get_matched_indices', <function Pipeline.get_matched_indices at 0x77c100d56560>)
3: ('get_one_hot_encoded_labels', <function Pipeline.get_one_hot_encoded_labels at 0x77c100d55510>)
3: ('get_output_tensors', <function Pipeline.get_output_tensors at 0x77c100d565f0>)
3: ('get_remaining_images', <function Pipeline.get_remaining_images at 0x77c100d56290>)
3: ('get_roi_img_sizes', <function Pipeline.get_roi_img_sizes at 0x77c100d56170>)
3: ('is_empty', <function Pipeline.is_empty at 0x77c100d56440>)
3: ('rocal_release', <function Pipeline.rocal_release at 0x77c100d56320>)
3: ('rocal_reset_loaders', <function Pipeline.rocal_reset_loaders at 0x77c100d563b0>)
3: ('rocal_run', <function Pipeline.rocal_run at 0x77c100d552d0>)
3: ('run', <function Pipeline.run at 0x77c100d56710>)
3: ('set_outputs', <function Pipeline.set_outputs at 0x77c100d555a0>)
3: ('set_seed', <function Pipeline.set_seed at 0x77c100d55750>)
3: ('timing_info', <function Pipeline.timing_info at 0x77c100d564d0>)
3/6 Test #3: rocal_pybind_test_pipelines ......   Passed    0.45 sec
test 4
    Start 4: rocal_pybind_test_randoms

4: Test command: /usr/bin/python3.10 "/opt/rocm/share/rocal/test/pybind/randoms_test.py"
4: Working Directory: /root/rocal-pybind-test
4: Environment variables: 
4:  PYTHONPATH=/opt/rocm/lib:$PYTHONPATH
4: Test timeout computed to be: 1500
4: rocAL PyBind Randoms
4: ('coin_flip', <function coin_flip at 0x7fb03dad3a30>)
4: ('normal', <function normal at 0x7fb03dad3ac0>)
4: ('uniform', <function uniform at 0x7fb03dad3b50>)
4/6 Test #4: rocal_pybind_test_randoms ........   Passed    0.45 sec
test 5
    Start 5: rocal_pybind_test_readers

5: Test command: /usr/bin/python3.10 "/opt/rocm/share/rocal/test/pybind/readers_test.py"
5: Working Directory: /root/rocal-pybind-test
5: Environment variables: 
5:  PYTHONPATH=/opt/rocm/lib:$PYTHONPATH
5: Test timeout computed to be: 1500
5: rocAL PyBind Readers
5: ('caffe', <function caffe at 0x79e2fd756b00>)
5: ('caffe2', <function caffe2 at 0x79e2fd756b90>)
5: ('coco', <function coco at 0x79e339143ac0>)
5: ('file', <function file at 0x79e339143be0>)
5: ('mxnet', <function mxnet at 0x79e2fd756dd0>)
5: ('sequence_reader', <function sequence_reader at 0x79e2fd756d40>)
5: ('tfrecord', <function tfrecord at 0x79e339143b50>)
5: ('video', <function video at 0x79e2fd756c20>)
5: ('video_resize', <function video_resize at 0x79e2fd756cb0>)
5/6 Test #5: rocal_pybind_test_readers ........   Passed    0.43 sec
test 6
    Start 6: rocal_pybind_test_types

6: Test command: /usr/bin/python3.10 "/opt/rocm/share/rocal/test/pybind/types_test.py"
6: Working Directory: /root/rocal-pybind-test
6: Environment variables: 
6:  PYTHONPATH=/opt/rocm/lib:$PYTHONPATH
6: Test timeout computed to be: 1500
6: rocAL PyBind Types
6: ('data_type_function', <function data_type_function at 0x734e148079a0>)
6/6 Test #6: rocal_pybind_test_types ..........   Passed    0.33 sec

100% tests passed, 0 tests failed out of 6

Total Test time (real) =   9.01 sec
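
The per-test output above has the shape of `(name, function)` pairs, which is what `inspect.getmembers` returns when filtered with `inspect.isfunction`. The sketch below assumes the pybind tests list module members this way; this thread does not confirm it. The standard-library module `textwrap` is only a stand-in so the snippet runs without rocAL installed.

```python
import textwrap
from inspect import getmembers, isfunction


def list_functions(module):
    """Return (name, function) pairs for the plain Python functions in a module."""
    return getmembers(module, isfunction)


# textwrap is a stand-in; a real test would pass a module such as amd.rocal.decoders.
for entry in list_functions(textwrap):
    print(entry)
```

Because the module has to be imported before its members can be listed, a check of this kind mainly verifies that the bindings load under the interpreter that ctest selected.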

SundarRajan28 commented 6 days ago

@kiritigowda The steps shared above don't work in PyTorch Docker images that have multiple Python versions, with PyTorch installed in the conda Python. The pybind tests work fine when they are run directly, but when the ctests are configured through CMake, it detects the system Python version, and the tests then fail with a missing-module error.

With python 3.10

root@x1000c4s5b1n0:/data/numpy_dev/rocAL/tests/pybind# python decoders_test.py
rocAL PyBind Decoders
('audio', <function audio at 0x7f37da0fa710>)
('image', <function image at 0x7f3c7e61b6d0>)
('image_random_crop', <function image_random_crop at 0x7f3c7e446050>)
('image_raw', <function image_raw at 0x7f3c7e664040>)
('image_slice', <function image_slice at 0x7f37da0fa680>)

With python 3.12

root@x1000c4s5b1n0:/data/numpy_dev/rocAL/tests/pybind# python3.12 decoders_test.py
Traceback (most recent call last):
  File "/data/numpy_dev/rocAL/tests/pybind/decoders_test.py", line 21, in <module>
    import amd.rocal.decoders as decoders
  File "/opt/rocm/lib/amd/rocal/decoders.py", line 26, in <module>
    import amd.rocal.types as types
  File "/opt/rocm/lib/amd/rocal/types.py", line 26, in <module>
    from rocal_pybind.types import OK
ModuleNotFoundError: No module named 'rocal_pybind'

With ctests

test 1
    Start 1: rocal_pybind_test_decoders

1: Test command: /opt/conda/bin/python3.12 "/opt/rocm/share/rocal/test/pybind/decoders_test.py"
1: Working Directory: /data/numpy_dev/rocAL/tests/pybind
1: Environment variables:
1:  PYTHONPATH=/opt/rocm/lib:$PYTHONPATH
1: Test timeout computed to be: 1500
1: Traceback (most recent call last):
1:   File "/opt/rocm/share/rocal/test/pybind/decoders_test.py", line 21, in <module>
1:     import amd.rocal.decoders as decoders
1:   File "/opt/rocm/lib/amd/rocal/decoders.py", line 26, in <module>
1:     import amd.rocal.types as types
1:   File "/opt/rocm/lib/amd/rocal/types.py", line 26, in <module>
1:     from rocal_pybind.types import OK
1: ModuleNotFoundError: No module named 'rocal_pybind'
1/6 Test #1: rocal_pybind_test_decoders .......***Failed    0.02 sec
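
A small diagnostic sketch for the failure above: run it with each interpreter in the container (for example `python` and `python3.12`) to see which one can locate `rocal_pybind`. The function name `describe_import` is hypothetical. It only uses the standard library and does not import the module itself.

```python
import importlib.util
import sys


def describe_import(module_name):
    """Report whether the running interpreter can locate a top-level module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "python": sys.executable,
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "module": module_name,
        "found": spec is not None,
        "origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    # "rocal_pybind" is the module named in the traceback above.
    print(describe_import("rocal_pybind"))
```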

SundarRajan28 commented 6 days ago

@kiritigowda I've confirmed that the pybind ctests work when I manually change the Python version in the pybind tests' CMake file. If a flag like PYTHON_VERSION_SUGGESTED is added, the user can pass the Python version to run the ctests with.

#set(Python3_FIND_VIRTUALENV FIRST)
#find_package(Python3 QUIET)
find_package(Python3 "3.10" EXACT QUIET COMPONENTS Interpreter Development)

ctest results

100% tests passed, 0 tests failed out of 6
Total Test time (real) =   2.87 sec
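
Until such a flag exists, one possible workaround is to pin the interpreter at configure time. The sketch below assumes the test CMake file locates Python through CMake's FindPython3 module, which accepts a `Python3_EXECUTABLE` hint in CMake 3.16 and later. Whether the rocAL test project honors that hint has not been verified in this thread. The helper names are hypothetical.

```python
import subprocess


def can_import(python_exe, module_name):
    """Return True if the given interpreter imports module_name successfully."""
    try:
        result = subprocess.run(
            [python_exe, "-c", f"import {module_name}"],
            capture_output=True,
            timeout=60,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Interpreter missing, not executable, or hung: treat as unusable.
        return False
    return result.returncode == 0


def cmake_configure_command(source_dir, candidates, module_name="rocal_pybind"):
    """Build a cmake command pinned to the first interpreter that has the module."""
    for python_exe in candidates:
        if can_import(python_exe, module_name):
            # Python3_EXECUTABLE is a FindPython3 hint (assumed to be honored here).
            return ["cmake", f"-DPython3_EXECUTABLE={python_exe}", source_dir]
    raise RuntimeError(f"no candidate interpreter can import {module_name}")
```

For the setup in this thread, the call might look like `cmake_configure_command("/opt/rocm/share/rocal/test/pybind", ["/usr/bin/python3.10", "/opt/conda/bin/python3.12"])`, with the returned list passed to `subprocess.run` from the build directory.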