Closed LakshmiKumar23 closed 1 week ago
@LakshmiKumar23 Tested the TF reader Python examples and the TF pets training example; both work fine.
@LakshmiKumar23 I used mpirun to create multiple rocAL pipelines, passing a different GPU ID (0-7) to each on an 8-GPU system. I iterated through all the rocAL pipelines and printed the device of the tf.Tensors returned by the TF iterator. Each one prints the GPU ID that I passed when creating its rocAL pipeline.
This confirms the multi-GPU support added for the TF iterator is working fine.
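For anyone who wants to repeat this check, here is a minimal sketch of the device comparison only. It is not the script used above. It assumes Open MPI as the launcher (which sets `OMPI_COMM_WORLD_LOCAL_RANK`) and that the tensor's `device` attribute is a string ending in `GPU:N`. The helper names `local_rank`, `gpu_index`, and `check_device` are hypothetical, and the rocAL pipeline creation itself is deliberately left out.

```python
import os
import re


def local_rank(env=None):
    """Return the node-local MPI rank, or 0 when not launched via mpirun.

    Assumption: Open MPI is the launcher; other MPI implementations
    export the local rank under different variable names.
    """
    env = os.environ if env is None else env
    return int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))


def gpu_index(device_name):
    """Extract N from a device string that ends in 'GPU:N', else None."""
    match = re.search(r"GPU:(\d+)$", device_name)
    return int(match.group(1)) if match else None


def check_device(device_name, expected_gpu):
    """True when the tensor's device string names the expected GPU."""
    return gpu_index(device_name) == expected_gpu
```

Under mpirun, each rank would use `local_rank()` as the GPU ID for its pipeline and then call `check_device(tensor.device, local_rank())` on every tensor the iterator returns.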
@kiritigowda make test passes on my system. Not sure why so many fail on Azure since my changes don't affect those tests. Can you please check?
lakshmi@kapu:~/work/lk/rocAL/build$ make test
Running tests...
Test project /home/lakshmi/work/lk/rocAL/build
Start 1: basic_test_cpu
1/14 Test #1: basic_test_cpu ..................... Passed 2.31 sec
Start 2: basic_test_gpu
2/14 Test #2: basic_test_gpu ..................... Passed 0.35 sec
Start 3: basic_test_gray
3/14 Test #3: basic_test_gray .................... Passed 0.89 sec
Start 4: basic_test_rgb
4/14 Test #4: basic_test_rgb ..................... Passed 0.77 sec
Start 5: dataloader_multithread_cpu
5/14 Test #5: dataloader_multithread_cpu ......... Passed 3.05 sec
Start 6: dataloader_multithread_gpu
6/14 Test #6: dataloader_multithread_gpu ......... Passed 1.98 sec
Start 7: performance_tests_cpu
7/14 Test #7: performance_tests_cpu .............. Passed 8.28 sec
Start 8: performance_tests_gpu
8/14 Test #8: performance_tests_gpu .............. Passed 6.66 sec
Start 9: performance_tests_with_depth_cpu
9/14 Test #9: performance_tests_with_depth_cpu ... Passed 3.43 sec
Start 10: performance_tests_with_depth_gpu
10/14 Test #10: performance_tests_with_depth_gpu ... Passed 1.75 sec
Start 11: unit_tests_cpu
11/14 Test #11: unit_tests_cpu ..................... Passed 14.44 sec
Start 12: unit_tests_gpu
12/14 Test #12: unit_tests_gpu ..................... Passed 12.92 sec
Start 13: unit_tests_gray
13/14 Test #13: unit_tests_gray .................... Passed 5.22 sec
Start 14: video_tests
14/14 Test #14: video_tests ........................ Passed 80.04 sec
100% tests passed, 0 tests failed out of 14
Total Test time (real) = 142.10 sec
@kiritigowda The Python tests also pass:
lakshmi@kapu:~/work/lk/rocAL/build$ export PYTHONPATH=/opt/rocm/lib:$PYTHONPATH
lakshmi@kapu:~/work/lk/rocAL/build$ mkdir rocal-pybind-test && cd rocal-pybind-test
cmake /opt/rocm/share/rocal/test/pybind
ctest -VV
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- rocal-pybind-test: rocAL Pybind found at /opt/rocm/lib/amd/rocal
-- Configuring done
-- Generating done
-- Build files have been written to: /home/lakshmi/work/lk/rocAL/build/rocal-pybind-test
UpdateCTestConfiguration from :/home/lakshmi/work/lk/rocAL/build/rocal-pybind-test/DartConfiguration.tcl
Parse Config file:/home/lakshmi/work/lk/rocAL/build/rocal-pybind-test/DartConfiguration.tcl
UpdateCTestConfiguration from :/home/lakshmi/work/lk/rocAL/build/rocal-pybind-test/DartConfiguration.tcl
Parse Config file:/home/lakshmi/work/lk/rocAL/build/rocal-pybind-test/DartConfiguration.tcl
Test project /home/lakshmi/work/lk/rocAL/build/rocal-pybind-test
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 1
Start 1: rocal_pybind_test_decoders
1: Test command: /usr/bin/python3.10 "/opt/rocm/share/rocal/test/pybind/"
1: Environment variables:
1: Test timeout computed to be: 1500
1: rocAL PyBind Decoders
1: ('audio', <function audio at 0x772782f66e60>)
1: ('image', <function image at 0x7727b3615000>)
1: ('image_random_crop', <function image_random_crop at 0x772782f66d40>)
1: ('image_raw', <function image_raw at 0x7727b36156c0>)
1: ('image_slice', <function image_slice at 0x772782f66dd0>)
1/6 Test #1: rocal_pybind_test_decoders ....... Passed 0.31 sec
test 2
Start 2: rocal_pybind_test_functions
2: Test command: /usr/bin/python3.10 "/opt/rocm/share/rocal/test/pybind/"
2: Environment variables:
2: Test timeout computed to be: 1500
2: rocAL PyBind Functions
2: ('blend', <function blend at 0x737a4c619510>)
2: ('blur', <function blur at 0x737a1bf7c5e0>)
2: ('box_encoder', <function box_encoder at 0x737a1bf7d3f0>)
2: ('box_iou_matcher', <function box_iou_matcher at 0x737a1bf7d6c0>)
2: ('brightness', <function brightness at 0x737a1bf7c430>)
2: ('brightness_fixed', <function brightness_fixed at 0x737a1bf7c4c0>)
2: ('center_crop', <function center_crop at 0x737a1bf7d090>)
2: ('color_temp', <function color_temp at 0x737a1bf7d480>)
2: ('color_twist', <function color_twist at 0x737a1bf7d1b0>)
2: ('contrast', <function contrast at 0x737a1bf7c670>)
2: ('copy', <function copy at 0x737a1bf7d5a0>)
2: ('crop', <function crop at 0x737a1bf7d120>)
2: ('crop_mirror_normalize', <function crop_mirror_normalize at 0x737a1bf7d000>)
2: ('exposure', <function exposure at 0x737a1bf7c280>)
2: ('external_source', <function external_source at 0x737a1bf7d750>)
2: ('fish_eye', <function fish_eye at 0x737a1bf7c310>)
2: ('flip', <function flip at 0x737a1bf7c700>)
2: ('fog', <function fog at 0x737a1bf7c3a0>)
2: ('gamma_correction', <function gamma_correction at 0x737a1bf7c790>)
2: ('hue', <function hue at 0x737a1bf7c820>)
2: ('jitter', <function jitter at 0x737a1bf7c8b0>)
2: ('lens_correction', <function lens_correction at 0x737a1bf7c550>)
2: ('mel_filter_bank', <function mel_filter_bank at 0x737a1bf7dcf0>)
2: ('nonsilent_region', <function nonsilent_region at 0x737a1bf7db40>)
2: ('nop', <function nop at 0x737a1bf7d510>)
2: ('normalize', <function normalize at 0x737a1bf7dc60>)
2: ('one_hot', <function one_hot at 0x737a1bf7d360>)
2: ('pixelate', <function pixelate at 0x737a1bf7c940>)
2: ('preemphasis_filter', <function preemphasis_filter at 0x737a1bf7d7e0>)
2: ('rain', <function rain at 0x737a1bf7c9d0>)
2: ('random_bbox_crop', <function random_bbox_crop at 0x737a1bf7d2d0>)
2: ('random_crop', <function random_crop at 0x737a1bf7cca0>)
2: ('resample', <function resample at 0x737a1bf7d990>)
2: ('resize', <function resize at 0x737a1bf7ca60>)
2: ('resize_crop', <function resize_crop at 0x737a1bf7cb80>)
2: ('resize_crop_mirror', <function resize_crop_mirror at 0x737a1bf7caf0>)
2: ('resize_mirror_normalize', <function resize_mirror_normalize at 0x737a1bf7cc10>)
2: ('rotate', <function rotate at 0x737a1bf7cd30>)
2: ('saturation', <function saturation at 0x737a1bf7cdc0>)
2: ('slice', <function slice at 0x737a1bf7dbd0>)
2: ('snow', <function snow at 0x737a1bf7c1f0>)
2: ('snp_noise', <function snp_noise at 0x737a1bf7d630>)
2: ('spectrogram', <function spectrogram at 0x737a1bf7d870>)
2: ('ssd_random_crop', <function ssd_random_crop at 0x737a1bf7ce50>)
2: ('tensor_add_tensor_float', <function tensor_add_tensor_float at 0x737a1bf7da20>)
2: ('tensor_mul_scalar_float', <function tensor_mul_scalar_float at 0x737a1bf7dab0>)
2: ('to_decibels', <function to_decibels at 0x737a1bf7d900>)
2: ('uniform', <function uniform at 0x737a1bf7d240>)
2: ('vignette', <function vignette at 0x737a1bf7cf70>)
2: ('warp_affine', <function warp_affine at 0x737a1bf7cee0>)
2/6 Test #2: rocal_pybind_test_functions ...... Passed 0.31 sec
test 3
Start 3: rocal_pybind_test_pipelines
3: Test command: /usr/bin/python3.10 "/opt/rocm/share/rocal/test/pybind/"
3: Environment variables:
3: Test timeout computed to be: 1500
3: rocAL PyBind Pipelines
3: ('__enter__', <function Pipeline.__enter__ at 0x79ed1ac61750>)
3: ('__exit__', <function Pipeline.__exit__ at 0x79ed1ac617e0>)
3: ('__init__', <function Pipeline.__init__ at 0x79ed1adfb1c0>)
3: ('build', <function at 0x79ed1ac61360>)
3: ('copyToExternalTensor', <function Pipeline.copyToExternalTensor at 0x79ed1ac615a0>)
3: ('copy_encoded_boxes_and_lables', <function Pipeline.copy_encoded_boxes_and_lables at 0x79ed1ac620e0>)
3: ('define_graph', <function Pipeline.define_graph at 0x79ed1ac61480>)
3: ('get_bounding_box_cords', <function Pipeline.get_bounding_box_cords at 0x79ed1ac61ea0>)
3: ('get_bounding_box_count', <function Pipeline.get_bounding_box_count at 0x79ed1ac61d80>)
3: ('get_bounding_box_labels', <function Pipeline.get_bounding_box_labels at 0x79ed1ac61e10>)
3: ('get_encoded_boxes_and_lables', <function Pipeline.get_encoded_boxes_and_lables at 0x79ed1ac62170>)
3: ('get_handle', <function Pipeline.get_handle at 0x79ed1ac61510>)
3: ('get_image_id', <function Pipeline.get_image_id at 0x79ed1ac61cf0>)
3: ('get_image_labels', <function Pipeline.get_image_labels at 0x79ed1ac62050>)
3: ('get_image_name', <function Pipeline.get_image_name at 0x79ed1ac61c60>)
3: ('get_image_name_length', <function Pipeline.get_image_name_length at 0x79ed1ac62320>)
3: ('get_img_sizes', <function Pipeline.get_img_sizes at 0x79ed1ac62200>)
3: ('get_last_batch_padded_size', <function Pipeline.get_last_batch_padded_size at 0x79ed1ac627a0>)
3: ('get_mask_coordinates', <function Pipeline.get_mask_coordinates at 0x79ed1ac61fc0>)
3: ('get_mask_count', <function Pipeline.get_mask_count at 0x79ed1ac61f30>)
3: ('get_matched_indices', <function Pipeline.get_matched_indices at 0x79ed1ac62680>)
3: ('get_one_hot_encoded_labels', <function Pipeline.get_one_hot_encoded_labels at 0x79ed1ac61630>)
3: ('get_output_tensors', <function Pipeline.get_output_tensors at 0x79ed1ac62710>)
3: ('get_remaining_images', <function Pipeline.get_remaining_images at 0x79ed1ac623b0>)
3: ('get_roi_img_sizes', <function Pipeline.get_roi_img_sizes at 0x79ed1ac62290>)
3: ('is_empty', <function Pipeline.is_empty at 0x79ed1ac62560>)
3: ('rocal_release', <function Pipeline.rocal_release at 0x79ed1ac62440>)
3: ('rocal_reset_loaders', <function Pipeline.rocal_reset_loaders at 0x79ed1ac624d0>)
3: ('rocal_run', <function Pipeline.rocal_run at 0x79ed1ac613f0>)
3: ('run', <function at 0x79ed1ac62830>)
3: ('set_outputs', <function Pipeline.set_outputs at 0x79ed1ac616c0>)
3: ('set_seed', <function Pipeline.set_seed at 0x79ed1ac61870>)
3: ('timing_info', <function Pipeline.timing_info at 0x79ed1ac625f0>)
3/6 Test #3: rocal_pybind_test_pipelines ...... Passed 0.29 sec
test 4
Start 4: rocal_pybind_test_randoms
4: Test command: /usr/bin/python3.10 "/opt/rocm/share/rocal/test/pybind/"
4: Environment variables:
4: Test timeout computed to be: 1500
4: rocAL PyBind Randoms
4: ('coin_flip', <function coin_flip at 0x7d0c4d3b24d0>)
4: ('normal', <function normal at 0x7d0c1cc62a70>)
4: ('uniform', <function uniform at 0x7d0c4d3b2830>)
4/6 Test #4: rocal_pybind_test_randoms ........ Passed 0.30 sec
test 5
Start 5: rocal_pybind_test_readers
5: Test command: /usr/bin/python3.10 "/opt/rocm/share/rocal/test/pybind/"
5: Environment variables:
5: Test timeout computed to be: 1500
5: rocAL PyBind Readers
5: ('caffe', <function caffe at 0x78b7b6366c20>)
5: ('caffe2', <function caffe2 at 0x78b7b6366cb0>)
5: ('coco', <function coco at 0x78b7e6aae4d0>)
5: ('file', <function file at 0x78b7e6aae8c0>)
5: ('mxnet', <function mxnet at 0x78b7b6366ef0>)
5: ('sequence_reader', <function sequence_reader at 0x78b7b6366e60>)
5: ('tfrecord', <function tfrecord at 0x78b7b6366b90>)
5: ('video', <function video at 0x78b7b6366d40>)
5: ('video_resize', <function video_resize at 0x78b7b6366dd0>)
5/6 Test #5: rocal_pybind_test_readers ........ Passed 0.30 sec
test 6
Start 6: rocal_pybind_test_types
6: Test command: /usr/bin/python3.10 "/opt/rocm/share/rocal/test/pybind/"
6: Environment variables:
6: Test timeout computed to be: 1500
6: rocAL PyBind Types
6: ('data_type_function', <function data_type_function at 0x7b4e82cae3b0>)
6/6 Test #6: rocal_pybind_test_types .......... Passed 0.22 sec
100% tests passed, 0 tests failed out of 6
Total Test time (real) = 1.73 sec
@kiritigowda Can we merge this? CI has issues.
@SundarRajan28 Please test all TF examples/notebooks/tests. Here are the results from my runs: