SimeonEhrig closed this issue 2 years ago.
ping @afif-ishamsyah
I profiled `main.py` with only the function `example_context_copy` enabled (`example_manual_copy` was commented out).
I noticed that 7 `hipMemcpy` calls were executed, but only 5 were expected (2 for the function `sync_open` and 1 for the function `compute`). Can you please find out what is causing the 2 extra memory copies? At the end of this comment you will find instructions for tracing the application with `rocprof`.
This also gave me the idea to add new context manager functions. Can you please rename `sync_open` to `sync_open_rw`, create the functions `sync_open_r` (only reads data) and `sync_open_w` (only writes data), and use them in `main.py` to reduce memory operations?
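A minimal sketch of how the three context managers could differ, assuming "reads only" means the data is copied to the device on entry but never copied back, and "writes only" means the device buffer is only allocated on entry and the result is copied back on exit. `FakeDevice`, its methods, and the `device`/`name`/`host_array` parameters are hypothetical stand-ins that just count simulated `hipMemcpy` calls; they are not the project's real API.

```python
from contextlib import contextmanager


class FakeDevice:
    """Hypothetical stand-in for GPU memory that only counts simulated hipMemcpy calls."""

    def __init__(self):
        self.memcpy_calls = 0
        self.buffers = {}

    def allocate(self, name, size):
        # Allocation only (think hipMalloc): no copy, so the counter is unchanged.
        self.buffers[name] = [0] * size

    def copy_to_device(self, name, host_array):
        # Simulated host -> device hipMemcpy.
        self.memcpy_calls += 1
        self.buffers[name] = list(host_array)

    def copy_to_host(self, name, host_array):
        # Simulated device -> host hipMemcpy, written into the caller's list.
        self.memcpy_calls += 1
        host_array[:] = self.buffers[name]


@contextmanager
def sync_open_rw(device, name, host_array):
    # Read/write: copy the data in on entry and the result back on exit (2 copies).
    device.copy_to_device(name, host_array)
    yield device.buffers[name]
    device.copy_to_host(name, host_array)


@contextmanager
def sync_open_r(device, name, host_array):
    # Read-only: the kernel only reads the data, so copy in but never back (1 copy).
    device.copy_to_device(name, host_array)
    yield device.buffers[name]


@contextmanager
def sync_open_w(device, name, host_array):
    # Write-only: the kernel overwrites the buffer, so allocate without copying in
    # and copy the result back on exit (1 copy).
    device.allocate(name, len(host_array))
    yield device.buffers[name]
    device.copy_to_host(name, host_array)


device = FakeDevice()
a = [1, 2, 3]
b = [0, 0, 0]
with sync_open_r(device, "a", a) as d_a, sync_open_w(device, "b", b) as d_b:
    for i, x in enumerate(d_a):  # stand-in for the compute kernel
        d_b[i] = 2 * x
print(b, device.memcpy_calls)  # [2, 4, 6] 2
```

Opening both arrays with `sync_open_rw` would cost 4 simulated copies for the same computation, which is the kind of saving the split into `_r` and `_w` variants is meant to achieve.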
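Tracing the application with `rocprof`:

1. Comment out `example_manual_copy` in `main.py`.
2. Run `rocprof --hip-trace -o binding_trace.csv python src/main.py`.
3. Check how often `hipMemcpy` was executed: `cat binding_trace.hip_stats.csv` shows how many times `hipMemcpy` was executed.
4. Check when `hipMemcpy` was executed: copy `binding_trace.json` to your computer and inspect `binding_trace.json`.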
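The count from step 3 can also be read programmatically. This is a sketch that assumes the `*.hip_stats.csv` file written by `rocprof --hip-trace` has a `Name` column and a `Calls` column (the layout of the classic rocprof stats files); the helper name `count_api_calls` is hypothetical.

```python
import csv


def count_api_calls(stats_path, api_name="hipMemcpy"):
    """Return how often api_name was called according to a rocprof *.hip_stats.csv file.

    Assumes the file has "Name" and "Calls" columns; returns 0 if the API is absent.
    """
    with open(stats_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["Name"].strip() == api_name:
                return int(row["Calls"])
    return 0
```

For the run described above, `count_api_calls("binding_trace.hip_stats.csv")` would be expected to return 7.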
The 2 extra `hipMemcpy` calls come from the `algo.initialize_array` function. It creates 2 `hip_mem` objects, each containing an array of zeros: one is used as the input object, the other as the output object. Creating a `hip_mem` object automatically runs a `hipMemcpy` (see `algo.hpp`, line 166).
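The accounting can be illustrated with a small Python model. `HipMemModel` and this `initialize_array` are hypothetical stand-ins loosely modelled on the description above (the real `hip_mem` is a C++ class in `algo.hpp`); the only behaviour they reproduce is "constructing the object performs one copy".

```python
class HipMemModel:
    """Hypothetical Python model of hip_mem: construction triggers one hipMemcpy."""

    memcpy_calls = 0  # shared counter over all instances

    def __init__(self, host_array):
        # Mirrors the behaviour described for algo.hpp: the constructor
        # copies the host data to the device right away.
        self.device_data = list(host_array)
        HipMemModel.memcpy_calls += 1


def initialize_array(size):
    # Two zero-filled buffers: one input object and one output object.
    zeros = [0] * size
    return HipMemModel(zeros), HipMemModel(zeros)


expected_copies = 5  # copies the profile was expected to show
inp, out = initialize_array(4)
print(HipMemModel.memcpy_calls)  # 2
print(expected_copies + HipMemModel.memcpy_calls)  # 7, matching the profile
```

If a buffer only needs to start out as zeros, one option (an assumption, not something decided here) would be to allocate it and clear it on the device, for example with `hipMemset`, instead of copying a host array of zeros.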
Similar to the `ENABLE_CUDA` define, there is an `ENABLE_HIP` define if HIP is enabled.