Open Quuxplusone opened 4 years ago
Attached slow-present-table.c
(4237 bytes, text/x-csrc): The benchmark that reveals slow present table performance
Was this fixed/impacted by https://reviews.llvm.org/D82264 at all?
(In reply to Johannes Doerfert from comment #1)
> Was this fixed/impacted by https://reviews.llvm.org/D82264 at all?
No. I just tested LLVM/Clang from 7 days ago: clang version 11.0.0
(https://github.com/llvm/llvm-project.git
469da663f2df150629786df3f82c217062924f5e).
When the present table is large, "#pragma omp target update from" performs
poorly. In the test program attached to this bug report, the configuration
that shows poor effective bandwidth is "./slow-present-table 100 10000 100 1".
All of the time is spent in std::_Rb_tree_increment. The call path reported by
my profiler is "__tgt_target_data_update" ->
"target_data_update(DeviceTy&, int, void**, void**, long*, long*)" ->
"std::_Rb_tree_increment(std::_Rb_tree_node_base*)".
Thanks,
Chris
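
For readers unfamiliar with the symbol: in libstdc++, std::_Rb_tree_increment is the routine behind operator++ on std::set/std::map iterators, so a profile dominated by it is consistent with an entry-by-entry walk over an ordered container. The sketch below is not the libomptarget code. It is a minimal illustration, assuming a hypothetical table of disjoint host address ranges, of how a linear walk compares with an ordered lookup on the same container. Entry, ByBegin, Table, find_linear, find_ordered and make_table are all names invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <set>

// Hypothetical stand-in for one present-table entry: a mapped host range.
struct Entry {
    std::uintptr_t begin;  // first host address of the range
    std::uintptr_t end;    // one past the last host address
};

// Order entries by their start address.
struct ByBegin {
    bool operator()(const Entry& a, const Entry& b) const {
        return a.begin < b.begin;
    }
};

using Table = std::set<Entry, ByBegin>;

// Linear walk: visits entries one by one until a range contains addr.
// With libstdc++, each ++it is one call to std::_Rb_tree_increment.
const Entry* find_linear(const Table& t, std::uintptr_t addr, std::size_t& steps) {
    steps = 0;
    for (auto it = t.begin(); it != t.end(); ++it) {
        ++steps;
        if (it->begin <= addr && addr < it->end) return &*it;
    }
    return nullptr;
}

// Ordered lookup: O(log n). Assumes ranges do not overlap.
// upper_bound finds the first entry starting after addr; the entry just
// before it is the only one that can contain addr.
const Entry* find_ordered(const Table& t, std::uintptr_t addr) {
    auto it = t.upper_bound(Entry{addr, addr});
    if (it == t.begin()) return nullptr;
    --it;
    return (addr < it->end) ? &*it : nullptr;
}

// Build n disjoint 64-byte ranges spaced 128 bytes apart (made-up layout).
Table make_table(std::size_t n) {
    Table t;
    for (std::size_t i = 0; i < n; ++i) {
        std::uintptr_t b = 0x1000 + i * 128;
        t.insert(Entry{b, b + 64});
    }
    return t;
}
```

With a table built by make_table(10000), looking up an address in the last range takes 10000 iterator increments in find_linear, while find_ordered reaches the same entry in a logarithmic number of comparisons. Whether the runtime can use an ordered lookup depends on invariants of the real present table that this sketch only assumes.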