Closed TApplencourt closed 8 months ago
Ready for your review, @Kerilk. Less code than before, but still large. Sorry.
It's more or less a 1-to-1 mapping of the old .sh, but in Ruby.
I'm using Ruby's nice Logging capability. Feedback on the usage of Open3 and so on is appreciated.
We may need to add some teardown logic to handle cases where the app passed as an argument crashes.
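To make the teardown idea concrete, here is a minimal sketch of running the user's app through Open3 with a logger and a cleanup step that always runs. It is not the code from this PR: `run_with_teardown` and the block-based cleanup are hypothetical names chosen for illustration, and only standard-library calls (`Logger`, `Open3.capture3`) are used.

```ruby
require 'logger'
require 'open3'

LOGGER = Logger.new($stderr)
LOGGER.level = Logger::INFO

# Hypothetical helper: run a command, log its outcome, and always run the
# teardown block, even when the command fails or cannot be spawned.
def run_with_teardown(*cmd)
  LOGGER.info { "Running: #{cmd.join(' ')}" }
  stdout, stderr, status = Open3.capture3(*cmd)
  LOGGER.warn { "Command failed (#{status.exitstatus}): #{stderr.strip}" } unless status.success?
  [stdout, status]
ensure
  # Runs on normal return and when an exception (e.g. Errno::ENOENT) propagates.
  yield if block_given?
end
```

A call such as `run_with_teardown('./app', 'arg') { stop_tracing_session }` would then stop the session whether or not the app crashes; `stop_tracing_session` is a placeholder for whatever cleanup the launcher needs.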
Aren't you missing OpenMP, CUDA, and HIP support?
h = Hash.new { |h, k| h[k] = [] }
[%w[opencl cl libOpenCL libTracerOpenCL],
%w[ze ze libze_loader libTracerZE],
%w[cuda cuda libcuda libTracerCUDA],
%w[hip hip libamdhip64 libTracerHIP]].each do |name, bt_name, lib, libtracer|
Should be good! I tested ze and cl (and OMP is handled below as a special case).
I removed all the *prof because we can now pass --backend to the new iprof to restrict which backend to trace.
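For readers unfamiliar with this kind of option, here is a rough sketch of how a `--backend` flag could be parsed with Ruby's standard `OptionParser`. The comma-separated syntax, the `parse_backends` helper, and the list of accepted names (taken from the table quoted earlier in this thread) are assumptions for illustration, not the actual iprof interface.

```ruby
require 'optparse'

# Assumed backend names, copied from the table quoted earlier in the thread.
KNOWN_BACKENDS = %w[opencl ze cuda hip].freeze

# Hypothetical helper: returns [selected_backends, remaining_argv].
# Every known backend is selected when --backend is not given.
def parse_backends(argv)
  selected = KNOWN_BACKENDS.dup
  parser = OptionParser.new do |opts|
    opts.on('--backend x,y', Array, 'Restrict tracing to these backends') do |list|
      unknown = list - KNOWN_BACKENDS
      raise OptionParser::InvalidArgument, unknown.join(',') unless unknown.empty?

      selected = list
    end
  end
  # `order` stops at the first non-option word, so the traced application's
  # own flags are left untouched.
  rest = parser.order(argv)
  [selected, rest]
end
```

Using `order` rather than `parse` is a deliberate choice in this sketch: everything after the application name belongs to the application, not to the launcher.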
Oh, for the enable_events_*, yeah, I'm stupid indeed! Thanks!
Yeah, I could have been more clear, sorry about that.
Added support in d941. Also implemented a little optimization to enable events only for the backends where we found the libs.
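The optimization can be pictured with the sketch below, which keeps only the backends whose library is present. The table rows are the ones quoted in the review above; the directory scan, the `.so*` pattern, and the `available_backends` name are assumptions, since the thread does not show how the PR actually locates the libraries.

```ruby
# Rows quoted in the review above: name, bt_name, lib, libtracer.
BACKENDS = [%w[opencl cl libOpenCL libTracerOpenCL],
            %w[ze ze libze_loader libTracerZE],
            %w[cuda cuda libcuda libTracerCUDA],
            %w[hip hip libamdhip64 libTracerHIP]].freeze

# Hypothetical helper: names of the backends whose library file exists
# in at least one of the given directories.
def available_backends(search_dirs)
  found = BACKENDS.select do |_name, _bt_name, lib, _libtracer|
    search_dirs.any? { |dir| !Dir.glob(File.join(dir, "#{lib}.so*")).empty? }
  end
  found.map(&:first)
end
```

The launcher would then enable events only for the returned names instead of for every row in the table.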
Those failing have nothing to do with the PR. Will investigate. It looks like some issue in our testing framework.
The bug was likely triggered due to an update on one of the dependencies but existed since forever. Of course, I cannot reproduce on my machine...
First step would be archiving the logs after the run, but we already do that for standard runs. For distcheck, dist, and check, we would need to find the right folder.
This is the error for reference:
+ BINDING_DIR=. DUST_MODELS_DIR=/home/runner/work/THAPI/THAPI/build/cuda/:/home/runner/work/THAPI/THAPI/build/../xprof/ BABELTRACE_PLUGIN_PATH=./.libs/ DUST_TRACE_DIR=/home/runner/work/THAPI/THAPI/build/../cuda/tests:/home/runner/work/THAPI/THAPI/build/cuda/tests ruby ../../utils/bt2.rb -f ./tests/interval_profiling_normal.dust
/var/lib/gems/3.0.0/gems/babeltrace2-0.1.4/lib/babeltrace2/trace-ir/field.rb:18: [BUG] Segmentation fault at 0x0000000000000051
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux-gnu]
-- Control frame information -----------------------------------------------
c:0014 p:---- s:0067 e:000066 CFUNC :bt_field_get_class_type
c:0013 p:0022 s:0062 e:000060 METHOD /var/lib/gems/3.0.0/gems/babeltrace2-0.1.4/lib/babeltrace2/trace-ir/field.rb:18
c:0012 p:0037 s:0055 e:000054 METHOD /var/lib/gems/3.0.0/gems/babeltrace2-0.1.4/lib/babeltrace2/trace-ir/event.rb:72
c:0011 p:0055 s:0050 e:000049 BLOCK /home/runner/work/THAPI/THAPI/utils/bt_plugins/comparator.rb:32 [FINISH]
c:0010 p:---- s:0044 e:000043 CFUNC :each
c:0009 p:0012 s:0040 e:000039 BLOCK /home/runner/work/THAPI/THAPI/utils/bt_plugins/comparator.rb:29 [FINISH]
c:0008 p:---- s:0035 e:000034 CFUNC :each
c:0007 p:0008 s:0031 e:000030 METHOD /home/runner/work/THAPI/THAPI/utils/bt_plugins/comparator.rb:28 [FINISH]
c:0006 p:---- s:0026 e:000025 IFUNC
c:0005 p:0015 s:0023 e:000021 BLOCK /var/lib/gems/3.0.0/gems/babeltrace2-0.1.4/lib/babeltrace2/graph/component-class-dev.rb:55 [FINISH]
c:0004 p:---- s:0017 e:000016 CFUNC :bt_graph_put_ref
c:0003 p:---- s:0014 e:000013 CFUNC :call
c:0002 p:0019 s:0009 e:000008 METHOD /var/lib/gems/3.0.0/gems/babeltrace2-0.1.4/lib/babeltrace2/types.rb:587 [FINISH]
c:0001 p:0000 s:0003 E:000680 (none) [FINISH]
-- Ruby level backtrace information ----------------------------------------
/var/lib/gems/3.0.0/gems/babeltrace2-0.1.4/lib/babeltrace2/types.rb:587:in `call'
/var/lib/gems/3.0.0/gems/babeltrace2-0.1.4/lib/babeltrace2/types.rb:587:in `call'
/var/lib/gems/3.0.0/gems/babeltrace2-0.1.4/lib/babeltrace2/types.rb:587:in `bt_graph_put_ref'
/var/lib/gems/3.0.0/gems/babeltrace2-0.1.4/lib/babeltrace2/graph/component-class-dev.rb:55:in `block in _wrap_component_class_finalize_method'
/home/runner/work/THAPI/THAPI/utils/bt_plugins/comparator.rb:28:in `finalize_method'
/home/runner/work/THAPI/THAPI/utils/bt_plugins/comparator.rb:28:in `each'
/home/runner/work/THAPI/THAPI/utils/bt_plugins/comparator.rb:29:in `block in finalize_method'
/home/runner/work/THAPI/THAPI/utils/bt_plugins/comparator.rb:29:in `each'
/home/runner/work/THAPI/THAPI/utils/bt_plugins/comparator.rb:32:in `block (2 levels) in finalize_method'
/var/lib/gems/3.0.0/gems/babeltrace2-0.1.4/lib/babeltrace2/trace-ir/event.rb:72:in `get_payload_field'
/var/lib/gems/3.0.0/gems/babeltrace2-0.1.4/lib/babeltrace2/trace-ir/field.rb:18:in `from_handle'
/var/lib/gems/3.0.0/gems/babeltrace2-0.1.4/lib/babeltrace2/trace-ir/field.rb:18:in `bt_field_get_class_type'
so most probably a lifetime issue somewhere in our dust plugin...
Yep, I did that last time and sent it to you on Slack. At least now we know it's the same error :)
Nice the tests are now passing! :D
Ah fuck, why is the PR so big now >< I screwed up the rebase with master... Will fix.
Thanks, Bryce is adding new MPI launcher support then we can merge!
Is it really a good idea to make this one bigger than it already is? Or is this one broken as is or not replacing the original application in any way?
It's just 3 new ENV variables to grab to allow Bryce to use MPI + CUDA on his box. But yeah, we will stop here. Will add the other fancy MPI launchers later.
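As an illustration of what grabbing such variables can look like, here is a small sketch that reads the MPI rank from the environment. The three names below are rank variables set by common launchers (MPICH/Hydra, Open MPI, PMIx), but they are only guesses at what the PR reads: the thread does not name the variables, and `mpi_rank` is a hypothetical helper.

```ruby
# Assumed candidates, checked in this order; the variables the PR
# actually reads are not named in this thread.
RANK_ENV_VARS = %w[PMI_RANK OMPI_COMM_WORLD_RANK PMIX_RANK].freeze

# Hypothetical helper: the MPI rank as an Integer, or nil when no known
# launcher variable is set. `env` is injectable so it can be tested.
def mpi_rank(env = ENV)
  name = RANK_ENV_VARS.find { |candidate| env.key?(candidate) }
  name && Integer(env[name], 10)
end
```

Supporting another launcher then only means appending its variable to the list.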
Fixed a bug with traced-ranks, and verified that CUDA works. We can merge now.
Feedback is welcome now, before the PR starts getting too big...