Closed cwschilly closed 5 months ago
Hi @cwschilly, thanks for the report. I've seen this happen if an MPI call internally invokes other MPI API functions, which they're not really supposed to do. Which MPI library version are you using?
I can work around it but it'll require a fix in Caliper.
Thanks for the response @daboehme, I'm using OpenMPI (v4.1.5).
Hi @daboehme , I also get this error when I use runtime-report; any idea what might cause this?
cali-query: TreeFormatter: Attribute "min#sum#sum#time.duration" not found.
cali-query: TreeFormatter: Attribute "max#sum#sum#time.duration" not found.
cali-query: TreeFormatter: Attribute "avg#sum#sum#time.duration" not found.
cali-query: TreeFormatter: Attribute "percent_total#sum#sum#time.duration" not found.
Hm that's interesting. Is this with the mpi.message.count
option? If so can you try it without? Otherwise what Caliper version are you using?
@daboehme No, it's with no options selected: export CALI_CONFIG=runtime-report
.
I'm using Caliper v2.9.1. I tried with 2.10 but all the same warnings are present.
Thanks @daboehme, that resolved my MPI issues.
Thanks @daboehme, that resolved my MPI issues.
Great, thanks for letting me know! Did it fix the TreeFormatter errors as well? These are a bit strange. I think this could happen if either there is no data at all or some weird Caliper version mismatch.
@daboehme The TreeFormatter errors are still there.
@daboehme The TreeFormatter errors are still there.
Hm, okay. Do you get any output at all? Can you run with CALI_LOG_VERBOSITY=1
set and see what that reports? It should be saying something like == CALIPER: runtime-report: Aggregate: flushed <N> snapshots
. If that's 0 it then it somehow didn't process through any regions annotations at runtime.
Do you get any output at all?
No output other than the TreeFormatter error.
Can you run with
CALI_LOG_VERBOSITY=1
set and see what that reports? It should be saying something like== CALIPER: runtime-report: Aggregate: flushed <N> snapshots
. If that's 0 it then it somehow didn't process through any regions annotations at runtime.
You're right, it shows == CALIPER: (0): runtime-report: Aggregate: flushed 0 snapshots.
Here is all Caliper-related output:
== CALIPER: (0): Initialized
== CALIPER: (0): Creating channel default
== CALIPER: (0): default: No services enabled, default channel will not record data.
== CALIPER: (0): Creating channel builtin.configmgr
== CALIPER: (0): builtin.configmgr: Registered MPI service
== CALIPER: (0): builtin.configmgr: Registered mpiflush service
== CALIPER: (0): Creating channel runtime-report
== CALIPER: (0): runtime-report: Registered aggregation service
== CALIPER: (0): runtime-report: Registered event trigger service
== CALIPER: (0): runtime-report: Registered MPI service
== CALIPER: (0): runtime-report: Registered mpireport service
== CALIPER: (0): runtime-report: Registered timer service
== CALIPER: (0): Registered builtin ConfigManager
---- runs the application ----
== CALIPER: (0): builtin.configmgr: Flushing Caliper data
== CALIPER: (0): runtime-report: Flushing Caliper data
== CALIPER: (0): runtime-report: Aggregate: flushed 0 snapshots.
cali-query: TreeFormatter: Attribute "min#sum#sum#time.duration" not found.
cali-query: TreeFormatter: Attribute "max#sum#sum#time.duration" not found.
cali-query: TreeFormatter: Attribute "avg#sum#sum#time.duration" not found.
cali-query: TreeFormatter: Attribute "percent_total#sum#sum#time.duration" not found.
Path
== CALIPER: (0): Finalizing ...
== CALIPER: (0): default: Flushing Caliper data
== CALIPER: (0): Releasing channel default
== CALIPER: (0): builtin.configmgr: Flushing Caliper data
== CALIPER: (0): Releasing channel builtin.configmgr
== CALIPER: (0): Releasing channel runtime-report
== CALIPER: (0): Finished
This is with export CALI_CONFIG=runtime-report
.
Hm, that's interesting. Looks like it's simply not capturing anything at runtime. Are there any Caliper annotations in the code that you'd expect to see? If there are MPI calls you can try runtime-report,profile.mpi
again so at least they should show up.
Ah, probably a misunderstanding on my part as to what runtime-report
captures by default. I haven't added any specific notations, so it makes sense that there's nothing to capture. profile.mpi
and profile.kokkos
both work perfectly. Thanks for your help!
Hello,
I am trying to get a basic count of the MPI calls in an app that uses Kokkos.
Since the app uses Kokkos, I set the
KOKKOS_TOOLS_LIBS
env variable to point tolibcaliper.so
and did not add any additional Caliper annotations to the source code.To get the MPI counts, I set the
CALI_CONFIG
env variable tompi-report
. I have also tried settingCALI_CONFIG="runtime-report(mpi.message.count)"
.When I run the app with either of these configs, I get these warnings (one per process):
Caliper still outputs a report at the end of the run, but the MPI counts are not complete (I know that there should be more counts than are present). I suspect that this
Region stack mismatch
causes Caliper to not be able to profile in some regions.In what contexts does this warning show up? And how can I resolve it?