Custom loop profile - Githubissues

Hi,

I am trying to benchmark adaptive finite element simulations using Caliper, and I am stuck finding the correct Caliper configuration. I have been cycling between the documentation and permuting combinations of environment variables for the last 3 days without any progress, so I am asking here for help.

Basically, what I want is:

the output of loop-report
separate output per mpi rank
inclusive, aggregated region for some selected regions
information about waiting processes
additional metadata per loop (e.g. number of elements on current rank)

On a very high level my program looks like this

...
CALI_CXX_MARK_LOOP_BEGIN(loop_ann_outer, "Time Loop");
for (auto t = 0.0; t < t_final; t += Δt) {
    timestep_index++;
    CALI_CXX_MARK_LOOP_ITERATION(loop_ann_outer, timestep_index);

    CALI_MARK_BEGIN("AMR");
    ...
       CALI_MARK_BEGIN("Refinement");
       ...
       CALI_MARK_END("Refinement");
       CALI_MARK_BEGIN("Derefinement");
       ...
       CALI_MARK_END("Derefinement");
    ...
    CALI_MARK_END("AMR");
    {
      cali::Annotation::Guard g( cali::Annotation("num_elements").set(num_elements_local) );
    }

    CALI_MARK_BEGIN("Prepare Update");
    ...
    CALI_MARK_END("Prepare Update");

    CALI_CXX_MARK_LOOP_BEGIN(loop_ann_inner, "Update Loop");
    for (...) {
        CALI_CXX_MARK_LOOP_ITERATION(loop_ann_inner, ...);
        ...

        CALI_MARK_BEGIN("Halo Exchange");
        ...
        CALI_MARK_END("Halo Exchange");
    }
    CALI_CXX_MARK_LOOP_END(loop_ann_inner);

    {
      cali::Annotation::Guard g( cali::Annotation("num_inner_steps").set(num_inner_steps_local) );
    }
}
CALI_CXX_MARK_LOOP_END(loop_ann_outer);
...

To be specific, I want to generate a time series with the time spent in MPI_Waitall, the selected regions, the total time, and the 2 annotations per iteration of "Time Loop", to investigate how load imbalance evolves for different load balancing strategies and numbers of processes. So my question is: how can this be achieved with Caliper? I am also happy with an external example that I can start from, or a pointer to the relevant docs page in case I missed something there.

Also related to this, is it possible that the docs are out of date? I could not really figure out where the code for the example here http://software.llnl.gov/Caliper/services.html#example can be found.

What I tried so far

My first try was to just write the raw data and use cali-query to bring it into the correct shape. With this I almost succeeded, but I hit hard drive limitations very quickly (since I could not figure out how to filter the event traces correctly), and I could not get the Caliper query exactly right. Here is what I tried to generate the data:

CALI_SERVICES_ENABLE=mpi:event:trace:report
for NP in 1 2 4 8 16 32 64
do
    CALI_CONFIG=event-trace\(trace.mpi,output="$NP/performance-report-%mpi.rank%.cali"\) mpirun -np $NP executable ...
done

and for the query

"SELECT *,inclusive_sum(time.duration.ns) FORMAT json(human) GROUP BY \"iteration#Time Loop\",region,mpi.rank ORDER BY \"iteration#Time Loop\""

My second attempt was to generate the required data in-situ. Here I first tried to use the aggregation service:

export CALI_LOG_VERBOSITY=2
export CALI_SERVICES_ENABLE=event,trace,timestamp,recorder,aggregate,report,mpi,debug
export CALI_AGGREGATE_ATTRIBUTES="???"
export CALI_AGGREGATE_KEY=???
for NP in 1 2 4 8 16 32 64
do
    export CALI_REPORT_FILENAME="$NP/performance-report-%mpi.rank%.cali"
    mpirun -np $NP executable ...
done

Here, no matter what I put into CALI_AGGREGATE_ATTRIBUTES and CALI_AGGREGATE_KEY, I could not get anything meaningful. I also do not understand what I am doing wrong and could not deduce it from the docs, because the output is faulty in every case (the number of output columns changes with each iteration and the data starts to interleave). I have just updated to master and can reproduce this.

My latest idea was to make a custom loop-reporter, because it is closest to what I want. However, I was really not sure where to even start after copy-pasting LoopReportController. I also could not find out how to extend the output of the loop controller from the command line, or even just how to redirect the output to a specific file.

export CALI_LOG_VERBOSITY=10
export CALI_SERVICES_ENABLE=mpi,debug
for NP in 1 2 4 8 16 32 64
do
    export CALI_REPORT_FILENAME="$NP/performance-report-%mpi.rank%.cali"
    CALI_CONFIG=loop-report,iteration_interval=1,timeseries.maxrows=0 mpirun -np $NP ...
done

Thanks in advance, Dennis

Hi @termi-official , apologies for the delay.

I think what you want should be possible with Caliper, but it may require some custom configuration and queries.

Let's start with the custom num_elements and num_inner_steps annotations. The way you have it, Caliper will create a single record with that information once in each iteration, but will not associate it with any of the other regions in the loop. The annotations should instead begin right at the top of the loop, like so:

CALI_CXX_MARK_LOOP_BEGIN(loop_ann_outer, "Time Loop");
for (auto t = 0.0; t < t_final; t += Δt) {
    timestep_index++;
    cali::Annotation::Guard 
       g( cali::Annotation("num_elements", CALI_ATTR_SKIP_EVENTS).begin(num_elements_local) );

    CALI_CXX_MARK_LOOP_ITERATION(loop_ann_outer, timestep_index);
    //...

    cali::Annotation steps_ann("num_inner_steps", CALI_ATTR_SKIP_EVENTS);
    steps_ann.begin(num_inner_steps_local);
    CALI_CXX_MARK_LOOP_BEGIN(loop_ann_inner, "Update Loop");
    for (...) {
        CALI_CXX_MARK_LOOP_ITERATION(loop_ann_inner, ...);
        ...
    }
    CALI_CXX_MARK_LOOP_END(loop_ann_inner);
    steps_ann.end();
}

If you know these values ahead of time, you can also put the annotations outside of the loop entirely. The CALI_ATTR_SKIP_EVENTS flag is useful if these annotations just provide additional information and you don't actually want to measure the time for their begin/end regions.

I think the best strategy here is to collect a full profile into a .cali file and run queries on it. Once we have the queries figured out we can create a custom config to run the query online and produce text or json directly.

The config to collect a full profile should look something like this:

CALI_SERVICES_ENABLE=event,mpi,aggregate,timer,recorder
CALI_EVENT_ENABLE_SNAPSHOT_INFO=false
CALI_AGGREGATE_KEY="*,iteration#Time\ Loop"
CALI_MPI_WHITELIST=MPI_Waitall
CALI_RECORDER_FILENAME="report-%mpi.rank%.cali"

The CALI_EVENT_ENABLE_SNAPSHOT_INFO=false setting will disable explicit region begin/end attributes, which are likely just getting in the way of what you want. The CALI_AGGREGATE_KEY field is probably the most obscure one. It's essentially a "group by". The * includes everything except by-value entries. The iteration attributes are by-value entries, so we'll have to add them explicitly. The example above will "group by" everything including the outer loop iteration, so you'll get a time series for the outer loop. Everything in the update loop will get aggregated. If you also need to distinguish the update loop iterations, include that loop's iteration attribute in the aggregate key. At that point you might as well record a trace though, unless there are more nested loops with MPI functions or Caliper regions. Don't forget to set either CALI_MPI_WHITELIST or CALI_MPI_BLACKLIST if you want to time MPI functions.
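
As a rough mental model of the "group by" behaviour described above, here is a standalone C++ sketch that sums durations over all records sharing the same key attributes. It does not use Caliper at all: `Record` and `aggregate` are hypothetical names made up for illustration, and the real aggregation service is implemented differently.

```cpp
#include <map>
#include <string>
#include <vector>

// One measurement record: attribute name -> value, plus a measured duration.
// Illustrative model only, NOT Caliper's internal snapshot type.
struct Record {
    std::map<std::string, std::string> attrs;
    double duration;
};

// Sum durations over all records that agree on every attribute listed in
// `key`. Attributes not listed in `key` are ignored, so records that differ
// only in those attributes are merged into a single output row.
std::map<std::vector<std::string>, double>
aggregate(const std::vector<Record>& records,
          const std::vector<std::string>& key)
{
    std::map<std::vector<std::string>, double> result;
    for (const Record& r : records) {
        std::vector<std::string> k;
        for (const std::string& name : key) {
            auto it = r.attrs.find(name);
            // A missing attribute is represented by an empty string.
            k.push_back(it == r.attrs.end() ? std::string() : it->second);
        }
        result[k] += r.duration;
    }
    return result;
}
```

In this model, calling `aggregate` with the key `{"region", "iteration#Time Loop"}` merges all "Halo Exchange" records of one outer iteration into a single row, whereas adding `"iteration#Update Loop"` to the key keeps one row per inner iteration, which approaches the size of a full trace.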

This should produce a .cali file, and you can run cali-query --table or cali-query --tree to see what's in it. It should contain all the information we need, i.e. the regions, MPI functions, your custom annotations, and the loop iterations. From there we can narrow things down with queries. The CalQL documentation https://software.llnl.gov/Caliper/calql.html might be useful for writing those. Also, you can see all the attribute keys in the file with cali-query --list-attributes -t. Maybe you can play around with that. If you have an example of exactly what kind of output you want to see, I'm happy to help design those queries. The queries that generate the loop report, for example, certainly have some quirky stuff.

Thanks for the detailed response, David. This clears up some of my questions. I could also track down that the file size literally exploded because I had not set CALI_MPI_WHITELIST/CALI_MPI_BLACKLIST. The pointer to CALI_EVENT_ENABLE_SNAPSHOT_INFO is another thing I had somehow missed.

For the number of elements, I do not know the number ahead of time as it is dynamically determined through an error estimation procedure.

I have a first workflow where I first use cali-query to generate a table which I then filter with some scripts. I will definitely report back with some examples and will try myself with the new information here first.
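
A filtering step of this kind could look roughly like the following standalone C++ sketch, which pivots long-format rows (one row per iteration and region) into one row per iteration with a column for each selected region. The `Row` struct and the `pivot` function are hypothetical names for illustration; they assume the cali-query table has already been parsed and do not reflect any particular cali-query output schema.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One row of a long-format table, e.g. parsed from a per-rank report.
// Field names are illustrative assumptions, not a cali-query schema.
struct Row {
    int iteration;
    std::string region;
    double seconds;
};

// Pivot long-format rows into one entry per iteration, keeping only the
// regions listed in `columns` (in that order). An (iteration, region) pair
// that never occurs stays at 0; repeated pairs are summed.
std::map<int, std::vector<double>>
pivot(const std::vector<Row>& rows, const std::vector<std::string>& columns)
{
    std::map<int, std::vector<double>> table;
    for (const Row& row : rows) {
        // Make sure every iteration gets a row, even if none of the
        // selected regions occurs in it.
        std::vector<double>& out =
            table.emplace(row.iteration,
                          std::vector<double>(columns.size(), 0.0))
                .first->second;
        for (std::size_t c = 0; c < columns.size(); ++c) {
            if (columns[c] == row.region)
                out[c] += row.seconds;
        }
    }
    return table;
}
```

Running this once per rank file, with columns such as "AMR" and "MPI_Waitall", would give one time series per rank that can then be compared across ranks.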

Btw, is it intended that the loop-report does not "see" the mpi.rank variable? It gets replaced with an empty string for me on the current master.

Yes, the loop-report config unfortunately doesn't recognize the %mpi.rank% variable. In fact, it currently doesn't have a flag to split output per rank at all. It should be possible to write a query to produce similar output though.

LLNL / Caliper

Custom loop profile #521
