derkling opened this issue 7 years ago
@sinkap also https://github.com/ARM-software/lisa/pull/418 can help here. Since I am told it is quite fast, I am not strongly opposed to a temporary solution based on that PR while the more advanced grammar-based approach is developed (and benchmarked against https://github.com/ARM-software/lisa/pull/418).
I implemented this and gave @derkling's notebook a try: https://gist.github.com/joelagnel/cc08ba964e40467e828741c691011ffc
It works great but is a bit hackish. I'll post the PR shortly.
In many analyses we are interested in joining information coming from different DataFrames (DFs). For example, let's say we have a trace like this:
Currently we can easily build two DFs, one for sched_load_avg_cpu and another for sched_contrib_scale_f.
However, in some analyses it could be useful to correlate the information from these two events, thus getting a single DF which gives a consistent view of the most up-to-date information from both.
In these cases we have a "master_df", e.g. sched_load_avg_cpu, into which we want to propagate the information from a "secondary_df", e.g. sched_contrib_scale_f.
This would require us to:
Join the master_df with the secondary_df
Fix any index collision that may occur. For example, in the small trace above we can see that at the exact time 2943.184105 there is one event for both master_df and secondary_df on each CPU.
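The join and the collision fix can be sketched in pandas as shown below. This is a minimal sketch, not an existing LISA API. The function `join_with_unique_index`, the `_source` tag column and the `delta` parameter are hypothetical names. The sketch assumes both DFs are indexed by timestamps in float seconds. It also assumes that, on an exact timestamp collision, the secondary_df row should come first so that the master_df row at the same time can see it. The colliding rows are then shifted by a tiny `delta` to make the index unique.

```python
import pandas as pd


def join_with_unique_index(master_df, secondary_df, delta=1e-9):
    """Hypothetical helper: join two timestamp-indexed event DFs.

    Assumptions: the index holds float seconds, and on a timestamp tie
    the secondary row is ordered before the master row.
    """
    # Tag every row with the DF it came from; secondary rows go first
    joined = pd.concat(
        [secondary_df.assign(_source="secondary"),
         master_df.assign(_source="master")],
        sort=False,
    )
    # Stable sort: rows sharing a timestamp keep the concat order above
    joined = joined.sort_index(kind="mergesort")
    # Number rows sharing the same timestamp as 0, 1, 2, ...
    rank = joined.groupby(level=0).cumcount().to_numpy()
    # Shift the k-th colliding row by k * delta to make the index unique
    joined.index = joined.index + rank * delta
    return joined
```

Shifting by `delta` keeps the chosen event order explicit in the index, but it slightly alters the colliding timestamps. For that reason, `delta` should be much smaller than the trace's timestamp resolution.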
A join of these two DFs should ensure that:
Then we need to:
Forward-propagate each secondary_df column by considering the value of a "pivot" column shared between the two DFs. For example, the value of cpu can be used to forward-propagate the other sched_contrib_scale_f columns (i.e. freq_scale_factor and cpu_scale_factor) into the sched_load_avg_cpu rows
Remove all the secondary_df rows whose values have already been propagated into the following primary_df rows
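The forward propagation and the removal of the secondary rows can be sketched with a grouped `ffill` in pandas. This is only an illustration: `propagate_by_pivot` and the `_source` column are hypothetical names. The input is assumed to be already joined, time-ordered, uniquely indexed, and tagged with `_source` values of "master" or "secondary".

```python
import pandas as pd


def propagate_by_pivot(joined, secondary_cols, pivot="cpu",
                       source_col="_source"):
    """Hypothetical helper: fold secondary values into master rows.

    Assumes `joined` is time-ordered, uniquely indexed, and tagged in
    `source_col` with "master" or "secondary".
    """
    out = joined.copy()
    # Within each pivot group (e.g. each CPU), carry the last seen
    # value of every secondary column forward onto the following rows
    out[secondary_cols] = out.groupby(pivot)[secondary_cols].ffill()
    # Secondary values now live in the master rows: drop secondary rows
    out = out[out[source_col] == "master"]
    return out.drop(columns=source_col)
```

The grouping by `pivot` matters here. Without it, a CPU1 row would inherit the scale factors last reported for CPU0.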
All these operations should be supported together by a new generic convenience API which, when called with something like:
where primary_df is:
and secondary_df is:
should return a single DF which is:
Here is a notebook to play with the same example: https://gist.github.com/derkling/786e911ae01ca170377e1893d6696384 where we can see that the current join API needs to be extended to get exactly the result described above.
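For comparison, pandas already provides `pandas.merge_asof`. For each primary_df row, it picks the most recent secondary_df row with the same pivot value. The wrapper below sketches the convenience API on top of that swapped-in technique; it is not the LISA join API. The name `join_and_propagate` and the temporary `_time` column are hypothetical. The sketch assumes both DFs are indexed by timestamp, and it treats an exact timestamp tie as the secondary event having already happened (`allow_exact_matches=True`).

```python
import pandas as pd


def join_and_propagate(primary_df, secondary_df, pivot="cpu"):
    """Hypothetical convenience API built on pandas.merge_asof.

    Returns only the primary_df rows, each extended with the latest
    secondary_df values seen at or before it for the same pivot value.
    """
    # Assumption: secondary columns also present in primary_df are not
    # propagated, to avoid suffixed duplicates
    shared = [c for c in secondary_df.columns
              if c != pivot and c in primary_df.columns]
    # merge_asof works on a column, so move the timestamps out of the index
    left = primary_df.rename_axis("_time").reset_index()
    right = secondary_df.drop(columns=shared).rename_axis("_time").reset_index()
    # merge_asof requires both sides to be sorted on the "on" key
    left = left.sort_values("_time", kind="mergesort")
    right = right.sort_values("_time", kind="mergesort")
    merged = pd.merge_asof(
        left, right, on="_time", by=pivot,
        direction="backward",      # latest secondary event at or before
        allow_exact_matches=True,  # a tie counts as already happened
    )
    result = merged.set_index("_time")
    result.index.name = primary_df.index.name
    return result
```

One design difference matters here. `merge_asof` takes the latest secondary row as a whole, whereas a per-column `ffill` fills each column independently. The two approaches therefore only agree when every secondary row carries values for all of its columns.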