Closed kostrzewa closed 2 years ago
I'm working on resolving the conflicts and adjusting some of the output, I hope that's okay.
@Marcogarofalo after some thought and some exploration of the data.tree
package I've made a change which may or may not be useful. It significantly worsens the "human readability" of the output but it allows the output to be fed into a tree-based data structure without any additional logic. In the analysis one can do:
raw_data <- system(paste("grep \"Time for\"",
args[1],
"| grep level",
"| awk '{print $2 \" \" $5 \" \" $6 \" \" $9 \" \" $12}'"),
intern = TRUE)
data <- read.table(text = raw_data, stringsAsFactors = FALSE)
colnames(data) <- c("prefix", "name", "time", "level", "pathString")
data <- dplyr::mutate(data, pathString = sprintf("HMC%s", pathString)) %>%
dplyr::group_by(pathString) %>%
dplyr::summarise(time = sum(time),
prefix = unique(prefix),
name = unique(name),
level = unique(level)) %>%
dplyr::ungroup()
t_tree <- data.tree::as.Node(data)
which produces the following kind of representation:
> print(t_tree, "time", "prefix")
levelName time prefix
1 HMC NA
2 ¦--cloverdetlight:cloverdet_heatbath 0.276630200 :
3 ¦ ¦--Qp 0.039110120 :
4 ¦ ¦--random_energy0 0.066529490 :
5 ¦ ¦--sw_invert 0.069037080 :
6 ¦ °--sw_term 0.101852300 :
7 ¦--cloverdetratio1light:cloverdetratio_heatbath 34.091170000 :
8 ¦ ¦--Qp_zero_pf 0.031062650 :
9 ¦ ¦--random_energy0 0.069898270 :
10 ¦ ¦--solve_degenerate 33.826040000 :
11 ¦ ¦ ¦--gamma5 0.000388851 :
12 ¦ ¦ °--invert_eo_degenerate_quda 33.825010000 TM_QUDA:
13 ¦ ¦ ¦--invertQuda 13.625890000 TM_QUDA:
14 ¦ ¦ ¦--MG_Preconditioner_Setup 20.195500000 TM_QUDA:
15 ¦ ¦ ¦--reorder_spinor_eo_fromQuda 0.001913765 TM_QUDA:
16 ¦ ¦ °--reorder_spinor_eo_toQuda 0.001588084 TM_QUDA:
17 ¦ ¦--sw_invert 0.063206370 :
18 ¦ °--sw_term 0.100872100 :
19 ¦--cloverdetratio2light:cloverdetratio_heatbath 2.513363000 :
20 ¦ ¦--Qp_zero_pf 0.036231330 :
21 ¦ ¦--random_energy0 0.067319410 :
22 ¦ ¦--solve_degenerate 2.242713000 :
23 ¦ ¦ ¦--gamma5 0.000419401 :
24 ¦ ¦ °--invert_eo_degenerate_quda 2.241631000 TM_QUDA:
25 ¦ ¦ ¦--invertQuda 2.238565000 TM_QUDA:
26 ¦ ¦ ¦--reorder_spinor_eo_fromQuda 0.002052576 TM_QUDA:
27 ¦ ¦ °--reorder_spinor_eo_toQuda 0.000918682 TM_QUDA:
28 ¦ ¦--sw_invert 0.064552320 :
29 ¦ °--sw_term 0.102456200 :
30 ¦--CLOVERNDTRLOG:clovernd_trlog_heatbath 0.127413100 :
31 ¦ ¦--sw_term 0.102629100 :
32 ¦ °--sw_trace_nd 0.024746180 :
33 ¦--CLOVERTRLOG:clover_trlog_heatbath 0.120279500 :
34 ¦ ¦--sw_term 0.103729000 :
35 ¦ °--sw_trace 0.013745540 :
36 ¦--GAUGE:gauge_heatbath 0.067569350 :
37 ¦--ndcloverrat_0_3:init_ndrat_monomial 0.000018670 :
38 ¦--ndcloverrat_0_3:ndrat_heatbath 2.065281000 :
39 ¦ ¦--random_energy0 0.135540100 :
40 ¦ ¦--solve_mms_nd_plus 1.748963000 :
41 ¦ ¦ ¦--f_assign 0.252822900 :
42 ¦ ¦ °--solve_mms_nd 1.496015000 :
43 ¦ ¦ ¦--gamma5 0.002436816 :
44 ¦ ¦ ¦--invert_eo_quda_twoflavour_mshift 1.487874000 TM_QUDA:
45 ¦ ¦ ¦ ¦--invertMultiShiftQuda 0.416139500 TM_QUDA:
46 ¦ ¦ ¦ ¦--loadCloverQuda 0.021220480 TM_QUDA:
47 ¦ ¦ ¦ ¦--loadGaugeQuda 0.017785190 TM_QUDA:
48 ¦ ¦ ¦ ¦--reorder_gauge_toQuda 0.010681870 TM_QUDA:
49 ¦ ¦ ¦ ¦--reorder_spinor_eo_fromQuda 0.015871140 TM_QUDA:
50 ¦ ¦ ¦ ¦--reorder_spinor_eo_toQuda 0.004953463 TM_QUDA:
51 ¦ ¦ ¦ ¦--twoflavour_input_overhead 0.003285359 TM_QUDA:
52 ¦ ¦ ¦ °--twoflavour_output_overhead 0.011988443 TM_QUDA:
53 ¦ ¦ °--mul_r_gamma5 0.004574722 :
54 ¦ ¦--sw_invert_nd 0.043876160 :
55 ¦ °--sw_term 0.100460900 :
56 ¦--ndcloverrat_4_6:init_ndrat_monomial 0.000009320 :
57 ¦--ndcloverrat_4_6:ndrat_heatbath 2.265935000 :
58 ¦ ¦--random_energy0 0.132979700 :
59 ¦ ¦--solve_mms_nd_plus 1.955709000 :
60 ¦ ¦ ¦--f_assign 0.204246400 :
61 ¦ ¦ °--solve_mms_nd 1.748212000 :
62 ¦ ¦ ¦--gamma5 0.002139136 :
63 ¦ ¦ ¦--invert_eo_quda_twoflavour_mshift 1.742732000 TM_QUDA:
64 ¦ ¦ ¦ ¦--invertMultiShiftQuda 1.715788000 TM_QUDA:
65 ¦ ¦ ¦ ¦--reorder_spinor_eo_fromQuda 0.012970324 TM_QUDA:
66 ¦ ¦ ¦ ¦--reorder_spinor_eo_toQuda 0.003167139 TM_QUDA:
67 ¦ ¦ ¦ ¦--twoflavour_input_overhead 0.001384784 TM_QUDA:
68 ¦ ¦ ¦ °--twoflavour_output_overhead 0.005865898 TM_QUDA:
69 ¦ ¦ °--mul_r_gamma5 0.003186799 :
70 ¦ ¦--sw_invert_nd 0.043966710 :
71 ¦ °--sw_term 0.108778100 :
which can in turn be rather easily plotted as a multi-level barplot (I'll prepare an actual example hopefully by the end of the day). The idea is to be able to plot something like:
What do you think?
I should be able to produce your plots already because I had all the information in the structure I create in my script. However, your approach should solve the problem I had in examining logs of killed jobs and can easily print the tree in a readable format.
Thanks a lot for all the help @Marcogarofalo. Based on this and the "new" timing infrastructure, I've introduced the hmc_mk2
profile (profiling/hmc_mk2
) which produces this kind of output: profile.pdf
I think we can remove profiling/hmc since it will not work with the current log.
in the log there is no information on the gauges subfunctions, it look like:
# : Time for ndrat_derivative 1.016409e+01 s level: 1 proc_id: 0 /HMC/ndcloverrat1:ndrat_derivative
# : Time for gauge_derivative 2.770116e+00 s level: 1 proc_id: 0 /HMC/GAUGE:gauge_derivative
# : Time for update_gauge 4.501816e-01 s level: 1 proc_id: 0 /HMC/update_gauge
these with
# QUDA: QUDA 1.0.0 (git v0.9.0-4675-g16b06db2d-sm_80)
and tmLQCD
Version 5.2.0, commit 20e85af0400ad5f9d7da7aa150caf096b119add9
If you had UseExternalLibrary = quda
set in the GAUGE monomial, you should see something like:
# TM_QUDA: Time for reorder_gauge_toQuda 9.961213e-03 s level: 3 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda/reorder_gauge_toQuda
# TM_QUDA: Time for computeGaugeForceQuda 1.246274e-01 s level: 3 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda/computeGaugeForceQuda
# TM_QUDA: Time for reorder_mom_fromQuda 3.761868e-02 s level: 3 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda/reorder_mom_fromQuda
# TM_QUDA: Time for add_mom_to_derivative 2.790029e-03 s level: 3 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda/add_mom_to_derivative
# TM_QUDA: Time for compute_gauge_derivative_quda 1.016333e+00 s level: 2 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda
# : Time for gauge_derivative 1.016342e+00 s level: 1 proc_id: 0 /HMC/GAUGE:gauge_derivative
# : Time for update_gauge 5.516283e-02 s level: 1 proc_id: 0 /HMC/update_gauge
# TM_QUDA: Time for reorder_gauge_toQuda 8.106367e-03 s level: 3 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda/reorder_gauge_toQuda
# TM_QUDA: Time for computeGaugeForceQuda 1.143318e-01 s level: 3 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda/computeGaugeForceQuda
# TM_QUDA: Time for reorder_mom_fromQuda 3.010765e-02 s level: 3 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda/reorder_mom_fromQuda
# TM_QUDA: Time for add_mom_to_derivative 2.823760e-03 s level: 3 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda/add_mom_to_derivative
# TM_QUDA: Time for compute_gauge_derivative_quda 1.577096e-01 s level: 2 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda
# : Time for gauge_derivative 1.577177e-01 s level: 1 proc_id: 0 /HMC/GAUGE:gauge_derivative
# TM_QUDA: Time for reorder_gauge_toQuda 7.356934e-03 s level: 3 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda/reorder_gauge_toQuda
# TM_QUDA: Time for computeGaugeForceQuda 1.130398e-01 s level: 3 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda/computeGaugeForceQuda
# TM_QUDA: Time for reorder_mom_fromQuda 1.770647e-02 s level: 3 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda/reorder_mom_fromQuda
# TM_QUDA: Time for add_mom_to_derivative 2.842720e-03 s level: 3 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda/add_mom_to_derivative
# TM_QUDA: Time for compute_gauge_derivative_quda 1.431690e-01 s level: 2 proc_id: 0 /HMC/GAUGE:gauge_derivative/compute_gauge_derivative_quda
# : Time for gauge_derivative 1.431770e-01 s level: 1 proc_id: 0 /HMC/GAUGE:gauge_derivative
and so on.
I think we can remove profiling/hmc since it will not work with the current log.
I wanted to keep the option available to perhaps make it work with the current output as it is in principle more flexible than my implementation.
If you had
UseExternalLibrary = quda
set in the GAUGE monomial, you should see something like:
Sorry I compile the version that support quda in the GAUGE monomial but I forgot to add it in the input. Should I rerun the benchmark?
I think we can remove profiling/hmc since it will not work with the current log.
I wanted to keep the option available to perhaps make it work with the current output as it is in principle more flexible than my implementation.
ok, I will do the necesssary modifications.
ok, I will do the necesssary modifications.
I was thinking more in terms of "let's keep them around if we perhaps need them". If you see an urgent need, please go ahead, but don't feel obliged to invest time into this now.
replaces #505