yousefmoazzam opened this issue 3 months ago
Viewing the nsys profile reports in Nsight reveals that the ~70s "wait" before writing some blocks to intermediate data occurs first after the initial execution of the Paganin filter method (taking ~59s there). It then occurs several times while the first block is processed in the section containing FBP (taking ~70s each time).
More specifically, writing the stripe removal method's output in that section doesn't cause the wait, but writing the outputs of the other methods does (and only on the first block iteration within that section); see the screenshot below
Regarding my comment about this possibly being related to MPI synchronisations: that suspicion came from seeing function calls mentioning mutexes in the region where the waiting/blocking occurs:
Another interesting function call, which seems to appear early on in every instance of the waiting/blocking, mentions "memcpy" and "unaligned":
The name of the region being "vdso" also seems noteworthy. At first there was a suspicion that it referred to "VDS" (virtual datasets), a feature of hdf5. However, that hdf5 feature isn't being used when writing data in httomo. Given that some of the functions in the region appear to be system calls, the latest speculation is that it instead refers to the "vDSO" shared object that the Linux kernel maps into every process: https://www.man7.org/linux/man-pages/man7/vdso.7.html
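As an aside (not part of the original investigation): on Linux, the vDSO shows up as a `[vdso]` entry in a process's `/proc/<pid>/maps`, which is one way to confirm that the region seen in the profile is the kernel-provided vDSO rather than anything hdf5-related. A minimal sketch that parses that maps format:

```python
def find_vdso_mapping(maps_text: str) -> list[str]:
    """Return the address ranges of lines tagged [vdso] in /proc/<pid>/maps text."""
    return [
        line.split()[0]
        for line in maps_text.splitlines()
        if line.endswith("[vdso]")
    ]

# Example maps excerpt (addresses are made up for illustration);
# in practice, read open(f"/proc/{pid}/maps") for a live process.
sample = (
    "7ffd1c3f8000-7ffd1c3fa000 r-xp 00000000 00:00 0  [vdso]\n"
    "7f0000000000-7f0000021000 rw-p 00000000 00:00 0\n"
)
print(find_vdso_mapping(sample))  # → ['7ffd1c3f8000-7ffd1c3fa000']
```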
## How the benchmarking runs were organised
Various runs were performed on the production SLURM cluster at DLS, varying the parameters relevant to intermediate data saving.
### Hardware
The fixed hardware configuration was as follows:
### Pipeline
The fixed pipeline was https://github.com/dkazanc/dls_pipelines/blob/main/pipelines/bench_pipeline_gpu_intense_separate_rescale.yaml.
### Data
The fixed dataset was the sandstone data, `119647.nxs` (20GB).

### Parameters investigated
The parameters that were varied were:

- whether the intermediate files were written chunked or unchunked
- whether the `--save-all` flag was used or not

### Times
The pipeline execution times are ordered from fastest to slowest, grouped by whether the `--save-all` flag was used or not. The full time taken for the pipeline to execute was taken from the logfile that httomo creates.
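For context on the chunked vs. unchunked comparison being benchmarked, a minimal h5py sketch of the two dataset layouts (the chunk shape here is an illustrative choice, not necessarily what httomo uses):

```python
import os
import tempfile

import h5py
import numpy as np

data = np.zeros((8, 16, 16), dtype=np.float32)
tmpdir = tempfile.mkdtemp()

# Contiguous (unchunked) layout: the dataset is stored as one block on disk.
with h5py.File(os.path.join(tmpdir, "unchunked.h5"), "w") as f:
    f.create_dataset("data", data=data)

# Chunked layout: the dataset is stored as fixed-size blocks; here one
# slice-sized slab per chunk (an illustrative chunk shape).
with h5py.File(os.path.join(tmpdir, "chunked.h5"), "w") as f:
    f.create_dataset("data", data=data, chunks=(1, 16, 16))

# The chosen layout is visible on the dataset after reopening.
with h5py.File(os.path.join(tmpdir, "unchunked.h5"), "r") as f:
    print(f["data"].chunks)  # None for contiguous storage
with h5py.File(os.path.join(tmpdir, "chunked.h5"), "r") as f:
    print(f["data"].chunks)  # (1, 16, 16)
```

Chunking matters here because writes that don't align with the file's storage layout can force extra buffering and copying, which is one plausible source of the blocking seen in the profiles.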
#### Without `--save-all`
#### With `--save-all`
## Concluding remarks
- Without `--save-all`: 243s (chunked+uncompressed) vs. 353s (unchunked+uncompressed) = 110s saved (243/353 * 100 = 68.8% of original time, so ~30% decrease in time taken)
- With `--save-all`: 794s (chunked+uncompressed) vs. 1148s (unchunked+uncompressed) = 354s saved (794/1148 * 100 = 69.2% of original time, so ~30% decrease in time taken)
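The arithmetic in these comparisons can be reproduced directly:

```python
# Reproduce the timing comparisons quoted above: seconds saved by the
# chunked layout, and the chunked time as a fraction of the unchunked time.
for label, chunked_s, unchunked_s in [
    ("without --save-all", 243, 353),
    ("with --save-all", 794, 1148),
]:
    saved = unchunked_s - chunked_s
    pct = chunked_s / unchunked_s * 100
    print(f"{label}: {saved}s saved, {pct:.1f}% of original time")
```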