Open — lukastruemper opened this issue 2 years ago
@cod3monk do you have some time to discuss speed improvements / approximations?
Sorry for the delay. We have a meeting with cod3monk next week and will come back to you afterwards.
Hi @lukastruemper,
sorry for the long delay and thanks for getting in touch with us.
The warmup phase should be limited by invalid_entries > 0 (invalid_entries becomes 0 once the cache is fully initialized, which indirectly takes the number of accessed elements into account). You can try decreasing warmup_increment, but I wouldn't expect much improvement. A more likely target for optimization would be the subsequently called functions:
the backend.Cache.loadstore function and Kernel.compile_global_offsets:
https://github.com/RRZE-HPC/kerncraft/blob/b5a302d20669fe6c1d1ee08ebcc68457968d9257/kerncraft/kernel.py#L535
@TomTheBear and @JanLJL will be in touch with you on how to proceed.
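To make the limiting condition concrete, here is a minimal sketch of the warmup structure described above. This is not kerncraft's actual implementation; the callback names and the chunked loop are illustrative assumptions:

```python
# Hedged sketch (not kerncraft's code): advance the cache simulation in
# chunks of warmup_increment iterations until no invalid (uninitialized)
# cache entries remain, i.e. until invalid_entries reaches 0.
def warm_up_cache(simulate_iterations, count_invalid_entries,
                  warmup_increment=100, max_iterations=10**6):
    """Run the simulator until every cache entry is initialized.

    simulate_iterations(n) advances the simulation by n kernel iterations;
    count_invalid_entries() returns the number of cache entries not yet
    touched. Both are hypothetical callbacks for illustration only.
    """
    total = 0
    while count_invalid_entries() > 0 and total < max_iterations:
        simulate_iterations(warmup_increment)
        total += warmup_increment
    return total
```

The point of the invalid_entries condition is that the warmup stops as soon as the cache state is fully determined, so a smaller warmup_increment only changes the granularity of that check, not the amount of simulation work, which matches the "I wouldn't expect much improvement" remark above.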
All right, thanks for the detailed reply! I'll talk to them :)
Hello,
I am applying your nice tool to typical stencil applications and I am observing very long simulation runtimes for high-dimensional stencils (several orders of magnitude longer than the actual execution time). Most of the time is spent in the "warmup phase", and I am wondering about this line:
https://github.com/RRZE-HPC/kerncraft/blob/b5a302d20669fe6c1d1ee08ebcc68457968d9257/kerncraft/cacheprediction.py#L563
Does it assume that only one element is loaded/stored to the cache per iteration? For higher-dimensional stencils, I easily read 100-1000 elements per iteration.
So could something like this be used instead of element_size:
https://github.com/RRZE-HPC/kerncraft/blob/b5a302d20669fe6c1d1ee08ebcc68457968d9257/kerncraft/cacheprediction.py#L548
but estimated from the number of elements read per iteration? Would such an approximation still be reasonably accurate?
I tried to look this up in the related publications, but I couldn't find those details.
Thanks in advance!