Closed: oskeng closed this issue 1 year ago
Going to close this as a dupe of #92, thanks for the benchmarks! As far as I know this is a problem with how threading works in Julia. There's been an effort (very much on the back burner, because development on this project is done in free time and without funding) to do a total refactor of the parallel processing code to use Dagger.jl. Hopefully that will be done eventually.
As the title suggests - the computation time decreases with the number of workers but plateaus at ~6-8 workers and increases thereafter. I have tested this both on my workstation (8 cores, 128 GB) and on a large memory node at a cluster (4x18 cores, 3072 GB).
Confirms #92.
Tests were run with `source_from_resistance = true` and `solver = cholmod` (same behaviour with `cg+amg`).
Workstation (ETA measured at ~2% progress):
- 1 worker: ETA 10:45
- 4 workers: ETA 5:30
- 6 workers: ETA 4:10
- 8 workers: ETA 4:20
- 10 workers: ETA 4:30
Cluster (ETA measured at ~2% progress):
- 72 workers: ETA 10:05
- 36 workers: ETA 8:50
- 18 workers: ETA 8:30
- 8 workers: ETA 7:50
- 6 workers: ETA 7:25
- 4 workers: ETA 8:10
(I used a different radius and block size on the workstation and the cluster, so disregard the absolute differences between the two machines. The scaling pattern would look much the same with identical settings.)
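To quantify the plateau, here is a small Python sketch (not part of the original report) that converts the workstation ETAs above into speedups relative to the single-worker run:

```python
def to_seconds(mmss: str) -> int:
    """Parse an "mm:ss" ETA string into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

# Reported workstation ETAs (workers -> ETA at ~2% progress).
etas = {1: "10:45", 4: "5:30", 6: "4:10", 8: "4:20", 10: "4:30"}

baseline = to_seconds(etas[1])
speedups = {w: round(baseline / to_seconds(t), 2) for w, t in etas.items()}

for workers, s in sorted(speedups.items()):
    print(f"{workers:>2} workers: {s:.2f}x speedup")
# Speedup peaks at 6 workers (~2.58x) and declines at 8 and 10 workers.
```

The speedup never exceeds ~2.6x on 8 physical cores and actually decreases past 6 workers, which is the plateau-then-regression described above.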
Besides being strange behavior, this effectively means there is no point in using large-memory clusters to solve sizable problems, unless the radius is so large that memory runs out with fewer than 6 workers on a workstation.