hughcars closed 3 months ago
The above was tested using `-framework Accelerate` on an M1 Mac. After rebuilding from scratch with the Arm Performance Libraries (armpl) instead, the problem appears to be resolved: the number of iterations for rings increases slightly with 6 threads but the solve does converge, and cpw_lumped converges in exactly 23 iterations. Closing this, as the issue appears to be related to an incorrectly configured BLAS rather than to Palace.
The rings example, when run with `nt >= 2`, fails in the first linear solve, diverging at 100 iterations with a KSP norm of O(1e1). With `-nt 1` it converges in 47 iterations. Running with `MGMaxLevels` results in 75 KSP iterations for both `-nt 1` and `-nt 6`.

The spheres example solves in 9 iterations for both `-nt 6` and `-nt 1`, suggesting the issue is restricted to Nedelec elements.

The cavity example on tets takes 9 iterations with `-nt 1` and 10 iterations with `-nt 6`; on hexes it takes 5 iterations in both cases. So the problem does not always show up, but the slight increase in iteration count suggests it may simply not grow fast enough to matter on this simpler problem.

cpw_wave_uniform takes 23 KSP iterations with `-nt 1`, but with `-nt 6` it doesn't converge in 200 iterations. cpw_lumped_uniform doesn't converge with `-nt 6` and multigrid, but does converge without multigrid.

This suggests the issue involves `R` and `P` in the multigrid transfer process.
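
One way to probe a suspicion like this is an adjointness check on the transfer operators. The sketch below is not Palace code. It assumes the restriction `R` is meant to be the transpose of the prolongation `P` (the thread does not say how they are built), and it uses a small dense NumPy interpolation matrix as a stand-in for the real operators. The helper name `adjoint_mismatch` and the "broken" restriction are hypothetical, made up purely for illustration. If a threaded restriction loses or double-counts contributions, the identity `<R r, e> = <r, P e>` stops holding for random vectors.

```python
import numpy as np


def adjoint_mismatch(apply_P, apply_R, n_coarse, n_fine, trials=5, seed=0):
    """Largest relative |<R r, e> - <r, P e>| over a few random vector pairs.

    A value near machine precision is consistent with R == P^T; a large value
    means the two operators are not adjoint to each other.
    """
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        e = rng.standard_normal(n_coarse)  # random coarse-level vector
        r = rng.standard_normal(n_fine)    # random fine-level vector
        Pe = apply_P(e)
        lhs = np.dot(apply_R(r), e)
        rhs = np.dot(r, Pe)
        # Normalize by an upper bound on |rhs| so the result is scale-free.
        scale = np.linalg.norm(r) * np.linalg.norm(Pe) + 1e-300
        worst = max(worst, abs(lhs - rhs) / scale)
    return worst


# Toy 1D linear interpolation from 3 coarse nodes to 5 fine nodes.
P = np.array([
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])

# Hypothetical faulty restriction: one coarse entry never receives its sum,
# loosely mimicking an update that is lost during a parallel scatter-add.
R_broken = P.T.copy()
R_broken[1, :] = 0.0

good = adjoint_mismatch(lambda e: P @ e, lambda r: P.T @ r, 3, 5)
bad = adjoint_mismatch(lambda e: P @ e, lambda r: R_broken @ r, 3, 5)
print(f"consistent R: {good:.2e}, broken R: {bad:.2e}")
```

Running the same kind of check on the actual operators with `-nt 1` and `-nt 6` would show whether the transfer itself changes with the thread count, independently of the rest of the solver.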