Open Kabbone opened 1 month ago
Hello @Kabbone,
This looks like a known issue. The residual used to exit the GMRES loop is not fully recomputed at each iteration, so the solver can appear to have converged when it actually has not. You can try compiling with -DPASTIX_DEBUG_GMRES=ON to compute the exact residual at each iteration. If this is what I think it is, adjusting the GMRES restart parameter (IPARM_GMRES_IM) should help convergence.
Otherwise, could you please write the spm with a call to:
    int
    spmSave( const spmatrix_t *spm,
             const char       *filename );
and give us the file? I can't read the file you added in your post.
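To make the difference between a cheap residual estimate and the exact residual concrete, here is a minimal sketch of what "computing the exact residual" means: form b - A*x explicitly and compare its norm to the norm of b. The helper `true_relative_residual` and the `demo_*` arrays are hypothetical names for illustration, not part of PaStiX, and the sketch uses a dense matrix and the max norm for simplicity, which may differ from what the solver does internally.

```c
#include <assert.h>

/* Hypothetical helper, not part of PaStiX: relative residual
 * max|b - A*x| / max|b| for a dense, row-major n x n matrix. */
static double true_relative_residual(int n, const double *A,
                                     const double *x, const double *b)
{
    double norm_r = 0.0, norm_b = 0.0;

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        /* Row i of the explicit residual r = b - A*x. */
        double ri = b[i];
        for (int j = 0; j < n; j++) {
            ri -= A[i * n + j] * x[j];
        }
        double abs_ri = (ri < 0.0) ? -ri : ri;
        double abs_bi = (b[i] < 0.0) ? -b[i] : b[i];
        if (abs_ri > norm_r) norm_r = abs_ri;
        if (abs_bi > norm_b) norm_b = abs_bi;
    }
    assert(norm_b > 0.0);
    return norm_r / norm_b;
}

/* Small demo system: A*x = b has the exact solution (1/11, 7/11). */
static const double demo_A[4] = { 4.0, 1.0,
                                  1.0, 3.0 };
static const double demo_b[2] = { 1.0, 2.0 };
static const double demo_x_exact[2]  = { 1.0 / 11.0, 7.0 / 11.0 };
static const double demo_x_approx[2] = { 0.09, 0.64 };
```

Calling `true_relative_residual(2, demo_A, demo_x_exact, demo_b)` gives a value at rounding level, while the perturbed `demo_x_approx` gives a clearly nonzero one. An iterative solver that only updates an estimate of this quantity can drift away from it, which is why recomputing it explicitly is useful for debugging.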
Hi @mfaverge, first of all, sorry for taking so long to come back to this with CalculiX. Your mixed precision implementation seems to work fine. I attached the spm.txt saved with spmSave; previously I had used spmPrint to write it to a file. spm.txt
It seems I can only control it with IPARM_ITERMAX.
I guess you were right that it is stuck slightly above 1e-11.
************************************************************
CalculiX Version DEVELOPMENT i8, Copyright(C) 1998-2015 Guido Dhondt
CalculiX comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
certain conditions, see gpl.htm
************************************************************
You are using an executable made on Thu Aug 8 18:09:38 CEST 2024
Decascading the MPC's
Determining the structure of the matrix:
Using up to 1 cpu(s) for setting up the structure of the matrix.
number of equations
720
number of nonzero lower triangular matrix elements
37458
Using up to 1 cpu(s) for the stress calculation.
Using up to 1 cpu(s) for the symmetric stiffness/mass contributions.
Not reusing csc.
IPARM_MIXED=1
globDouble=0
+-------------------------------------------------+
+ PaStiX : Parallel Sparse matriX package +
+-------------------------------------------------+
Version: 6.4.0
Schedulers:
sequential: Enabled
thread static: Started
thread dynamic: Disabled
PaRSEC: Disabled
StarPU: Started
Number of MPI processes: 1
Number of threads per process: 1
Number of GPUs: 0
MPI communication support: PastixMpiNone
Distribution level: 2D( 128)
Blocking size (min/max): 1024 / 2048
Computational models
CPU: AMD Opteron 6180 - Intel MKL
GPU: Nvidia K40 GK1108L - CUDA 8.0
Low rank parameters:
Strategy No compression
Matrix type: General
Arithmetic: Double
Format: CSC
N: 720
nnz: 75636
+-------------------------------------------------+
Ordering subtask :
Ordering method is: Scotch
Time to compute ordering 6.287098e-04 s
+-------------------------------------------------+
Symbolic factorization subtask:
Symbol factorization using: Fax Direct
Number of nonzeroes in L structure 64548
Fill-in of L 0.853403
Time to compute symbol matrix 5.745888e-04 s
+-------------------------------------------------+
Reordering subtask:
Split level 0
Stoping criterion -1
Time for reordering 6.349087e-04 s
+-------------------------------------------------+
Mapping/Scheduling subtask:
Number of non-zeroes in blocked L 129096
Fill-in 1.706806
Number of operations in full-rank: LU 12.72 MFlops
Prediction:
Model AMD 6180 MKL
Time to factorize 6.389937e-03 s
Time for mapping/scheduling 3.242493e-05 s
+-------------------------------------------------+
Analyze task:
Total time for analyze 2.400160e-03 s
+-------------------------------------------------+
Factorization task:
Factorization used: LU
Time to initialize internal csc 3.986359e-03 s
Time to initialize coeftab 4.260540e-04 s
Time to factorize 5.285740e-03 s ( 2.35 GFlop/s)
Number of operations 4.80 MFlops
Number of static pivots 0
Memory usage of coeftab 641 Ko
Time to solve 7.019043e-04 s
- iteration 1 :
total iteration time 0.00066 s
error 1.5925e-05
error 1.5925e-05
- iteration 2 :
total iteration time 0.000588 s
error 7.6706e-09
error 7.6711e-09
- iteration 3 :
total iteration time 0.000604 s
error 5.7064e-12
error 1.6631e-11
- iteration 4 :
total iteration time 0.000594 s
error 2.5906e-15
error 1.7371e-11
- iteration 5 :
total iteration time 0.000612 s
error 1.679e-14
error 1.63e-11
- iteration 6 :
total iteration time 0.00061 s
error 3.956e-14
error 1.7264e-11
- iteration 7 :
total iteration time 0.000597 s
error 8.5403e-15
error 1.511e-11
- iteration 8 :
total iteration time 0.000582 s
error 1.1365e-14
error 1.5319e-11
- iteration 9 :
total iteration time 0.000598 s
error 1.2293e-14
error 1.635e-11
- iteration 10 :
total iteration time 0.000612 s
error 1.4439e-14
error 1.7826e-11
- iteration 11 :
total iteration time 0.000637 s
error 3.0981e-14
error 1.6971e-11
- iteration 12 :
total iteration time 0.000602 s
error 1.0435e-14
error 1.8759e-11
- iteration 13 :
total iteration time 0.000657 s
error 1.6411e-14
error 1.7072e-11
- iteration 14 :
total iteration time 0.000679 s
error 2.3476e-14
error 1.6487e-11
- iteration 15 :
total iteration time 0.000634 s
error 9.5529e-15
error 1.5693e-11
- iteration 16 :
total iteration time 0.000658 s
error 2.2433e-14
error 1.7212e-11
- iteration 17 :
total iteration time 0.000657 s
error 9.3223e-15
error 1.5952e-11
- iteration 18 :
total iteration time 0.0006 s
error 2.3023e-14
error 1.6696e-11
- iteration 19 :
total iteration time 0.000594 s
error 1.0315e-14
error 2.0247e-11
- iteration 20 :
total iteration time 0.000651 s
error 6.6836e-15
error 1.7416e-11
- iteration 21 :
total iteration time 0.000583 s
error 1.1995e-14
error 1.677e-11
- iteration 22 :
total iteration time 0.000604 s
error 5.5166e-14
error 1.7235e-11
- iteration 23 :
total iteration time 0.00062 s
error 6.8305e-14
error 1.5688e-11
- iteration 24 :
total iteration time 0.000622 s
error 5.8102e-14
error 1.778e-11
- iteration 25 :
total iteration time 0.000625 s
error 2.3023e-14
error 1.9619e-11
- iteration 26 :
total iteration time 0.000632 s
error 6.8905e-14
error 1.699e-11
- iteration 27 :
total iteration time 0.000641 s
error 8.2466e-14
error 1.7254e-11
- iteration 28 :
total iteration time 0.000628 s
error 1.7436e-14
error 1.6907e-11
- iteration 29 :
total iteration time 0.000611 s
error 8.9074e-15
error 1.6963e-11
- iteration 30 :
total iteration time 0.000638 s
error 2.3288e-14
error 1.67e-11
- iteration 31 :
total iteration time 0.000623 s
error 9.3604e-15
error 1.7825e-11
- iteration 32 :
total iteration time 0.000592 s
error 1.2271e-14
error 1.754e-11
- iteration 33 :
total iteration time 0.000599 s
error 1.2695e-14
error 1.432e-11
- iteration 34 :
total iteration time 0.000625 s
error 1.8129e-14
error 1.5334e-11
- iteration 35 :
total iteration time 0.000624 s
error 1.0223e-14
error 1.607e-11
- iteration 36 :
total iteration time 0.000588 s
error 2.1563e-14
error 1.9292e-11
- iteration 37 :
total iteration time 0.000622 s
error 6.1731e-15
error 1.7365e-11
- iteration 38 :
total iteration time 0.00062 s
error 3.8834e-14
error 1.7053e-11
- iteration 39 :
total iteration time 0.000679 s
error 6.2702e-14
error 1.5576e-11
- iteration 40 :
total iteration time 0.0006 s
error 3.3096e-14
error 1.7805e-11
- iteration 41 :
total iteration time 0.000605 s
error 2.4083e-15
error 1.5616e-11
- iteration 42 :
total iteration time 0.000609 s
error 1.4349e-14
error 1.5907e-11
- iteration 43 :
total iteration time 0.000607 s
error 5.6266e-14
error 1.9332e-11
- iteration 44 :
total iteration time 0.000619 s
error 8.6422e-15
error 1.655e-11
- iteration 45 :
total iteration time 0.000609 s
error 1.5109e-14
error 1.6133e-11
- iteration 46 :
total iteration time 0.000613 s
error 1.4948e-14
error 1.6641e-11
- iteration 47 :
total iteration time 0.000623 s
error 1.8559e-14
error 1.7037e-11
- iteration 48 :
total iteration time 0.000608 s
error 4.0336e-14
error 1.6813e-11
- iteration 49 :
total iteration time 0.000598 s
error 3.5169e-14
error 1.6692e-11
- iteration 50 :
total iteration time 0.000623 s
error 4.2722e-14
error 1.4147e-11
Time for refinement 3.564286e-02 s
________________________________________
CSC Conversion Time: 0.001122
Init Time: 0.014379
Factorize Time: 0.009816
Solve Time: 0.036381
Clean up Time: 0.000000
---------------------------------
Sum: 0.061698
Total PaStiX Time: 0.061698
CCX without PaStiX Time: 0.012325
Share of PaStiX Time: 0.833494
Total Time: 0.074024
Reusability: 0 : 1
________________________________________
Using up to 1 cpu(s) for the stress calculation.
The numbers below are estimated upper bounds
number of:
nodes: 261
elements: 32
one-dimensional elements: 0
two-dimensional elements: 0
integration points per element: 8
degrees of freedom per node: 3
layers per element: 1
distributed facial loads: 0
distributed volumetric loads: 0
concentrated loads: 9
single point constraints: 63
multiple point constraints: 1
terms in all multiple point constraints: 1
tie constraints: 0
dependent nodes tied by cyclic constraints: 0
dependent nodes in pre-tension constraints: 0
sets: 6
terms in all sets: 105
materials: 1
constants per material and temperature: 2
temperature points per material: 1
plastic data points per material: 0
orientations: 0
amplitudes: 4
data points in all amplitudes: 4
print requests: 4
transformations: 0
property cards: 0
STEP 1
Static analysis was selected
Job finished
________________________________________
Total CalculiX Time: 0.075464
________________________________________
OK, it is difficult to find a generic solution for this, as it is problem dependent. I checked your log and there are no static pivots, so you can't play with that either. The only remaining option is to relax the required epsilon a bit.
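The effect of loosening the tolerance can be checked against the log above. The sketch below assumes that the second "error" value printed for each iteration is the explicitly recomputed residual; the function `first_converged_iteration` and the array `logged_residuals` are hypothetical names, and the `<=` comparison is an assumption about the stopping test, not PaStiX's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Return the 1-based iteration at which the residual first drops to or
 * below eps, or -1 if it never does within the n recorded iterations
 * (in which case the refinement runs until the iteration limit). */
static int first_converged_iteration(const double *residuals, size_t n,
                                     double eps)
{
    assert(residuals != NULL);
    for (size_t i = 0; i < n; i++) {
        if (residuals[i] <= eps) {
            return (int)(i + 1);
        }
    }
    return -1;
}

/* Second "error" value of iterations 1 to 6, copied from the log above. */
static const double logged_residuals[6] = {
    1.5925e-05, 7.6711e-09, 1.6631e-11, 1.7371e-11, 1.63e-11, 1.7264e-11
};
```

With these values, a tolerance of 1e-12 is never reached, whereas a tolerance of 1e-10 is already met at iteration 3, which matches the stagnation slightly above 1e-11 reported in this thread.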
Thanks for the feedback on the mixed precision. Unfortunately, we did not have time to integrate your GPU matrix-vector product. I don't remember whether we discussed it or whether you noticed, but we also added global allocation of the matrix, if needed.
The GPU matrix-vector product was just a little gimmick; as far as I remember, it only made a small difference. I noticed the global allocation in the docs, but I need to look into what exactly it means now. So far I have only been able to run very minor tests; I need to check some bigger cases at work.
@mfaverge The GMRES refinement doesn't stop in this specific case, even though it has already hit the threshold. It seems like
EPSILON_REFINEMENT=1e-12
EPSILON_REFINEMENT=1e-10
The according SPM: spm.txt