Kabbone / CalculiX

This repository contains the source files of CalculiX, a three-dimensional Finite Element Program (www.calculix.de).
GNU General Public License v2.0

PaStiX Refine convergence doesn't stop at EPSILON_REFINEMENT (1e-12) in some cases #1

Open Kabbone opened 1 month ago

Kabbone commented 1 month ago

@mfaverge The GMRES refinement doesn't stop in this specific case, even though it has already reached the threshold. It seems like

EPSILON_REFINEMENT=1e-12


************************************************************

CalculiX Version DEVELOPMENT i8, Copyright(C) 1998-2015 Guido Dhondt
CalculiX comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
certain conditions, see gpl.htm

************************************************************

You are using an executable made on Thu Aug  8 18:09:38 CEST 2024
 Decascading the MPC's

 Determining the structure of the matrix:
 Using up to 1 cpu(s) for setting up the structure of the matrix.
 number of equations
 720
 number of nonzero lower triangular matrix elements
 37458

 Using up to 1 cpu(s) for the stress calculation.

 Using up to 1 cpu(s) for the symmetric stiffness/mass contributions.

Not reusing csc.
IPARM_MIXED=1
globDouble=0
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
  Version:                                   6.4.0
  Schedulers:
    sequential:                            Enabled
    thread static:                         Started
    thread dynamic:                       Disabled
    PaRSEC:                               Disabled
    StarPU:                                Started
  Number of MPI processes:                       1
  Number of threads per process:                 1
  Number of GPUs:                                0
  MPI communication support:              PastixMpiNone
  Distribution level:                     2D( 128)
  Blocking size (min/max):             1024 / 2048
  Computational models
    CPU:              AMD Opteron 6180 - Intel MKL
    GPU:             Nvidia K40 GK1108L - CUDA 8.0
  Low rank parameters:
    Strategy                        No compression

  Matrix type:  General
  Arithmetic:   Double
  Format:       CSC
  N:            720
  nnz:          75636

+-------------------------------------------------+
  Ordering subtask :
    Ordering method is: Scotch
    Time to compute ordering              6.487370e-04 s
+-------------------------------------------------+
  Symbolic factorization subtask:
    Symbol factorization using: Fax Direct
    Number of nonzeroes in L structure       64548
    Fill-in of L                          0.853403
    Time to compute symbol matrix         5.757809e-04 s
+-------------------------------------------------+
  Reordering subtask:
    Split level                                  0
    Stoping criterion                           -1
    Time for reordering                   6.735325e-04 s
+-------------------------------------------------+
  Mapping/Scheduling subtask:
    Number of non-zeroes in blocked L       129096
    Fill-in                               1.706806
    Number of operations in full-rank: LU       12.72 MFlops
    Prediction:
      Model                              AMD 6180  MKL
      Time to factorize                   6.389937e-03 s
    Time for mapping/scheduling           3.147125e-05 s
+-------------------------------------------------+
  Analyze task:
    Total time for analyze                2.520323e-03 s
+-------------------------------------------------+
  Factorization task:
    Factorization used: LU
    Time to initialize internal csc       4.385948e-03 s
    Time to initialize coeftab            4.205704e-04 s
    Time to factorize                     5.151033e-03 s ( 2.41 GFlop/s)
    Number of operations                        4.80 MFlops
    Number of static pivots                      0
    Memory usage of coeftab                        641 Ko
    Time to solve                         6.747246e-04 s
    - iteration 1 :
         total iteration time                   0.000623 s
         error                                  1.5925e-05
    - iteration 2 :
         total iteration time                   0.000602 s
         error                                  7.6706e-09
    - iteration 3 :
         total iteration time                   0.000581 s
         error                                  5.7064e-12
    - iteration 4 :
         total iteration time                   0.000588 s
         error                                  2.5906e-15
    - iteration 5 :
         total iteration time                   0.000583 s
         error                                  1.679e-14
    - iteration 6 :
         total iteration time                   0.000606 s
         error                                  3.956e-14
    - iteration 7 :
         total iteration time                   0.000594 s
         error                                  8.5403e-15
    - iteration 8 :
         total iteration time                   0.000612 s
         error                                  1.1365e-14
    - iteration 9 :
         total iteration time                   0.000634 s
         error                                  1.2293e-14
    - iteration 10 :
         total iteration time                   0.000645 s
         error                                  1.4439e-14
    - iteration 11 :
         total iteration time                   0.000615 s
         error                                  3.0981e-14
    - iteration 12 :
         total iteration time                   0.000578 s
         error                                  1.0435e-14
    - iteration 13 :
         total iteration time                   0.000594 s
         error                                  1.6411e-14
    - iteration 14 :
         total iteration time                   0.000664 s
         error                                  2.3476e-14
    - iteration 15 :
         total iteration time                   0.000621 s
         error                                  9.5529e-15
    - iteration 16 :
         total iteration time                   0.000632 s
         error                                  2.2433e-14
    - iteration 17 :
         total iteration time                   0.000588 s
         error                                  9.3223e-15
    - iteration 18 :
         total iteration time                   0.000629 s
         error                                  2.3023e-14
    - iteration 19 :
         total iteration time                   0.000623 s
         error                                  1.0315e-14
    - iteration 20 :
         total iteration time                   0.000608 s
         error                                  6.6836e-15
    - iteration 21 :
         total iteration time                   0.000577 s
         error                                  1.1995e-14
    - iteration 22 :
         total iteration time                   0.000605 s
         error                                  5.5166e-14
    - iteration 23 :
         total iteration time                   0.000645 s
         error                                  6.8305e-14
    - iteration 24 :
         total iteration time                   0.000574 s
         error                                  5.8102e-14
    - iteration 25 :
         total iteration time                   0.000599 s
         error                                  2.3023e-14
    - iteration 26 :
         total iteration time                   0.000587 s
         error                                  6.8905e-14
    - iteration 27 :
         total iteration time                   0.000618 s
         error                                  8.2466e-14
    - iteration 28 :
         total iteration time                   0.000603 s
         error                                  1.7436e-14
    - iteration 29 :
         total iteration time                   0.000622 s
         error                                  8.9074e-15
    - iteration 30 :
         total iteration time                   0.000617 s
         error                                  2.3288e-14
    - iteration 31 :
         total iteration time                   0.000598 s
         error                                  9.3604e-15
    - iteration 32 :
         total iteration time                   0.000749 s
         error                                  1.2271e-14
    - iteration 33 :
         total iteration time                   0.000695 s
         error                                  1.2695e-14
    - iteration 34 :
         total iteration time                   0.000669 s
         error                                  1.8129e-14
    - iteration 35 :
         total iteration time                   0.000654 s
         error                                  1.0223e-14
    - iteration 36 :
         total iteration time                   0.00062 s
         error                                  2.1563e-14
    - iteration 37 :
         total iteration time                   0.000613 s
         error                                  6.1731e-15
    - iteration 38 :
         total iteration time                   0.000589 s
         error                                  3.8834e-14
    - iteration 39 :
         total iteration time                   0.00061 s
         error                                  6.2702e-14
    - iteration 40 :
         total iteration time                   0.000637 s
         error                                  3.3096e-14
    - iteration 41 :
         total iteration time                   0.000569 s
         error                                  2.4083e-15
    - iteration 42 :
         total iteration time                   0.000608 s
         error                                  1.4349e-14
    - iteration 43 :
         total iteration time                   0.000611 s
         error                                  5.6266e-14
    - iteration 44 :
         total iteration time                   0.000601 s
         error                                  8.6422e-15
    - iteration 45 :
         total iteration time                   0.000602 s
         error                                  1.5109e-14
    - iteration 46 :
         total iteration time                   0.000612 s
         error                                  1.4948e-14
    - iteration 47 :
         total iteration time                   0.000625 s
         error                                  1.8559e-14
    - iteration 48 :
         total iteration time                   0.000627 s
         error                                  4.0336e-14
    - iteration 49 :
         total iteration time                   0.000639 s
         error                                  3.5169e-14
    - iteration 50 :
         total iteration time                   0.000634 s
         error                                  4.2722e-14
    - iteration 51 :
         total iteration time                   0.000612 s
         error                                  5.0145e-14
    - iteration 52 :
         total iteration time                   0.000623 s
         error                                  4.8161e-14
    - iteration 53 :
         total iteration time                   0.000639 s
         error                                  1.0881e-13
    - iteration 54 :
         total iteration time                   0.000598 s
         error                                  5.9627e-14
    - iteration 55 :
         total iteration time                   0.000628 s
         error                                  6.0417e-14
    - iteration 56 :
         total iteration time                   0.000607 s
         error                                  4.4262e-14
    - iteration 57 :
         total iteration time                   0.000626 s
         error                                  1.4495e-14
    - iteration 58 :
         total iteration time                   0.00059 s
         error                                  1.622e-14
    - iteration 59 :
         total iteration time                   0.000609 s
         error                                  1.325e-14
    - iteration 60 :
         total iteration time                   0.000643 s
         error                                  2.9311e-14
    - iteration 61 :
         total iteration time                   0.000623 s
         error                                  1.4657e-14
    - iteration 62 :
         total iteration time                   0.000628 s
         error                                  2.0651e-14
    - iteration 63 :
         total iteration time                   0.000614 s
         error                                  2.7552e-15
    - iteration 64 :
         total iteration time                   0.000645 s
         error                                  1.6454e-14
    - iteration 65 :
         total iteration time                   0.000619 s
         error                                  1.6926e-14
    - iteration 66 :
         total iteration time                   0.000711 s
         error                                  3.7045e-14
    - iteration 67 :
         total iteration time                   0.00067 s
         error                                  1.9121e-14
    - iteration 68 :
         total iteration time                   0.0006 s
         error                                  3.5775e-14
    - iteration 69 :
         total iteration time                   0.000601 s
         error                                  3.0038e-14
    - iteration 70 :
         total iteration time                   0.000587 s
         error                                  2.915e-14
    Time for refinement                   4.652548e-02 s
________________________________________

CSC Conversion Time: 0.030951
Init Time: 0.020268
Factorize Time: 0.010060
Solve Time: 0.047231
Clean up Time: 0.000000
---------------------------------
Sum: 0.108510

Total PaStiX Time: 0.108510
CCX without PaStiX Time: 0.012585
Share of PaStiX Time: 0.896070
Total Time: 0.121096
Reusability: 0 : 1 
________________________________________

 Using up to 1 cpu(s) for the stress calculation.

  The numbers below are estimated upper bounds

  number of:

   nodes:                   261
   elements:                    32
   one-dimensional elements:                     0
   two-dimensional elements:                     0
   integration points per element:                     8
   degrees of freedom per node:                     3
   layers per element:                     1

   distributed facial loads:                     0
   distributed volumetric loads:                     0
   concentrated loads:                     9
   single point constraints:                    63
   multiple point constraints:                     1
   terms in all multiple point constraints:                     1
   tie constraints:                     0
   dependent nodes tied by cyclic constraints:                     0
   dependent nodes in pre-tension constraints:                     0

   sets:                     6
   terms in all sets:                   105

   materials:                     1
   constants per material and temperature:                     2
   temperature points per material:                     1
   plastic data points per material:                     0

   orientations:                     0
   amplitudes:                     4
   data points in all amplitudes:                     4
   print requests:                     4
   transformations:                     0
   property cards:                     0

 STEP                     1

 Static analysis was selected

 Job finished

________________________________________

Total CalculiX Time: 0.122414
________________________________________

EPSILON_REFINEMENT=1e-10

************************************************************

CalculiX Version DEVELOPMENT i8, Copyright(C) 1998-2015 Guido Dhondt
CalculiX comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
certain conditions, see gpl.htm

************************************************************

You are using an executable made on Thu Aug  8 18:09:38 CEST 2024
 Decascading the MPC's

 Determining the structure of the matrix:
 Using up to 1 cpu(s) for setting up the structure of the matrix.
 number of equations
 720
 number of nonzero lower triangular matrix elements
 37458

 Using up to 1 cpu(s) for the stress calculation.

 Using up to 1 cpu(s) for the symmetric stiffness/mass contributions.

Not reusing csc.
IPARM_MIXED=1
globDouble=0
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
  Version:                                   6.4.0
  Schedulers:
    sequential:                            Enabled
    thread static:                         Started
    thread dynamic:                       Disabled
    PaRSEC:                               Disabled
    StarPU:                                Started
  Number of MPI processes:                       1
  Number of threads per process:                 1
  Number of GPUs:                                0
  MPI communication support:              PastixMpiNone
  Distribution level:                     2D( 128)
  Blocking size (min/max):             1024 / 2048
  Computational models
    CPU:              AMD Opteron 6180 - Intel MKL
    GPU:             Nvidia K40 GK1108L - CUDA 8.0
  Low rank parameters:
    Strategy                        No compression

  Matrix type:  General
  Arithmetic:   Double
  Format:       CSC
  N:            720
  nnz:          75636

+-------------------------------------------------+
  Ordering subtask :
    Ordering method is: Scotch
    Time to compute ordering              8.037090e-04 s
+-------------------------------------------------+
  Symbolic factorization subtask:
    Symbol factorization using: Fax Direct
    Number of nonzeroes in L structure       64548
    Fill-in of L                          0.853403
    Time to compute symbol matrix         5.705357e-04 s
+-------------------------------------------------+
  Reordering subtask:
    Split level                                  0
    Stoping criterion                           -1
    Time for reordering                   6.611347e-04 s
+-------------------------------------------------+
  Mapping/Scheduling subtask:
    Number of non-zeroes in blocked L       129096
    Fill-in                               1.706806
    Number of operations in full-rank: LU       12.72 MFlops
    Prediction:
      Model                              AMD 6180  MKL
      Time to factorize                   6.389937e-03 s
    Time for mapping/scheduling           3.266335e-05 s
+-------------------------------------------------+
  Analyze task:
    Total time for analyze                2.646446e-03 s
+-------------------------------------------------+
  Factorization task:
    Factorization used: LU
    Time to initialize internal csc       3.980398e-03 s
    Time to initialize coeftab            4.131794e-04 s
    Time to factorize                     5.130768e-03 s ( 2.42 GFlop/s)
    Number of operations                        4.80 MFlops
    Number of static pivots                      0
    Memory usage of coeftab                        641 Ko
    Time to solve                         8.964539e-04 s
    - iteration 1 :
         total iteration time                   0.000604 s
         error                                  1.5925e-05
    - iteration 2 :
         total iteration time                   0.000607 s
         error                                  7.6706e-09
    - iteration 3 :
         total iteration time                   0.00061 s
         error                                  5.7064e-12
    Time for refinement                   1.960993e-03 s
________________________________________

CSC Conversion Time: 0.001365
Init Time: 0.014394
Factorize Time: 0.009632
Solve Time: 0.002890
Clean up Time: 0.000000
---------------------------------
Sum: 0.028281

Total PaStiX Time: 0.028281
CCX without PaStiX Time: 0.012839
Share of PaStiX Time: 0.687770
Total Time: 0.041121
Reusability: 0 : 1 
________________________________________

 Using up to 1 cpu(s) for the stress calculation.

  The numbers below are estimated upper bounds

  number of:

   nodes:                   261
   elements:                    32
   one-dimensional elements:                     0
   two-dimensional elements:                     0
   integration points per element:                     8
   degrees of freedom per node:                     3
   layers per element:                     1

   distributed facial loads:                     0
   distributed volumetric loads:                     0
   concentrated loads:                     9
   single point constraints:                    63
   multiple point constraints:                     1
   terms in all multiple point constraints:                     1
   tie constraints:                     0
   dependent nodes tied by cyclic constraints:                     0
   dependent nodes in pre-tension constraints:                     0

   sets:                     6
   terms in all sets:                   105

   materials:                     1
   constants per material and temperature:                     2
   temperature points per material:                     1
   plastic data points per material:                     0

   orientations:                     0
   amplitudes:                     4
   data points in all amplitudes:                     4
   print requests:                     4
   transformations:                     0
   property cards:                     0

 STEP                     1

 Static analysis was selected

 Job finished

________________________________________

Total CalculiX Time: 0.042440
________________________________________

The corresponding SPM: spm.txt

mfaverge commented 1 month ago

Hello @Kabbone,

This looks like a known issue. The residual value used to exit the GMRES loop is not fully recomputed at each iteration, so it looks as if you have converged when you have not. You can try compiling with -DPASTIX_DEBUG_GMRES=ON to compute the exact residual at each iteration. If this is what I think it is, adjusting the GMRES restart parameter (IPARM_GMRES_IM) should help it converge.

Or could you please write the SPM with a call to:

int
spmSave( const spmatrix_t *spm,
         const char       *filename )

and give us the file. I can't read the file you added in your post.
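
To illustrate the gap between the residual estimate tracked inside the GMRES loop and the explicitly computed residual, here is a minimal sketch in plain C. It is not PaStiX code: the helper names `csc_relative_residual` and `refinement_converged` are hypothetical, the matrix is assumed to be square CSC with 0-based indices, and the max-norm is used only to keep the example free of libm (the norm PaStiX uses may differ).

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without pulling in libm. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/*
 * Explicit relative residual max|b - A*x| / max|b| for an n x n matrix
 * stored in CSC format with 0-based indices (hypothetical helper).
 * r is caller-provided workspace of length n.
 */
double csc_relative_residual(size_t n, const size_t *colptr,
                             const size_t *rowind, const double *val,
                             const double *x, const double *b, double *r)
{
    double rmax = 0.0, bmax = 0.0;
    assert(colptr && rowind && val && x && b && r);

    /* r = b */
    for (size_t i = 0; i < n; i++)
        r[i] = b[i];

    /* r -= A*x, walking the matrix column by column */
    for (size_t j = 0; j < n; j++)
        for (size_t k = colptr[j]; k < colptr[j + 1]; k++)
            r[rowind[k]] -= val[k] * x[j];

    /* max-norms of the residual and of the right-hand side */
    for (size_t i = 0; i < n; i++) {
        if (abs_d(r[i]) > rmax) rmax = abs_d(r[i]);
        if (abs_d(b[i]) > bmax) bmax = abs_d(b[i]);
    }
    /* fall back to the absolute residual when b is all zeros */
    return bmax > 0.0 ? rmax / bmax : rmax;
}

/*
 * Accept convergence only when both the cheap in-loop estimate and the
 * explicitly computed residual are at or below the tolerance.
 */
int refinement_converged(double estimate, double explicit_res, double eps)
{
    return estimate <= eps && explicit_res <= eps;
}
```

With an estimate of 2.5906e-15 and an explicit residual of 1.7371e-11, `refinement_converged` returns 0 for a tolerance of 1e-12: the estimate alone suggests convergence, but the explicit residual does not confirm it. With an estimate of 5.7064e-12, an explicit residual of 1.6631e-11 and a tolerance of 1e-10, it returns 1.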

Kabbone commented 1 month ago

Hi @mfaverge, first of all, sorry for taking so long to get back to you about CalculiX. Your mixed-precision implementation seems to work fine. I have attached the spm.txt saved with spmSave; previously I had used spmPrint to write it to a file. spm.txt

It seems I can only control it with IPARM_ITERMAX.

I guess you were right: it is stuck slightly above 1e-11.


************************************************************

CalculiX Version DEVELOPMENT i8, Copyright(C) 1998-2015 Guido Dhondt
CalculiX comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
certain conditions, see gpl.htm

************************************************************

You are using an executable made on Thu Aug  8 18:09:38 CEST 2024
 Decascading the MPC's

 Determining the structure of the matrix:
 Using up to 1 cpu(s) for setting up the structure of the matrix.
 number of equations
 720
 number of nonzero lower triangular matrix elements
 37458

 Using up to 1 cpu(s) for the stress calculation.

 Using up to 1 cpu(s) for the symmetric stiffness/mass contributions.

Not reusing csc.
IPARM_MIXED=1
globDouble=0
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
  Version:                                   6.4.0
  Schedulers:
    sequential:                            Enabled
    thread static:                         Started
    thread dynamic:                       Disabled
    PaRSEC:                               Disabled
    StarPU:                                Started
  Number of MPI processes:                       1
  Number of threads per process:                 1
  Number of GPUs:                                0
  MPI communication support:              PastixMpiNone
  Distribution level:                     2D( 128)
  Blocking size (min/max):             1024 / 2048
  Computational models
    CPU:              AMD Opteron 6180 - Intel MKL
    GPU:             Nvidia K40 GK1108L - CUDA 8.0
  Low rank parameters:
    Strategy                        No compression

  Matrix type:  General
  Arithmetic:   Double
  Format:       CSC
  N:            720
  nnz:          75636

+-------------------------------------------------+
  Ordering subtask :
    Ordering method is: Scotch
    Time to compute ordering              6.287098e-04 s
+-------------------------------------------------+
  Symbolic factorization subtask:
    Symbol factorization using: Fax Direct
    Number of nonzeroes in L structure       64548
    Fill-in of L                          0.853403
    Time to compute symbol matrix         5.745888e-04 s
+-------------------------------------------------+
  Reordering subtask:
    Split level                                  0
    Stoping criterion                           -1
    Time for reordering                   6.349087e-04 s
+-------------------------------------------------+
  Mapping/Scheduling subtask:
    Number of non-zeroes in blocked L       129096
    Fill-in                               1.706806
    Number of operations in full-rank: LU       12.72 MFlops
    Prediction:
      Model                              AMD 6180  MKL
      Time to factorize                   6.389937e-03 s
    Time for mapping/scheduling           3.242493e-05 s
+-------------------------------------------------+
  Analyze task:
    Total time for analyze                2.400160e-03 s
+-------------------------------------------------+
  Factorization task:
    Factorization used: LU
    Time to initialize internal csc       3.986359e-03 s
    Time to initialize coeftab            4.260540e-04 s
    Time to factorize                     5.285740e-03 s ( 2.35 GFlop/s)
    Number of operations                        4.80 MFlops
    Number of static pivots                      0
    Memory usage of coeftab                        641 Ko
    Time to solve                         7.019043e-04 s
    - iteration 1 :
         total iteration time                   0.00066 s
         error                                  1.5925e-05
         error                                  1.5925e-05
    - iteration 2 :
         total iteration time                   0.000588 s
         error                                  7.6706e-09
         error                                  7.6711e-09
    - iteration 3 :
         total iteration time                   0.000604 s
         error                                  5.7064e-12
         error                                  1.6631e-11
    - iteration 4 :
         total iteration time                   0.000594 s
         error                                  2.5906e-15
         error                                  1.7371e-11
    - iteration 5 :
         total iteration time                   0.000612 s
         error                                  1.679e-14
         error                                  1.63e-11
    - iteration 6 :
         total iteration time                   0.00061 s
         error                                  3.956e-14
         error                                  1.7264e-11
    - iteration 7 :
         total iteration time                   0.000597 s
         error                                  8.5403e-15
         error                                  1.511e-11
    - iteration 8 :
         total iteration time                   0.000582 s
         error                                  1.1365e-14
         error                                  1.5319e-11
    - iteration 9 :
         total iteration time                   0.000598 s
         error                                  1.2293e-14
         error                                  1.635e-11
    - iteration 10 :
         total iteration time                   0.000612 s
         error                                  1.4439e-14
         error                                  1.7826e-11
    - iteration 11 :
         total iteration time                   0.000637 s
         error                                  3.0981e-14
         error                                  1.6971e-11
    - iteration 12 :
         total iteration time                   0.000602 s
         error                                  1.0435e-14
         error                                  1.8759e-11
    - iteration 13 :
         total iteration time                   0.000657 s
         error                                  1.6411e-14
         error                                  1.7072e-11
    - iteration 14 :
         total iteration time                   0.000679 s
         error                                  2.3476e-14
         error                                  1.6487e-11
    - iteration 15 :
         total iteration time                   0.000634 s
         error                                  9.5529e-15
         error                                  1.5693e-11
    - iteration 16 :
         total iteration time                   0.000658 s
         error                                  2.2433e-14
         error                                  1.7212e-11
    - iteration 17 :
         total iteration time                   0.000657 s
         error                                  9.3223e-15
         error                                  1.5952e-11
    - iteration 18 :
         total iteration time                   0.0006 s
         error                                  2.3023e-14
         error                                  1.6696e-11
    - iteration 19 :
         total iteration time                   0.000594 s
         error                                  1.0315e-14
         error                                  2.0247e-11
    - iteration 20 :
         total iteration time                   0.000651 s
         error                                  6.6836e-15
         error                                  1.7416e-11
    - iteration 21 :
         total iteration time                   0.000583 s
         error                                  1.1995e-14
         error                                  1.677e-11
    - iteration 22 :
         total iteration time                   0.000604 s
         error                                  5.5166e-14
         error                                  1.7235e-11
    - iteration 23 :
         total iteration time                   0.00062 s
         error                                  6.8305e-14
         error                                  1.5688e-11
    - iteration 24 :
         total iteration time                   0.000622 s
         error                                  5.8102e-14
         error                                  1.778e-11
    - iteration 25 :
         total iteration time                   0.000625 s
         error                                  2.3023e-14
         error                                  1.9619e-11
    - iteration 26 :
         total iteration time                   0.000632 s
         error                                  6.8905e-14
         error                                  1.699e-11
    - iteration 27 :
         total iteration time                   0.000641 s
         error                                  8.2466e-14
         error                                  1.7254e-11
    - iteration 28 :
         total iteration time                   0.000628 s
         error                                  1.7436e-14
         error                                  1.6907e-11
    - iteration 29 :
         total iteration time                   0.000611 s
         error                                  8.9074e-15
         error                                  1.6963e-11
    - iteration 30 :
         total iteration time                   0.000638 s
         error                                  2.3288e-14
         error                                  1.67e-11
    - iteration 31 :
         total iteration time                   0.000623 s
         error                                  9.3604e-15
         error                                  1.7825e-11
    - iteration 32 :
         total iteration time                   0.000592 s
         error                                  1.2271e-14
         error                                  1.754e-11
    - iteration 33 :
         total iteration time                   0.000599 s
         error                                  1.2695e-14
         error                                  1.432e-11
    - iteration 34 :
         total iteration time                   0.000625 s
         error                                  1.8129e-14
         error                                  1.5334e-11
    - iteration 35 :
         total iteration time                   0.000624 s
         error                                  1.0223e-14
         error                                  1.607e-11
    - iteration 36 :
         total iteration time                   0.000588 s
         error                                  2.1563e-14
         error                                  1.9292e-11
    - iteration 37 :
         total iteration time                   0.000622 s
         error                                  6.1731e-15
         error                                  1.7365e-11
    - iteration 38 :
         total iteration time                   0.00062 s
         error                                  3.8834e-14
         error                                  1.7053e-11
    - iteration 39 :
         total iteration time                   0.000679 s
         error                                  6.2702e-14
         error                                  1.5576e-11
    - iteration 40 :
         total iteration time                   0.0006 s
         error                                  3.3096e-14
         error                                  1.7805e-11
    - iteration 41 :
         total iteration time                   0.000605 s
         error                                  2.4083e-15
         error                                  1.5616e-11
    - iteration 42 :
         total iteration time                   0.000609 s
         error                                  1.4349e-14
         error                                  1.5907e-11
    - iteration 43 :
         total iteration time                   0.000607 s
         error                                  5.6266e-14
         error                                  1.9332e-11
    - iteration 44 :
         total iteration time                   0.000619 s
         error                                  8.6422e-15
         error                                  1.655e-11
    - iteration 45 :
         total iteration time                   0.000609 s
         error                                  1.5109e-14
         error                                  1.6133e-11
    - iteration 46 :
         total iteration time                   0.000613 s
         error                                  1.4948e-14
         error                                  1.6641e-11
    - iteration 47 :
         total iteration time                   0.000623 s
         error                                  1.8559e-14
         error                                  1.7037e-11
    - iteration 48 :
         total iteration time                   0.000608 s
         error                                  4.0336e-14
         error                                  1.6813e-11
    - iteration 49 :
         total iteration time                   0.000598 s
         error                                  3.5169e-14
         error                                  1.6692e-11
    - iteration 50 :
         total iteration time                   0.000623 s
         error                                  4.2722e-14
         error                                  1.4147e-11
    Time for refinement                   3.564286e-02 s
________________________________________

CSC Conversion Time: 0.001122
Init Time: 0.014379
Factorize Time: 0.009816
Solve Time: 0.036381
Clean up Time: 0.000000
---------------------------------
Sum: 0.061698

Total PaStiX Time: 0.061698
CCX without PaStiX Time: 0.012325
Share of PaStiX Time: 0.833494
Total Time: 0.074024
Reusability: 0 : 1 
________________________________________

 Using up to 1 cpu(s) for the stress calculation.

  The numbers below are estimated upper bounds

  number of:

   nodes:                   261
   elements:                    32
   one-dimensional elements:                     0
   two-dimensional elements:                     0
   integration points per element:                     8
   degrees of freedom per node:                     3
   layers per element:                     1

   distributed facial loads:                     0
   distributed volumetric loads:                     0
   concentrated loads:                     9
   single point constraints:                    63
   multiple point constraints:                     1
   terms in all multiple point constraints:                     1
   tie constraints:                     0
   dependent nodes tied by cyclic constraints:                     0
   dependent nodes in pre-tension constraints:                     0

   sets:                     6
   terms in all sets:                   105

   materials:                     1
   constants per material and temperature:                     2
   temperature points per material:                     1
   plastic data points per material:                     0

   orientations:                     0
   amplitudes:                     4
   data points in all amplitudes:                     4
   print requests:                     4
   transformations:                     0
   property cards:                     0

 STEP                     1

 Static analysis was selected

 Job finished

________________________________________

Total CalculiX Time: 0.075464
________________________________________
mfaverge commented 1 month ago

Ok, this is difficult to solve generically because it is problem dependent. I checked your log, and you don't have any static pivoting, so you can't tune that either. The only other option is to relax the required epsilon a bit.
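To illustrate what relaxing the epsilon would change, here is a minimal sketch that replays a few of the error values from the log above against two thresholds. It assumes that the second (larger) `error` value printed per iteration is the one compared against `EPSILON_REFINEMENT`; the log does not say which of the two is used. The function name `first_converged_iteration` and the relaxed value `2e-11` are hypothetical, chosen only for this example, and this is not PaStiX code.

```python
# Second "error" value printed for a few iterations in the log above.
# Assumption: this is the quantity checked against EPSILON_REFINEMENT.
logged_errors = {
    18: 1.6696e-11,
    19: 2.0247e-11,
    33: 1.4320e-11,
    50: 1.4147e-11,
}


def first_converged_iteration(errors, epsilon):
    """Return the first iteration whose error is <= epsilon, or None."""
    for iteration in sorted(errors):
        if errors[iteration] <= epsilon:
            return iteration
    return None


# With the current threshold, none of the sampled iterations qualifies.
strict = first_converged_iteration(logged_errors, 1e-12)

# With a slightly relaxed (hypothetical) threshold, the loop could stop early.
relaxed = first_converged_iteration(logged_errors, 2e-11)

print("eps=1e-12 ->", strict)
print("eps=2e-11 ->", relaxed)
```

Under that assumption, the sampled errors stay between roughly 1.4e-11 and 2.0e-11, so a threshold of 1e-12 is never met and the refinement runs to the iteration limit, while a threshold just above the stagnation level would end it much sooner.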

Thanks for the feedback on mixed precision. Unfortunately, we did not have time to integrate the GPU matrix-vector product you have. I don't remember whether we discussed it or whether you noticed, but we also added global allocation of the matrix, if needed.

Kabbone commented 1 month ago

The GPU matrix-vector part was just a little gimmick; as far as I remember, it only made a small difference. I noticed the global allocation in the docs, but I need to look into what exactly it means. So far I have only been able to run very minor tests; I still need to check some bigger cases at work.