Dhondtguido / PaStiX4CalculiX


any chance the changes are applied to the pastix-repo #21

Open looooo opened 1 month ago

looooo commented 1 month ago

I am wondering whether the changes in this repo are mandatory for using PaStiX with CalculiX. If so, is there any chance of applying these changes to the upstream PaStiX repo?

looooo commented 1 month ago

I am also running into errors like this when trying to compile for Windows:

CMake Error at CMakeLists.txt:833 (add_library):
  Syntax error in cmake code when parsing string

    common\d_integer.c

  Invalid character escape '\d'.

Does anyone know a solution for this?

Kabbone commented 1 month ago

So far, yes, these changes are necessary for the way the coupling is implemented. When we started, there was no way to keep the matrix in memory and reuse the reordering permutation from the previous steps. Some effort was started upstream in PaStiX to make this possible; I need to check whether it is available now. Ideally we would switch over to vanilla PaStiX, because we are still on a quite old version (6.0.2).

Kabbone commented 1 month ago

> I am also running into errors like this when trying to compile for Windows:
>
> CMake Error at CMakeLists.txt:833 (add_library):
>   Syntax error in cmake code when parsing string
>
>     common\d_integer.c
>
>   Invalid character escape '\d'.
>
> Does anyone know a solution for this?

I don't really use Windows and so can't really help there, but it looks like it comes from the Windows-style path common\d_integer.c: CMake treats the backslash in a quoted string as the start of an escape sequence, hence the "Invalid character escape '\d'" error. You could search for the user Rafal either here in the issues or on the discourse; I think he got it running on Windows with MinGW.

looooo commented 1 month ago

> So far, yes, these changes are necessary for the way the coupling is implemented. When we started, there was no way to keep the matrix in memory and reuse the reordering permutation from the previous steps. Some effort was started upstream in PaStiX to make this possible; I need to check whether it is available now. Ideally we would switch over to vanilla PaStiX, because we are still on a quite old version (6.0.2).

Thanks. I am currently trying to package PaStiX for conda-forge [1]. I ended up with this task because I wanted to perform a frequency analysis of a beam with no constraints. With SPOOLES (I guess this is the solver my CalculiX currently uses) the results were not satisfying [2]. The FreeCAD community pointed out that PaStiX gives better results, so this got me into packaging PaStiX. Since CalculiX will probably not be the only consumer of PaStiX as a dependency, it is important to use the Inria repo as the source. On the other hand, I would like to have good CalculiX support. So I think I will try to build the package from the latest release (6.4.0) and, once that is done, see what is missing for CalculiX support. Maybe you can help me by pointing out which patches are necessary for CalculiX support.

Windows is also not high on my priority list, but conda-forge is cross-platform, so it would be nice to have a package for Windows too. Maybe @3rav can help with the error above.

[1] https://github.com/conda-forge/staged-recipes/pull/27062
[2] https://forum.freecad.org/viewtopic.php?p=772894&hilit=free+vibrating#p772894

looooo commented 1 month ago

> So far, yes, these changes are necessary for the way the coupling is implemented. When we started, there was no way to keep the matrix in memory and reuse the reordering permutation from the previous steps. Some effort was started upstream in PaStiX to make this possible; I need to check whether it is available now. Ideally we would switch over to vanilla PaStiX, because we are still on a quite old version (6.0.2).

Yes, it would be nice to rebase this branch onto the latest PaStiX release (6.4.0). I tried to use PaStiX 6.4.0 with CalculiX, but failed to run an .inp file.

looooo commented 1 month ago

I was able to link PaStiX, but now it stops at:

Not reusing csc.
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
  Version:                                   6.4.0
  Schedulers:
    sequential:                            Enabled
    thread static:                         Started
    thread dynamic:                       Disabled
    PaRSEC:                               Disabled
    StarPU:                               Disabled
  Number of MPI processes:                       1
  Number of threads per process:                 1
  Number of GPUs:                                0
  MPI communication support:              PastixMpiNone
  Distribution level:                     2D( 256)
  Blocking size (min/max):             1024 / 2048
  Computational models
    CPU:              AMD Opteron 6180 - Intel MKL
    GPU:             Nvidia K40 GK1108L - CUDA 8.0
  Low rank parameters:
    Strategy                        No compression

  Matrix type:  General
  Arithmetic:   Double
  Format:       CSC
  N:            55818
  nnz:          4211028

+-------------------------------------------------+
  Ordering subtask :

This is the repo I use: https://github.com/looooo/CalculiX

pixi is used to install all dependencies. To build the library with pixi:

pixi run build

But I have only tested on an OSX system.

Kabbone commented 1 month ago

As far as I remember, many more changes will be needed than just the pastix Memalloc, because the spm handling is important for the reuse mechanism as we implemented it back then. I'm on holiday right now, so I can't really check much, and I have forgotten a lot of what we did. Another approach could be to use the Intel Pardiso solver from oneMKL. We also use that in production, and the convergence is actually better than with (the old version of) PaStiX, even though PaStiX was faster and you get GPU acceleration for free.

3rav commented 1 month ago

Merge requests with the CalculiX label from PaStiX 6.3.0: https://gitlab.inria.fr/pastix/pastix/-/merge_requests?scope=all&state=merged&label_name[]=CalculiX

And the patch files (to make it easier to assess what has been introduced): 348_Single allocation coeftab.patch, 342_Mixed precision.patch

looooo commented 1 month ago

Thanks for the information. Pardiso is not really an option for me; I don't have a license for this solver.

I am the maintainer of the CalculiX and PaStiX packages for the conda package manager, so it's not a problem to add the patches needed for CalculiX on top of pastix or spm. But I don't know what PaStiX needs to run with CalculiX. Maybe you can point me to the necessary commits/patches and we can try to realign them on the current pastix/master.

Kabbone commented 1 month ago

> Merge requests with the CalculiX label from PaStiX 6.3.0: https://gitlab.inria.fr/pastix/pastix/-/merge_requests?scope=all&state=merged&label_name[]=CalculiX
>
> And the patch files (to make it easier to assess what has been introduced): 348_Single allocation coeftab.patch, 342_Mixed precision.patch

Thanks for the hint. I knew they were working on something, but it was not accessible in the public repo back then. I need to look into it again.

@looooo I think it would make more sense to migrate as much as possible to mainline PaStiX, but I don't know off the top of my head how much customization we still need, if any. The PaStiX developers are typically very open to adding things if it's reasonable in the long run.

looooo commented 1 month ago

@Kabbone Yes, I have tried to use the PaStiX master now, but some changes are certainly necessary. I am currently stuck with a crash in PaStiX and don't know how to proceed. Maybe someone knows which patch is necessary to continue from here:

Process 11659 launched: '/Users/lo/projects/freecad/CalculiX/.pixi/envs/default/bin/ccx' (arm64)

************************************************************

CalculiX Version 2.21, Copyright(C) 1998-2023 Guido Dhondt
CalculiX comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
certain conditions, see gpl.htm

************************************************************

You are using an executable made on Sa 10 Aug 2024 00:01:19 CEST

  The numbers below are estimated upper bounds

  number of:

   nodes:        18606
   elements:        15010
   one-dimensional elements:            0
   two-dimensional elements:            0
   integration points per element:            4
   degrees of freedom per node:            3
   layers per element:            1

   distributed facial loads:            0
   distributed volumetric loads:            0
   concentrated loads:            0
   single point constraints:            0
   multiple point constraints:            1
   terms in all multiple point constraints:            1
   tie constraints:            0
   dependent nodes tied by cyclic constraints:            0
   dependent nodes in pre-tension constraints:            0

   sets:            4
   terms in all sets:        70488

   materials:            1
   constants per material and temperature:            2
   temperature points per material:            1
   plastic data points per material:            0

   orientations:            0
   amplitudes:            0
   data points in all amplitudes:            0
   print requests:            0
   transformations:            0
   property cards:            0

 STEP            1

 Frequency analysis was selected

 Decascading the MPC's

 Determining the structure of the matrix:
 Using up to 1 cpu(s) for setting up the structure of the matrix.
 number of equations
 55818
 number of nonzero lower triangular matrix elements
 2077605

 Using up to 0 cpu(s) for setting up the structure of the matrix.
 Using up to 1 cpu(s) for the stress calculation.

 Using up to 1 cpu(s) for the symmetric stiffness/mass contributions.

Not reusing csc.
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
  Version:                                   6.4.0
  Schedulers:
    sequential:                            Enabled
    thread static:                         Started
    thread dynamic:                       Disabled
    PaRSEC:                               Disabled
    StarPU:                               Disabled
  Number of MPI processes:                       1
  Number of threads per process:                 1
  Number of GPUs:                                0
  MPI communication support:              PastixMpiNone
  Distribution level:                     2D( 256)
  Blocking size (min/max):             1024 / 2048
  Computational models
    CPU:              AMD Opteron 6180 - Intel MKL
    GPU:             Nvidia K40 GK1108L - CUDA 8.0
  Low rank parameters:
    Strategy                        No compression

  Matrix type:  General
  Arithmetic:   Double
  Format:       CSC
  N:            55818
  nnz:          4211028

+-------------------------------------------------+
  Ordering subtask :
Process 11659 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x15142f4c148)
    frame #0: 0x0000000100ace774 libspm.1.dylib`spmIntSort1Asc1 + 108
libspm.1.dylib`spmIntSort1Asc1:
->  0x100ace774 <+108>: ldr    x14, [x13]
    0x100ace778 <+112>: ldr    x15, [x12]
    0x100ace77c <+116>: cmp    x14, x15
    0x100ace780 <+120>: b.ge   0x100ace790               ; <+136>
Target 0: (ccx) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x15142f4c148)
  * frame #0: 0x0000000100ace774 libspm.1.dylib`spmIntSort1Asc1 + 108
    frame #1: 0x0000000100aca6e4 libspm.1.dylib`p_spmSort + 452
    frame #2: 0x0000000100acbaa8 libspm.1.dylib`spmSort + 96
    frame #3: 0x0000000100b16fe4 libpastix.6.4.dylib`graphPrepare + 144
    frame #4: 0x0000000100b1a448 libpastix.6.4.dylib`pastix_subtask_order + 328
    frame #5: 0x0000000100b240d0 libpastix.6.4.dylib`pastix_task_analyze + 104
    frame #6: 0x00000001003baefc ccx`pastix_factor_main_generic + 340
    frame #7: 0x00000001003018d8 ccx`arpack + 13252
    frame #8: 0x000000010000cf24 ccx`main + 40164
    frame #9: 0x00000001827e20e0 dyld`start + 2360

You can find my attempt to build PaStiX and CalculiX here: https://github.com/looooo/CalculiX and https://github.com/looooo/pastix

pastix is added as a git submodule, and spm is a submodule of pastix pointing to https://gitlab.inria.fr/solverstack/spm

looooo commented 2 weeks ago

I am still stuck at this point. @Kabbone, can you help with this issue? There are now PaStiX packages available via the conda package manager. It would be nice to make these packages work with CalculiX.

Kabbone commented 2 weeks ago

I would start by setting forceRedo=1 in pastix.c unconditionally, because the reuse mechanism as it is implemented now would not work with mainline PaStiX. You should also set PastixSchedStatic as the scheduler. That would be my first shot. Did you try to run the PaStiX examples to make sure the solver works by itself? I have never tried it on ARM, so I don't know whether there are any problems there; I assume there are none.
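For a quick standalone check that the solver itself runs on ARM, independent of CalculiX, something along these lines should do. This is only a minimal sketch against the stock PaStiX 6.x C API and the spm Laplacian generator, with the static scheduler set as suggested above (the forceRedo flag exists only in this fork's pastix.c and has no equivalent in mainline PaStiX):

    #include <pastix.h>
    #include <spm.h>

    /* Minimal sketch: factor a small generated Laplacian with the static
     * scheduler, just to verify that mainline PaStiX runs on this machine. */
    int main( void )
    {
        pastix_data_t *pastix_data = NULL;
        pastix_int_t   iparm[IPARM_SIZE];
        double         dparm[DPARM_SIZE];
        spmatrix_t     spm;

        pastixInitParam( iparm, dparm );              /* load default parameters    */
        iparm[IPARM_SCHEDULER] = PastixSchedStatic;   /* static scheduler, as above */
        pastixInit( &pastix_data, MPI_COMM_WORLD, iparm, dparm );

        /* 10x10x10 double-precision Laplacian test matrix from the spm driver. */
        spmReadDriver( SpmDriverLaplacian, "d:10:10:10", &spm );

        pastix_task_analyze( pastix_data, &spm );     /* ordering + symbolic analysis */
        pastix_task_numfact( pastix_data, &spm );     /* numerical factorization      */

        spmExit( &spm );
        pastixFinalize( &pastix_data );
        return 0;
    }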

looooo commented 1 week ago

I just tried the Python example step_by_step.py, and it also stops after pastix.task_analyze( pastix_data, spmA ).

The output is the following:

ischedInit: The thread number has been automatically set to 4
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
  Version:                                   6.4.0
  Schedulers:
    sequential:                            Enabled
    thread static:                         Started
    thread dynamic:                        Started
    PaRSEC:                               Disabled
    StarPU:                               Disabled
  Number of MPI processes:                       1
  Number of threads per process:                 4
  Number of GPUs:                                0
  MPI communication support:              PastixMpiNone
  Distribution level:                     2D( 160)
  Blocking size (min/max):              160 /  320
  Computational models
    CPU:              AMD Opteron 6180 - Intel MKL
    GPU:             Nvidia K40 GK1108L - CUDA 8.0
  Low rank parameters:
    Strategy                        No compression
  Matrix type:  Symmetric
  Arithmetic:   Double
  Format:       CSC
  N:            125
  nnz:          425
+-------------------------------------------------+
  Ordering subtask :
pastix_subtask_order: Ordering not available (iparm[IPARM_ORDERING]=-1)

I tried this on linux-aarch64.

Kabbone commented 1 week ago

Did you build and link PaStiX with Scotch? As I understand the documentation, Scotch should be picked up by default, but you can also set it explicitly with iparm[IPARM_ORDERING] = PastixOrderScotch.
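For reference, requesting Scotch explicitly in C is just one extra line before pastixInit, along the lines of the earlier sketch (this assumes the stock PaStiX 6.x API and a PaStiX build that was actually compiled with Scotch support):

    /* Sketch only: request Scotch ordering explicitly instead of relying on the default. */
    pastix_int_t iparm[IPARM_SIZE];
    double       dparm[DPARM_SIZE];

    pastixInitParam( iparm, dparm );
    iparm[IPARM_ORDERING] = PastixOrderScotch;   /* errors out if PaStiX was built without Scotch */
    /* ... pastixInit(), spm setup, pastix_task_analyze() as in the sketch above ... */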

looooo commented 1 week ago

No, I was not able to build with Scotch because of this issue: https://github.com/conda-forge/scotch-feedstock/issues/88

But we are working on it: https://github.com/conda-forge/scotch-feedstock/pull/90

3rav commented 1 week ago

" Ordering subtask : Process 11659 stopped ..."

@looooo How do you get such additional information? I am testing in an msys2/mingw environment.

looooo commented 1 week ago

" Ordering subtask : Process 11659 stopped ..."

@looooo How to get such additional information, I test in msys2/mingw environment?

With a debugger. In my Case I used lldb on OSX but you can also use gdb.
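For reference, on msys2/mingw a similar backtrace can be obtained with gdb roughly like this (the ccx invocation and job name are just placeholders):

    $ gdb --args ccx -i job
    (gdb) run
    ... wait for the SIGSEGV ...
    (gdb) bt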

3rav commented 1 week ago

On msys2/mingw, with iparm[IPARM_ORDERING] = PastixOrderScotch; set:

[New Thread 15044.0x33c]
+-------------------------------------------------+
+     PaStiX : Parallel Sparse matriX package     +
+-------------------------------------------------+
  Version:                                   6.4.0
  Schedulers:
    sequential:                            Enabled
    thread static:                         Started
    thread dynamic:                       Disabled
    PaRSEC:                               Disabled
    StarPU:                               Disabled
  Number of MPI processes:                       1
  Number of threads per process:                 1
  Number of GPUs:                                0
  MPI communication support:              PastixMpiNone
  Distribution level:                     2D( 128)
  Blocking size (min/max):             1024 / 2048
  Computational models
    CPU:              AMD Opteron 6180 - Intel MKL
    GPU:             Nvidia K40 GK1108L - CUDA 8.0
  Low rank parameters:
    Strategy                        No compression

  Matrix type:  General
  Arithmetic:   Float
  Format:       CSC
  N:            3509
  nnz:          231601

+-------------------------------------------------+
  Ordering subtask :
    Ordering method is: Scotch

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007ff6cb84df6a in _SCOTCHhgraphOrderCp ()
(gdb)