:+1:
I also tried to build Trilinos but had no luck. My build command failed at this point:
[ 38%] Building CXX object packages/seacas/libraries/ioss/src/transform/CMakeFiles/Iotr.dir/Iotr_VectorMagnitude.C.o
[ 38%] Building CXX object packages/seacas/libraries/ioss/src/transform/CMakeFiles/Iotr.dir/Iotr_Initializer.C.o
[ 38%] Building CXX object packages/seacas/libraries/ioss/src/transform/CMakeFiles/Iotr.dir/Iotr_MinMax.C.o
Linking CXX static library libIotr.a
[ 38%] Built target Iotr
Scanning dependencies of target Ionit
[ 38%] Building CXX object packages/seacas/libraries/ioss/src/init/CMakeFiles/Ionit.dir/Ionit_Initializer.C.o
Linking CXX static library libIonit.a
[ 38%] Built target Ionit
Scanning dependencies of target cth_pressure_map
[ 38%] Building CXX object packages/seacas/libraries/ioss/src/main/CMakeFiles/cth_pressure_map.dir/cth_pressure_map.C.o
[ 38%] Building CXX object packages/seacas/libraries/ioss/src/main/CMakeFiles/cth_pressure_map.dir/vector3d.C.o
Linking CXX executable cth_pressure_map
/usr/bin/ld: cannot find -lIoexo_fac
collect2: error: ld returned 1 exit status
make[2]: *** [packages/seacas/libraries/ioss/src/main/cth_pressure_map] Error 1
make[1]: *** [packages/seacas/libraries/ioss/src/main/CMakeFiles/cth_pressure_map.dir/all] Error 2
make: *** [all] Error 2
Still no clue what the Ioexo_fac library is, but I'll have to google it...
My attempt with these options (cloned from the GitHub repo):
cmake -DTPL_ENABLE_MPI=ON -DTrilinos_ENABLE_ALL_PACKAGES=ON -DCMAKE_INSTALL_PREFIX=/opt/trilinos -DTrilinos_ASSERT_MISSING_PACKAGES=OFF -DTPL_ENABLE_Matio=OFF -DTPL_ENABLE_GLM=OFF ../trilinos
...
[ 16%] Building C object packages/seacas/libraries/exodus/cbind/CMakeFiles/exodus.dir/src/ex_create_par.c.o
/home/jukka/dev/trilinos/packages/seacas/libraries/exodus/cbind/src/ex_create_par.c:130:24: fatal error: netcdf_par.h: No such file or directory
compilation terminated.
make[2]: *** [packages/seacas/libraries/exodus/cbind/CMakeFiles/exodus.dir/src/ex_create_par.c.o] Error 1
make[1]: *** [packages/seacas/libraries/exodus/cbind/CMakeFiles/exodus.dir/all] Error 2
make: *** [all] Error 2
I also cloned from the git repo (https://github.com/trilinos/trilinos). All the commands I ran:
> git clone https://github.com/trilinos/trilinos.git
> mkdir trilinos_build && cd trilinos_build
> cmake -DCMAKE_C_COMPILER=/usr/bin/gcc -DCMAKE_CXX_COMPILER=/usr/bin/g++ \
    -DCMAKE_Fortran_COMPILER=/usr/bin/gfortran -DTrilinos_ENABLE_ALL_PACKAGES=ON \
    -DCMAKE_INSTALL_PATH=/home/olli/programming/trilinos_build \
    -DTPL_ENABLE_Netcdf=OFF -DTrilinos_ASSERT_MISSING_PACKAGES=OFF /home/olli/programming/trilinos
> make
I was missing the Netcdf library, which I silenced with the flag -DTPL_ENABLE_Netcdf=OFF, so I guess that might be the source of my problem.
[ 24%] Building CXX object packages/seacas/libraries/ioss/src/main/CMakeFiles/cth_pressure_map.dir/vector3d.C.o
Linking CXX executable cth_pressure_map
/usr/bin/ld: cannot find -lIoexo_fac
collect2: error: ld returned 1 exit status
A library worth looking at: https://github.com/dpo/MUMPS.jl
https://github.com/JuliaSparse/Metis.jl
Metis is already wrapped for Julia.
I'm not a parallel computing expert so I don't know how silly this idea is, but after reading the VDolean slides it seems that if we don't really care about convergence speed and just want to distribute the computation, the classical block Jacobi algorithm does the job and might be quite easy to implement. There are some implementation examples in Wikipedia, https://en.wikipedia.org/wiki/Jacobi_method, and there have been recent developments which could accelerate the convergence while keeping the method simple (http://phys.org/news/2014-06-19th-century-math-tactic-makeoverand.html, http://engineering.jhu.edu/fsag/wp-content/uploads/sites/23/2013/10/JCP_revised_WebPost.pdf). After a quick read, the proposed method seems especially suitable for elliptic PDEs, but I still have no idea whether the convergence is comparable to "modern" algorithms. Anyway, this could be our naive strategy for distributing the computation to several cores for the first release unless we have better ideas; a minimal sketch of the basic iteration is below.
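Just to make the point-wise iteration concrete, here is a minimal serial sketch (my own, not from the slides); the block-Jacobi variant would apply the same update per subdomain block and only exchange interface values between sweeps:

```julia
using LinearAlgebra  # diag and norm (part of Base on older Julia versions)

# Classical Jacobi iteration x_{k+1} = D^{-1} (b - (A - D) x_k), where D = diag(A).
# Converges e.g. for strictly diagonally dominant A.
function jacobi(A, b; maxiter=1000, tol=1e-8)
    x = zeros(length(b))
    D = diag(A)
    for k in 1:maxiter
        x_new = (b .- (A * x .- D .* x)) ./ D
        norm(x_new - x) < tol && return x_new, k
        x = x_new
    end
    return x, maxiter
end
```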
"Simple is better than complex."
Here is also some Metis functionality: https://github.com/JuliaGraphs/LightGraphs.jl
Maybe this is a helpful article on the topic: http://www.scs.leeds.ac.uk/pkj/Papers/Books/JT98.pdf
Here is code starting from page 20: http://www.mcs.anl.gov/petsc/documentation/tutorials/jhouchins_writeup_revised.pdf
This is a deal.II paper with 1000-core results: http://www.math.tamu.edu/~bangerth/publications/2010-parsim2010.pdf
This seems a relevant paper as well: http://p4est.github.io/papers/BangerthBursteddeHeisterEtAl11.pdf
Morning
I made a simple example describing one possible method that could be used to parallelize the solution, see https://github.com/JuliaFEM/JuliaFEM.jl/blob/master/notebooks/2015-06-30-parallel-solution-using-substructuring.ipynb
It's not realistic because it uses matrix inversions, but I think it captures the idea of how it could be done: we first eliminate the interior nodes to get the boundary system and then solve that much smaller system. It's a variant of the approach described in paper [19]. I haven't finished with this yet; it seems that in the paper the author ends up with a more elegant solution. A sketch of the same condensation step without explicit inverses is below.
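Roughly the same substructuring step written with a factorization instead of explicit inverses (a sketch only; the block names are assumptions, not the notebook's actual variables):

```julia
using LinearAlgebra

# Static condensation / Schur complement for a system partitioned into interior (i)
# and boundary (b) unknowns:
#   [K_ii  K_ib] [u_i]   [f_i]
#   [K_bi  K_bb] [u_b] = [f_b]
# For sparse blocks the multi-RHS solve F \ K_ib may need K_ib converted to dense.
function condense_and_solve(K_ii, K_ib, K_bi, K_bb, f_i, f_b)
    F = factorize(K_ii)              # factorize the interior block once, no explicit inverse
    S = K_bb - K_bi * (F \ K_ib)     # Schur complement acting on the boundary unknowns
    g = f_b - K_bi * (F \ f_i)       # condensed right-hand side
    u_b = S \ g                      # solve the much smaller boundary system
    u_i = F \ (f_i - K_ib * u_b)     # back-substitute for the interior unknowns
    return u_i, u_b
end
```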
One very important requirement came to my mind a second ago: do we want to develop a parallel implementation that can later be adapted for GPGPU computation easily? I bet that in the next 10 years we will see a lot of GPGPU computation.
http://www.rdi.uwa.edu.au/__data/assets/pdf_file/0005/1524848/Finite-Element-Method.pdf http://web.stanford.edu/group/lavxm/publications/journal-papers/CeLeDa11.pdf
etc.
Instead of an iterative parallel solver we could also do a direct parallel solver, see [23] for results. Interestingly, it seems that both ABAQUS and Code Aster, which are both capable of solving very large systems, rely on direct solvers as their main solver. It of course requires more memory, but it can also solve stiff problems typical in the field of computational mechanics. Recursive / hierarchical multifrontal methods seem to be a very promising approach.
From http://www.mathworks.com/matlabcentral/newsreader/view_thread/170254:
"is to use a factored matrix object for K_bb^-1 (very critical if you want to have 100 000 DOFs). Then, you solve for Kbb^-1*Kba by piece by block. Then you project the matrices.
You have to realize that 1e5*5000*8/1024^3 = 3.7 GB. If you want that to happen on a 32 bit machine you need to go out of core (like SDT and ANSYS do)."
According to our road map we specifically want to solve 100 million DOF using 1000 cores, i.e. 100 000 DOF/core. The question is how much memory per core we then need. Anyway, I think that both network speed and the amount of memory per core are increasing quickly in cloud-based clusters.
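The quoted back-of-envelope number, spelled out under the same assumptions (a dense frontal block of width 5000 and 8-byte floats; the real front width depends on the mesh and ordering):

```julia
dofs_per_core = 100_000   # road map: 100 million DOF / 1000 cores
front_width   = 5_000     # assumed dense front width, as in the quote above
bytes_per_val = 8         # Float64
println(dofs_per_core * front_width * bytes_per_val / 1024^3, " GiB per core")  # ≈ 3.7 GiB
```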
I made a MUMPS download request here: http://mumps.enseeiht.fr/index.php?page=dwnld#form
Let's start investigating more closely the possibility of using MUMPS to parallelize our stuff. At least it seems to work for static cases in solid mechanics.
I'm looking to get the MUMPS build.jl up to date. Some success already.
What I have been thinking is something like this: we make the area decomposition as described above and try to use pure Julia parallelism for that. Then each of those areas is solved with the parallel MUMPS direct solver in shared memory. What do you think?
What do you mean by shared memory? After the decomposition the "boundary parts" are on different machines, which cannot communicate with each other, only with the head node. But yeah, we could try to use Julia's native parallel implementation and make our parallel solution so general that it could be used by other projects too.
Inside one area you can have shared memory. I mean using all the cores of one machine for the direct solver; in my proposal above the direct solver is MUMPS. I currently don't know if it works like that, but the idea is to get the best of both worlds: message-passing style between boundaries and OpenMP style inside an area.
Something very interesting to follow: https://github.com/JuliaParallel/Elemental.jl
See the original C++ version with documentation: https://github.com/elemental/Elemental
Yes, using OpenMP + MPI sounds reasonable; I think that's called hybrid parallelism. If we want to assemble the global stiffness matrix in shared memory we need to "color" our degrees of freedom, which is a bit more complicated than a pure MPI implementation, but not too much more complicated compared to the benefits. A sketch of what the coloring could look like is below.
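A hedged sketch of the coloring idea (greedy element coloring, so that elements sharing a color touch no common DOFs and can be assembled concurrently; `elements` as a vector of DOF-index vectors is an assumption, not our real data structure):

```julia
# Two elements get the same color only if they share no DOFs, so all elements of one
# color can be assembled in parallel without write conflicts; colors run sequentially.
function color_elements(elements)
    colors = zeros(Int, length(elements))
    dof_colors = Dict{Int,Set{Int}}()          # colors already touching each DOF
    for (e, dofs) in enumerate(elements)
        used = Set{Int}()
        for d in dofs
            union!(used, get(dof_colors, d, Set{Int}()))
        end
        c = 1
        while c in used                        # smallest color not yet used by these DOFs
            c += 1
        end
        colors[e] = c
        for d in dofs
            push!(get!(dof_colors, d, Set{Int}()), c)
        end
    end
    return colors
end
```

Assembly would then loop over the colors sequentially and over the elements of each color in parallel with shared-memory threads inside a node, with MPI-style communication only between nodes.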
A little update here. It seems that ABAQUS uses sparse Gaussian elimination for direct solving; I added [26], which describes the ideas behind the algorithm. Another little update is that Julia's backslash (\) operator has threading support, it supports sparse matrices, and it's actually quite fast. (To be more specific, it's the factorize command that has the threading support, http://julia.readthedocs.org/en/latest/stdlib/linalg/.) Solving a 140000 DOF problem took about 30 seconds and 7.1 GB of memory.
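The kind of quick experiment meant here, on a small random sparse system (the size is illustrative, not the 140000-DOF model; on current Julia the sparse tools live in SparseArrays/LinearAlgebra, on the 0.x releases they were in Base):

```julia
using SparseArrays, LinearAlgebra

n = 20_000
A = sprandn(n, n, 5 / n) + 100.0I   # random sparse matrix, shifted to be comfortably nonsingular
b = randn(n)

@time F = factorize(A)              # the expensive, multithreaded part (sparse LU here)
@time x = F \ b                     # the triangular solves reuse the stored factorization
@show norm(A * x - b)
```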
Possible interesting topic at julia-users: https://groups.google.com/forum/#!topic/julia-users/a7uk5Sg_cSs
Viral B Shah - The many ways of parallel computing with Julia: https://www.youtube.com/watch?v=HCcO-715acM&feature=youtu.be
After studying this subject I think I now have a clear idea for our first attempt. We distribute the subdomains to nodes using Julia's parallel implementation, do the LU factorization locally on the clients, fetch the interface problem back to the host and solve it there directly. This should actually be quite simple to do.
Any ideas how to do this using Julia's parallel implementation? The key point is that the data, in this case the factorized matrices, must be preserved on the client machines between operations, because they are needed later for the back substitution. One possible pattern is sketched below.
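A sketch of one way this could work with Julia's built-in parallelism (the helper names are made up for illustration; on the 0.x releases the Distributed functions were part of Base): each worker stores its own factorization in a worker-local variable, so only right-hand sides and solutions travel over the wire.

```julia
using Distributed
addprocs(2)                                     # one worker per subdomain in this toy example

@everywhere using SparseArrays, LinearAlgebra
@everywhere begin
    const LOCAL_FACTOR = Ref{Any}(nothing)      # worker-local storage, survives between calls
    function factorize_subdomain!(K)
        LOCAL_FACTOR[] = factorize(K)           # the factorization stays on this worker
        return nothing
    end
    backsubstitute(rhs) = LOCAL_FACTOR[] \ rhs  # later back substitution reuses the stored factor
end

# Master process: ship a (placeholder) subdomain matrix out, factorize it remotely,
# and come back later with interface data for the back substitution.
K1   = sparse(2.0I, 4, 4)                       # tiny placeholder subdomain matrix
rhs1 = ones(4)
remotecall_wait(factorize_subdomain!, workers()[1], K1)
u1 = remotecall_fetch(backsubstitute, workers()[1], rhs1)
```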
Maybe relevant discussion: https://groups.google.com/forum/#!topic/julia-users/E8fGIiDwckc
This is now working through PETSc.
Like almost everyone else we could start with Trilinos. It has Python bindings, so I guess it's usable from Julia too with some effort. Unfortunately nothing compiles for me, but maybe someone else has better luck. (If you succeed, please share.)
Links
[1] http://trilinos.org/oldsite/packages/pytrilinos/UsersGuide.pdf (1D Poisson problem on p. 14)
[2] http://www.researchgate.net/publication/231168298_Parallel_FEM_Application_Development_in_Python
[3] http://www.sciencedirect.com/science/article/pii/S0309170811000777
[4] http://www.scientificpython.net/pyblog/domain-decomposition-alternating-schwarz
[5] http://calcul.math.cnrs.fr/Documents/Ecoles/ET2011DD/VDolean1.pdf
[6] https://github.com/dpo/MUMPS.jl
[7] https://github.com/JuliaSparse/Metis.jl
[8] https://github.com/JuliaGraphs/LightGraphs.jl
[9] http://www.scs.leeds.ac.uk/pkj/Papers/Books/JT98.pdf
[10] http://www.mcs.anl.gov/petsc/documentation/tutorials/jhouchins_writeup_revised.pdf (p. 20)
[11] http://www.math.tamu.edu/%7Ebangerth/publications/2010-parsim2010.pdf (1000-core results)
[12] http://p4est.github.io/papers/BangerthBursteddeHeisterEtAl11.pdf
[13] http://tu-dresden.de/die_tu_dresden/fakultaeten/fakultaet_mathematik_und_naturwissenschaften/fachrichtung_mathematik/institute/wir/dissertation/thomas_witkowski
[14] https://courses.engr.illinois.edu/cs554/fa2013/notes/10_iterative_8up.pdf
[15] https://github.com/JuliaParallel
[16] http://libelemental.org
[17] http://www.math.tamu.edu/~bangerth/videos/676/slides.41.pdf
[18] http://www.cats.rwth-aachen.de/teaching/para-2013 (Parallel computing for computational mechanics, lecture notes)
[19] http://mat.fsv.cvut.cz/komisevstez/13/prispevky/sousedik.pdf (Finite Element Solution Algorithm for Nonlinear Elasticity Problems by Domain Decomposition Method)
[20] https://en.wikipedia.org/wiki/BDDC
[21] https://en.wikipedia.org/wiki/FETI-DP
[22] https://en.wikipedia.org/wiki/Balancing_domain_decomposition_method
[23] http://www.supercomp.org/sc2003/paperpdfs/pap238.pdf (IPSAP : A High-performance Parallel Finite Element Code for Large-scale Structural Analysis Based on Domain-wise Multifrontal Technique)
[24] http://am2013.math.cas.cz/proceedings/contributions/vejchodsky.pdf (A DIRECT SOLVER FOR FINITE ELEMENT MATRICES REQUIRING O(N log N ) MEMORY PLACES)
[25] http://ocw.mit.edu/resources/res-2-002-finite-element-procedures-for-solids-and-structures-spring-2010/linear/lecture-9/MITRES2_002S10_lec09.pdf (Solution of finite element equilibrium equations in static analysis)
[26] http://infolab.stanford.edu/pub/cstr/reports/na/m/80/06/NA-M-80-06.pdf (A NEW IMPLEMENTATION OF SPARSE GAUSSIAN ELIMINATION)