davydden closed this issue 6 years ago.
I could offer some Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz and Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz nodes. Unfortunately, we don't have anything more than AVX2.
With respect to https://github.com/CEED/Forum/issues/1, single-node performance is more important than MPI scalability.
We would probably also see more gain from MatrixFree in the 3D case, so we should definitely do that as well.
We probably want to plot time/dof/core as well as total time of MF vs MB.
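One reading of the `time/dof/core` normalization proposed above can be sketched as follows; the helper name and the run numbers are placeholders for illustration, not measurements from this study:

```python
# Hypothetical sketch: normalize wall time by DoF count and core count so
# that matrix-free (MF) and matrix-based (MB) runs of different sizes
# become comparable. The run data below are made-up placeholders.
def time_per_dof_per_core(wall_time_s, n_dofs, n_cores):
    """Wall-clock seconds spent per DoF, normalized by core count."""
    return wall_time_s / n_dofs / n_cores

runs = {
    "MF": {"time": 1.2, "dofs": 3_000_000, "cores": 20},  # placeholder
    "MB": {"time": 4.8, "dofs": 3_000_000, "cores": 20},  # placeholder
}
for name, r in runs.items():
    metric = time_per_dof_per_core(r["time"], r["dofs"], r["cores"])
    print(f"{name}: {metric:.2e} s/DoF/core")
```

An alternative normalization would be CPU-seconds per DoF (`time * cores / dofs`); which one to plot is part of the design question raised here.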
That sounds reasonable. We should be able to produce graphs similar to those in http://www.sppexa.de/fileadmin/user_upload/EXADG.pdf (slide 7). I would definitely try to test with problems as large as possible while they still fit on a node.
Ok, then let me start working from that side with, say, degree `p=8`.
> I could offer some Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz and Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz nodes.
Great! I will ping you as soon as I have some input files ready.
> Unfortunately, we don't have anything more than AVX2.
Same here.
> We would probably also see more gain from MatrixFree in the 3D case. So we should definitely do that as well.
Absolutely. For now I just wanted to start playing with 2D. Then we would need a 3D counterpart of https://github.com/davydden/large-strain-matrix-free/pull/32.
I guess you want spherical and not cylindrical holes?
We can probably just do extrusion indeed and not bother with spherical holes/inclusions. @jppelteret, what do you say?
If it's only for the basis of benchmarking, then I think that 3D cylindrical inclusions may be OK. What we could do to add an interesting material response, without resorting to a more complex mesh, is to take each cylindrical extrusion and change the material ID over part of the extrusion, so that each cylindrical particle does not extend the whole way through the domain.
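The slice-wise material assignment described above could be prototyped as below; this is a hypothetical sketch (the function name, the band fractions, and the slice count are made up), not deal.II API:

```python
# Hypothetical sketch: after extruding a 2D mesh with holes/inclusions
# into n_slices layers, mark a cell as "inclusion" only over an interior
# band of slices, so each cylindrical particle does not run through the
# whole domain. Matrix material gets ID 0, the inclusion gets ID 1.
def material_id(in_inclusion_2d, slice_index, n_slices, band=(0.25, 0.75)):
    lo, hi = int(band[0] * n_slices), int(band[1] * n_slices)
    return 1 if (in_inclusion_2d and lo <= slice_index < hi) else 0

# Example: 8 extrusion slices -> the inclusion occupies only slices 2..5.
ids = [material_id(True, s, 8) for s in range(8)]
print(ids)  # [0, 0, 1, 1, 1, 1, 0, 0]
```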
Once I clean up https://github.com/davydden/large-strain-matrix-free/pull/34 I will start doing some calculations.
The question to you @masterleinad and @jppelteret is how we want to design meshes for this study. We probably want to plot `time/dof/core` as well as the total time of MF vs MB. In https://github.com/CEED/Forum/issues/1#issuecomment-408285040 the guys did memory calculations based on a more or less constant number of DoFs, `3e6`. Do we want to do the same and, say, start with `p=8`-th order elements, see what coarse mesh gets us to this number, and then go down to `p=1` while adding global mesh refinements to keep the total DoFs around the same mark?

I plan to run it on 1 node (2 x Xeon 2660v2 Ivy Bridge, 25 MB shared cache per chip and 64 GB of RAM) with 20 MPI processes without TBB so that the MF/MB comparison is fairer. That's the Emmy cluster here in Erlangen.
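The refinement bookkeeping behind this question can be sketched as follows. This assumes a structured hex mesh with `n0` cells per direction and a scalar DoF formula for continuous Q_p elements times `dim` vector components; real meshes with holes or inclusions will deviate from these counts, so treat the numbers as rough targets:

```python
# Hypothetical sketch: for each polynomial degree p, pick the number of
# global refinements r that keeps the vector-valued Q_p DoF count as
# close to (but not above) a 3e6 target as possible.
def n_dofs(p, r, n0=4, dim=3):
    """DoF estimate for continuous Q_p on a uniform hex mesh, dim components."""
    cells_per_dir = n0 * 2**r
    return dim * (cells_per_dir * p + 1) ** dim

def pick_refinement(p, target=3_000_000, n0=4, dim=3):
    """Largest r such that the DoF estimate stays at or below target."""
    r = 0
    while n_dofs(p, r + 1, n0, dim) <= target:
        r += 1
    return r

for p in (1, 2, 4, 8):
    r = pick_refinement(p)
    print(f"p={p}: r={r}, dofs={n_dofs(p, r):.2e}")
```

Note that on a structured mesh, halving `p` while adding one refinement leaves the DoF count unchanged, which is exactly the constant-DoF sweep suggested above.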
I would assume we want the number of DoFs to be more or less constant, and we should make sure the sparse matrix never fits into the L3 cache.
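A back-of-the-envelope check for the L3 condition might look like this. The CSR storage model (8-byte values, 4-byte column indices) and the stencil estimate `dim * (2p+1)^dim` nonzeros per row are assumptions for illustration, not measured sparsity from the actual code:

```python
# Hypothetical sketch: estimate the assembled sparse-matrix footprint and
# compare it against the aggregate L3 cache of the node mentioned above
# (2 sockets x 25 MB shared L3). If the matrix is far larger than L3, the
# matrix-based operator application is memory-bandwidth-bound.
def csr_bytes(n_rows, nnz_per_row, value_bytes=8, index_bytes=4):
    nnz = n_rows * nnz_per_row
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes

n_dofs = 3_000_000
dim, p = 3, 2
nnz_per_row = dim * (2 * p + 1) ** dim  # rough bound for Q2 elasticity in 3D
matrix_mb = csr_bytes(n_dofs, nnz_per_row) / 1e6
l3_mb = 2 * 25  # two chips, 25 MB shared L3 each
print(f"matrix ~{matrix_mb:.0f} MB vs {l3_mb} MB of aggregate L3")
```

Under these assumptions the matrix is orders of magnitude larger than L3, so the condition is comfortably satisfied at `3e6` DoFs.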
The student of ours did such studies with small strain; for the 3D case of a head model he went up to degree `p=3`.

`likwid-topology -g` gives the following ASCII picture for the L1-L3 caches: