My wishlist - Githubissues

louisponet commented 6 years ago

After learning how to use/using JuAFEM for about 2 weeks, I must say that it is a great tool, very understandable and hackable design, and it seems general enough to be expanded upon greatly!

I've used JuAFEM basically as a discretization tool for calculating a continuum model's total energy and gradients/hessians, as discussed on juafem-forwarddiff-nlsolve-finite-elements-for-nonlinear-problems It performs very well, vastly outperforming other implementations that colleagues of mine have done in c++ or Mathematica (duh).

I'll now give my wishlist for features as requested in the thread:

Easier ways to access the fields inside a DofHandler, and create starting conditions e.g.:

function startingconditions!(dofvec, dh)
mid = div(ndofs_per_cell(dh), 2)
for cell in CellIterator(dh)
    globaldofs = celldofs(cell)
    for i=1:3:mid
        it = div(i, 3) + 1
        dofvec[globaldofs[mid+i]] = -0.53
        dofvec[globaldofs[mid+i+1]] = 0.53tanh(cell.coords[it][1]/5)
        dofvec[globaldofs[mid+i+2]] = 0.53
    end
end
end

In light of the other design, something like fielddofs(:Field, cell) similar to celldofs(cell), seems a good way. Also fielddofs(:Field, dh::DofHandler) would be very nice to calculate certain things from the finalized result. Edit: see #187 for implementation

A way to add fancier constraints, such as bounds on fields etc
More options to generate grids/read them from formats, although I think this is already somewhat possible with various other packages. What would be cool is if it was possible to generate non uniform grids, defining some kind of rule for element size/mesh density, depending on the coordinates.
Higher flexibility on combining field dimensions with geometry dimensions, e.g. 3D field on a 2D geometry.
A way to thread over the CellIterator loop. This seems a bit tricky since assembling residus or stiffness tensors is not trivial multithreaded. It really depends on the grid what needs to be done. The way I do it now, which is very much not scalable is pretty brutal (I have two fields, :u, :P, and thus a dofvector uP):
```
mutable struct ModelCaches{T, CV, CI, CF}
∇fs::Vector{Vector{T}}
cvus::Vector{CV}
cvPs::Vector{CV}
cellits::CI
configs::CF
end
```

function ModelCaches(∇f::Vector{T}, cvu, cvP, dh) where T cellits = [CellIterator(dh) for t=1:nth] ellen = length(celldofs(cellits[1])) ∇fs = [zeros(∇f) for t=1:nth] cvus = [copy(cvu) for t=1:nth] cvPs = [copy(cvP) for t=1:nth] configs = [GradientConfig(nothing, zeros(T, ellen), Chunk(ellen)) for t=1:nth] return ModelCaches(∇fs, cvus, cvPs, cellits, configs) end

function ∇F!(∇f::Vector{T} , uP::Vector{T}, caches, params) where T @threads for i=1:length(caches.∇fs) fill!(caches.∇fs[i], zero(T)) end results = [zeros(T, length(celldofs(caches.cellits[1]))) for t=1:nth] @threads for i=1:length(caches.cellits[1]) tid = threadid() reinit!(caches.cellits[tid], i) reinit!(caches.cvus[tid], caches.cellits[tid]) reinit!(caches.cvPs[tid], caches.cellits[tid]) ForwardDiff.gradient!(results[tid], uel -> element_potential(uel, caches.cvus[tid], caches.cvPs[tid], params), uP[celldofs(caches.cellits[tid])], caches.configs[tid]) assemble!(caches.∇fs[tid], celldofs(caches.cellits[tid]), results[tid]) end ∇f .= caches.∇fs[1] for fs in caches.∇fs[2:end] ∇f .+= fs end end


This works, but is very much not scalable, and quite a lot of redundant things happen.

For now this is what I ran into the most. If I have more I'll update this issue. I could also maybe try to implement a couple of these things myself, no promises on that :p.
Anyway keep up the good work, and again thanks for helping me in the discourse thread!

fredrikekre commented 6 years ago

Great thanks!

https://github.com/KristofferC/JuAFEM.jl/pull/187 probably covers the first point?

such as bounds on fields

What do you mean here exactly?

More options to generate grids/read them from formats, although I think this is already somewhat possible with various other packages. What would be cool is if it was possible to generate non uniform grids, defining some kind of rule for element size/mesh density, depending on the coordinates.

Yes, I think for this we should rely on other tools to generate the meshes and then have good converters that reads it into the JuAFEM mesh format.

fredrikekre commented 6 years ago

vastly outperforming other implementations that colleagues of mine have done in c++ or Mathematica (duh)

Do you have any benchmarks implementing the same problem? That would be interesting if you would be willing to share :)

louisponet commented 6 years ago

Not exactly sadly, so it’s a bit of an empty statement. But they implemented the problem in Mathematica using actual FEM, which involved them getting the first derivative equations, not an easy task, very messy and error prone. In the end that implementation, for 60K elements took 5+ min per iteration, and needed about a day at least of iterations to sort of arrive at convergence. They tried doing a pure minimization, but it runs out of memory very similarly to what happened to me with Optim the first try. The issue is that Mathematica doesnt really offer any way to make it better. This implementation I tested with 125K elements, takes about 1s per gradient calculation and a bit less per energy calculation. So around 2s per iteration, and requires around 200 iterations to get convergence using ConjugateGradient from Optim.

With regards to C++ the implementation wasn’t complete, but again the big advantage was the clean and straightforward implementation that is (at least from early guestimates) on the same level of performance. I can remove the point and this comment if its not relevant.

-- Louis Ponet Sent with Airmail

On 15 April 2018 at 22:13:14, Fredrik Ekre (notifications@github.com) wrote:

vastly outperforming other implementations that colleagues of mine have done in c++ or Mathematica (duh)

Do you have any benchmarks implementing the same problem? That would be interesting if you would be willing to share :)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

louisponet commented 6 years ago

187 probably covers the first point?

Yes it does! I was also wondering if there is a way to find out a node index while running through the cells. This would be useful for calculating results after the optimization/ solving step.

function strains(uP, cvu, dh)
    strainout = zeros(Tensor{2 ,3}, div(length(uP),6))
    nodes = Vector{Vec{3, Float64}}(length(strainout))
    str = zeros(Tensor{2,3},getnquadpoints(cvu))
    mid = div(ndofs_per_cell(dh),2)
    first = true
    for cell in CellIterator(dh)
        reinit!(cvu, cell)
        dofs = celldofs(cell)
        str  .= strainout[div.(dofs[1:6:24].-1, 6).+1]
        for q=1:getnquadpoints(cvu)
            str[q] = function_symmetric_gradient(cvu, q, uP[dofs[1:mid]])
        end
        strainout[div.(dofs[1:6:24].-1, 6).+1] .= str
        nodes[div.(dofs[1:6:24].-1, 6).+1] .= getcoordinates(cell)
    end
    return strainout, nodes
end

Mainle the indexing part for strainout and nodes is a bit tedious. But maybe it's too obscure of a usecase.

such as bounds on fields What do you mean here exactly?

Like setting maximum/minimum allowed values, it's like extra constraints on the optimization. Also Neumann BCS would be cool. Is that done using the facesets and onboundary? EDIT: I found #178 which seems to go over this.

louisponet commented 6 years ago

With regards to the multithreading, are there any ideas that you tried before? And what would be the best way to do it? I've been trying different things, which seem to work fine up to like 8 threads, but I would love to be able to throw it on 64+ threads on the cluster we have. The main issue seems to be memory accesses, and things memory related.

KristofferC commented 6 years ago

You mean that there is no speedup after 8 threads? How are you partitioning the mesh?

louisponet commented 6 years ago

Sorry, I mean that there was a speedup, but the scaling is going down quickly, around 12-16 threads, things start to slow down instead of speeding up.

I'm basically doing as in the code that I posted up above. Just @threads over the amount of cells. I've tried by creating seperate DofVector for each thread, updating them each optimization step, by going through the cells, and putting them in order. This didn't work out too great.

KristofferC commented 6 years ago

Here is an example of threaded assembling: https://github.com/KristofferC/JuAFEM.jl/pull/196.

It includes a coloring function that colors cells so that no cells with the same color share a node. It is, therefore, safe to assemble each color in parallel.

screen shot 2018-04-18 at 11 31 24

Haven't run on a high core machine so not sure of the speedup. Speedup with 4 threads is 3.28x at least.

mohamed82008 commented 6 years ago

This is cool! Does the speedup take the coloring time into account? It would be nice to also parallelize the coloring.

KristofferC commented 6 years ago

The coloring time should be quite small compared to e.g. creating the sparsity pattern. It is only done once so its speed is not as important as the assembling.

louisponet commented 6 years ago

This is very nice!, I've ran the example with n=60, measuring only the loop over the colors (not the construction, I think somehow the scratches could be constructed only once, but I didnt have too much time to fully test everything). First measurement was 10 iterations with 24 cores -> 8.30 seconds per iteration on average. Second with 64 cores -> 4.12s per iteration on average. This is a scaling of around 75% of the perfect 100%. Very nice!! I will implement this into the problem I'm dealing with (with a lot more complex inner loop body, involving the gradients and a lot of tensor operations) and report back what scaling I achieve!

louisponet commented 6 years ago

So I've done some tests after implementing it into my problem. I get pretty much the same performance as my more brutal approach. However, I think it's nothing to do with JuAFEM but rather the way the gradient is performed. When I take a big grid, with 64 threads, observing the cpu usage, after a while it drops down from the full 6400% to somewhere around 1000% (10 cores). I'm not sure why this is happening. I also see the total memory allocations vary a lot each time I do an iteration over the entire grid. But I think this is a discussion that doesn't belong in this thread. Your multithreading implementation seems to work perfectly and scales quite well (when not bottlenecked by the ForwardDiff.gradient!)!

jondeuce commented 5 years ago

* Higher flexibility on combining field dimensions with geometry dimensions, e.g. 3D field on a 2D geometry.

Hi all,

I've been using JuAFEM in my work for quite some time now, and it's really been extremely useful for me, so thanks a lot to the developers! I came across this issue because I'm now running into the above "wish" in the OP's original post. I would like to combine a 3D field with a 2D geometry, but as of right now the field dimension and geometry dimension seem quite intertwined.

Is there a possible workaround to make this doable, at least for the simplest case (2D triangular grid + single 3D field)? And if not, could I be pointed in the right direction to try to contribute to adding this functionality?

fredrikekre commented 5 years ago

Hi, thats great to hear!

We have actually been discussing that the last two days, and will probably make some more progress soonish.

3D field with a 2D geometry

For my own interest, what are you doing?

Is there a possible workaround to make this doable

You can add a 2D field and a 1D field

jondeuce commented 5 years ago

Thanks for the quick reply! And no kidding, that's a happy coincidence, I'm glad it's in the works.

My work generally revolves around solving the Bloch-Torrey equation for MRI simulations, where one is interested in simulating the time evolution of magnetization vectors M(x,t) = [Mx(x,t), My(x,t), Mz(x,t)] which are diffusing within local magnetic field perturbations within e.g. a simulated cell or imaging volume. Often, however, one can ignore the longitudinal component Mz and just simulate the transverse component, and this is what I have done up until now (on a 2D grid).

Is there a possible workaround to make this doable

You can add a 2D field and a 1D field

This will actually work perfectly, now that I think about it; the PDE for Mz can be decoupled from the PDEs for the transverse components Mx and My in my use case, so this will be an easy fix for now.

Thanks for the help! Looking forward to this feature being added in full "soonish" :)

Ferrite-FEM / Ferrite.jl

My wishlist #193

187 probably covers the first point?