davydden / large-strain-matrix-free

A repository with code for the paper "A matrix-free approach for finite-strain hyperelastic problems using geometric multigrid"
GNU Lesser General Public License v2.1

Reviewers comments #52

Closed: davydden closed this issue 5 years ago

davydden commented 5 years ago

Reviewer1:


Reviewer2:


Other ideas:

masterleinad commented 5 years ago
> * suggests starting with profiling and identifying bottlenecks (for Trilinos?!), and only when bandwidth is identified as the bottleneck, considering algorithms that trade bandwidth for flops (MF).

I guess we are covered here as well, given the LIKWID results.
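The reviewer's suggestion (profile first, and only trade bandwidth for flops once bandwidth is the bottleneck) is essentially the roofline argument. Below is a minimal Python sketch of that model. The node parameters (1000 GFlop/s peak, 100 GB/s bandwidth) are hypothetical illustration values, not measurements from the paper or the LIKWID runs, and the helper names are made up for this example.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline model: performance is capped either by the compute peak
    or by memory bandwidth times arithmetic intensity (flop/byte)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (flop/byte) at which a kernel stops being
    bandwidth-bound and becomes compute-bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical node: illustrative numbers only, NOT measured values.
PEAK = 1000.0  # GFlop/s
BW = 100.0     # GB/s

for name, ai in [("low-intensity kernel", 0.25), ("high-intensity kernel", 20.0)]:
    perf = attainable_gflops(ai, PEAK, BW)
    bound = "memory-bound" if ai < ridge_point(PEAK, BW) else "compute-bound"
    print(f"{name}: {perf:.1f} GFlop/s ({bound})")
```

Below the ridge point (10 flop/byte for these assumed numbers) additional flops are essentially free, which is why it makes sense to first establish that a kernel is memory-bound before switching to an algorithm that trades bandwidth for flops.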

masterleinad commented 5 years ago
> * argue that we do NOT intend to study scalability (i.e. we focus on node-level performance)? Or add a few extra MF examples (larger/smaller) to study caching effects?

I guess we addressed that as well.

davydden commented 5 years ago

Agreed. I have LIKWID results for the separate stages of the operator; I will plot them today and update the manuscript if they look decent...

masterleinad commented 5 years ago

Martin thinks that the content of the article is sufficient and interesting. However, he misses a clear golden thread: we should state our expectations earlier, and it might be a good idea to move Figures 6 and 7 forward. He suggests explaining in more detail what we expect from the three caching strategies, i.e. counting the number of entries actually stored and loaded and the number of operations per quadrature point in the inner loop (possibly including some of the back-of-the-envelope calculations from https://github.com/CEED/Forum/issues/1). We should also explain the roofline model in more detail. We might, for example, say that we have indirect memory access and might also be core-bound. He agrees that running without vectorization might also be interesting, since it addresses one of the reviewers' questions.

Martin was quite surprised by the question regarding memory latency. He says latency does not play any role since the data can be loaded perfectly; we are really memory-bandwidth-limited. Furthermore, he says that the expectation that lower polynomial degrees are better for the processor really comes from a matrix-based view. In the end, we should have sufficient data and references to make clear that higher polynomial degrees are expected to deliver more flops.
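Martin's remark that "lower degrees are better" is a matrix-based expectation can be made concrete with a back-of-the-envelope count for an assembled sparse matrix. The sketch below assumes continuous Q_p elements on a large structured hexahedral mesh (boundary effects ignored), three fully coupled displacement components, CSR storage with 8-byte values and 4-byte column indices, and it ignores vector and row-pointer traffic. In 1D a vertex dof couples to 2p+1 dofs and an interior dof to p+1, giving an average of p+2 nonzeros per row; in 3D the tensor product gives (p+2)^3 per scalar row. The function names are hypothetical, not taken from the repository.

```python
def avg_nnz_per_row(p, components=3):
    """Average nonzeros per matrix row for continuous Q_p elements on a
    large structured hex mesh (boundary effects ignored).
    1D average is p + 2; the 3D value is its tensor product.
    Assumes all vector components are fully coupled."""
    return components * (p + 2) ** 3


def csr_bytes_and_flops_per_dof(p, components=3, value_bytes=8, index_bytes=4):
    """Crude CSR SpMV cost per dof: every nonzero is read once
    (value + column index) and costs one multiply and one add.
    Source/destination vectors and row pointers are ignored."""
    nnz = avg_nnz_per_row(p, components)
    return nnz * (value_bytes + index_bytes), 2 * nnz


for p in range(1, 5):
    bytes_per_dof, flops_per_dof = csr_bytes_and_flops_per_dof(p)
    print(f"p={p}: {bytes_per_dof} B/dof, {flops_per_dof} flop/dof, "
          f"intensity {flops_per_dof / bytes_per_dof:.3f} flop/B")
```

Under these assumptions the arithmetic intensity stays at about 1/6 flop/byte for every degree while the bytes per dof grow like (p+2)^3, so an assembled operator becomes more expensive per dof as p increases. A matrix-free operator does not load a matrix at all, which is why the degree dependence is expected to look different there.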

masterleinad commented 5 years ago

Maybe we should also provide the raw numbers for Figures 6 and 7. As it stands, the scaling of the y-axes makes it very difficult to distinguish the matrix-free results.

masterleinad commented 5 years ago

We also have strong support here for the claim that node-level performance is crucial. Scaling with respect to MPI is then less of an issue (and less interesting).

masterleinad commented 5 years ago

I guess we can address all this within the next week(s).

jppelteret commented 5 years ago

> I guess we can address all this within the next week(s).

I know that I have not been contributing much, but after my vacation my time has been almost completely consumed by other work mandated by our boss. I've been promising @davydden that I'd get onto this soon, so you can all expect me to start reading through these changes this week.

Thanks very much to both of you for the continued and very dedicated work :-)

jppelteret commented 5 years ago

Course of action & distribution of work (goal deadline 1 Feb 2019)

Rebuttal process

davydden commented 5 years ago

Resubmitted version 3602de5.