Closed jiangzhongshi closed 3 years ago
@connorjward has been working on some of this performance stuff and may be able to comment. One thing he has added (nearly available) is a flamegraph view of PETSc logging data. This will make it possible to see how the time breaks down inside the outermost solve.
In terms of performance in general: you can run your Firedrake program with a sampling profiler such as py-spy to see how it behaves. There is some overhead from being in Python, but it is mostly an affine (fixed, per-call) cost, since we try to make sure that all the heavy work is delegated to compiled code. So if your problem is quite large, most of the time will be spent in compute rather than in cross-language calls.
Thanks,
Dear wence-,
Thanks so much for the quick response and suggestions.
I have tried py-spy (without the `--native` option, since it fails to merge frames). From the profile of a simple V-cycle example I have (I can provide the script if it helps), it seems that `fine_node_to_coarse_node_map` is taking a significant portion of the time (31%). Is this expected (in the sense that it would be significantly reduced in a C-based implementation), or is this an artifact of my not being able to enable the `--native` option in py-spy?
Zhongshi
it seems `fine_node_to_coarse_node_map` is taking a significant portion of the time
This is (should be) a one-time setup cost that is paid once per instance of the multigrid solver. It does some array manipulation (though most of that happens in C anyway). If you run your solver in a loop, do you see this proportion of time falling?
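The amortization described above can be illustrated without Firedrake at all. The sketch below is hypothetical: `build_coarse_node_map` merely simulates an expensive one-time setup (standing in for something like `fine_node_to_coarse_node_map`), and caching it means only the first "solve" pays the cost:

```python
import functools
import time

@functools.lru_cache(maxsize=None)
def build_coarse_node_map(level):
    """Stand-in for an expensive one-time setup cost (hypothetical workload,
    not the real Firedrake routine)."""
    time.sleep(0.05)  # simulate the array-manipulation cost
    return tuple(range(level * 10))

def solve(level):
    """A 'solve' that needs the map; only the first call triggers the setup."""
    node_map = build_coarse_node_map(level)
    return len(node_map)

start = time.perf_counter()
solve(3)                       # first call: includes the setup cost
first = time.perf_counter() - start

start = time.perf_counter()
for _ in range(100):
    solve(3)                   # subsequent calls: cache hits, near-free
later = time.perf_counter() - start

print(first > later)
```

Profiling a single solve would attribute a large fraction of time to the setup; in a loop, that fraction shrinks toward zero.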
Apologies for the delay in responding to this. @wence has already done a really good job of explaining most of what I would say.
At present I have found py-spy to be the best tool for this sort of thing. I do recommend using the `--native` flag if you can, though, because it gives a lot more insight into what is happening inside PETSc. Without it enabled you will see some large blank areas in the flame graph.
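For reference, a typical py-spy invocation looks like the following (this assumes py-spy is installed, e.g. via `pip install py-spy`; the script name is a placeholder for your own):

```shell
# Record a flame graph, including native (C/PETSc) frames:
py-spy record --native -o profile.svg -- python my_firedrake_script.py

# Without --native you still get the Python-side picture:
py-spy record -o profile.svg -- python my_firedrake_script.py

# Live top-like view while the program runs:
py-spy top -- python my_firedrake_script.py
```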
Measuring the Python overhead, including in the callbacks, can be difficult because it is very problem-dependent. For problems with a large number of DoFs these overheads should be minimal. If you find that you are spending a large amount of time executing Python instead of C, there are likely ways to speed things up (e.g. by making sure that you are not creating a brand new solver object every timestep).
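The "don't create a brand new solver every timestep" advice can be sketched generically. The `Solver` class below is hypothetical (it is not the Firedrake API); its constructor simulates expensive setup, so building it once and reusing it across timesteps is markedly cheaper than rebuilding it in the loop:

```python
import time

class Solver:
    """Hypothetical solver whose construction does expensive setup
    (standing in for symbolic processing / kernel compilation)."""
    def __init__(self, problem):
        time.sleep(0.02)          # simulate the setup cost
        self.problem = problem

    def solve(self):
        return sum(self.problem)  # cheap work stands in for a real solve

problem = [1.0, 2.0, 3.0]

# Anti-pattern: a fresh solver every timestep pays the setup cost each time.
t0 = time.perf_counter()
for _ in range(10):
    Solver(problem).solve()
fresh_each_step = time.perf_counter() - t0

# Better: build once, reuse across timesteps.
t0 = time.perf_counter()
solver = Solver(problem)
for _ in range(10):
    solver.solve()
reused = time.perf_counter() - t0

print(fresh_each_step > reused)
```

In a real code the same principle applies: keep the solver object alive across the time loop rather than reconstructing it per step.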
Dear Lawrence and Connor,
Thanks again for the suggestion!
If you run your solver in a loop, do you see this proportion of time falling?
It seems that the wall-clock time of subsequent runs is indeed much lower than that of the first run. The cost seems to be associated with setting up the solver instance; what are the possible ways to amortize it? For example, can the cache be reused when changing to a different set of boundary conditions, or when using the same preconditioner for different physics?
I do recommend using the `--native` flag
Yes, the profiling seems to be helpful, but I always encounter `Failed to merge native and python frames`. It seems that this is a known issue on the py-spy side. In the context of Firedrake, is there some way to circumvent it, for example by passing some flag to the PETSc compilation?
Best, Zhongshi
Yes, the profiling seems to be helpful, but I always encounter `Failed to merge native and python frames`
Sorry, I haven't encountered that before and I don't know of a fix. Hopefully we will have an alternative tool soon, pending some merges into PETSc.
Noted, thanks so much for the help!
Dear Firedrakers,
I am working on a geometric coarsening hierarchy algorithm and found your library to be a great testbed for the performance of my approach. I love the Pythonic interface and would like to use Firedrake to iterate on my development, but I am a little worried about how much overhead it incurs and whether it would bias my estimate of the real performance. I have a few questions, and it would be great if I could get some insight, within the context of geometric multigrid.
`par_loop_*`)?

I understand some of my questions are a bit too broad, but any suggestions would be much appreciated!
Zhongshi