Open ofmla opened 4 years ago
I think the current evidence points at the fork before compilation being the culprit.
@mloubout @tjb900 ever noticed anything like this?
Yeah, definitely
It's worth noting that physical memory usage doesn't increase, since both processes refer to the same physical pages. However, if anything limits the total virtual memory footprint of a group of processes, or of all processes on the system, this kind of thing can trip it up (e.g. the value of /proc/sys/vm/overcommit_ratio).
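As a quick way to see whether that overcommit limit applies on a given machine, here is a small Linux-only sketch that reads the relevant /proc/sys/vm settings (when overcommit_memory is 2, "never overcommit", the committed address space is capped via overcommit_ratio and forking a large process can exceed the cap):

```python
# Linux-only sketch: inspect the overcommit policy that can make
# fork() from a large process fail with a memory error.
from pathlib import Path

def overcommit_settings():
    base = Path("/proc/sys/vm")
    return {name: (base / name).read_text().strip()
            for name in ("overcommit_memory", "overcommit_ratio")}

print(overcommit_settings())
```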
Possible workarounds are:
Finally, in answering this I have noticed that codepy uses pytools.prefork to actually spawn the compiler, and that module is explicitly designed to avoid some of the above issues. It does so by forking a small "fork server" early in the process, before e.g. MPI is initialised or large memory allocations occur; the compiler processes are then forked from that tiny process rather than from the application process itself.
See https://github.com/inducer/pytools/blob/main/pytools/prefork.py
It looks to me like the intention of the above module is that calling pytools.prefork.enable_prefork()
very early in your application might sidestep some of the above issues quite neatly.
That's really a nice comprehensive answer. Thanks a lot @tjb900 .
We have been seeing the same memory errors in Stride when compiling certain operators. After doing some tests, it seems calling pytools.prefork.enable_prefork()
early solves the compilation problem.
However, the problem persists if the compiler or the MPI configuration is changed when memory use is high. That is because in these cases Devito uses subprocess.check_output
to sniff the available compilers, which calls subprocess.Popen
directly instead of using pytools.
Yeah, I've just been running into this lately as well. Seems like it might be a good idea to switch the compiler/mpi/gpu sniffs to use pytools.
Note that, along the same lines, there is also an issue when the allocators initialise: ctypes.util.find_library
uses subprocess to do some pretty hacky stuff. That will be harder to fix, but applications that run into this problem can manually initialise the allocators early.
these are memoized now, is it still an issue?
A Python script running a TTI RTM for only one shot with pyrevolve (https://sesibahia-my.sharepoint.com/:u:/g/personal/oscar_ladino_fieb_org_br/EWpX_VT4U3lCqdAArLVaGKABe9oysSb0KKRDlqIyL1XpwA?e=TgKdL4) runs fine for a period of time before crashing with the following error:
It seems that forking to call the compiler when applying the pyrevolve.Operator temporarily doubles the virtual memory of the parent process. If the parent process is using more than half of the system memory, this leads to a memory error. A workaround is to precompile the code before instantiating the CheckpointOperator object, i.e.
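A self-contained sketch of that ordering, using a `LazyOperator` stand-in for a Devito Operator (in real Devito, touching `op.cfunction` forces the JIT compilation eagerly; the class here only mimics that lazy behaviour):

```python
# Sketch of the precompile-first workaround. Real code would fork a
# compiler on first use; forking is cheap only while the parent's
# memory footprint is still small.

class LazyOperator:
    """Stand-in for a Devito Operator with lazy JIT compilation."""
    def __init__(self, name):
        self.name = name
        self._cfunction = None

    @property
    def cfunction(self):
        if self._cfunction is None:
            # In Devito this is where the compiler process is forked.
            self._cfunction = lambda: f"compiled {self.name}"
        return self._cfunction

op_fwd = LazyOperator("forward")
op_fwd.cfunction                # precompile BEFORE the big allocations
big_buffer = bytearray(10**6)   # stand-in for checkpointing memory
print(op_fwd.cfunction())       # prints: compiled forward
```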
However, it would appear that there is another possible solution as suggested in this thread https://devitocodes.slack.com/archives/C7JMLMSG0/p1593739206410500?thread_ts=1593727821.408500&cid=C7JMLMSG0