sbailey closed this issue 6 years ago
Perhaps look here
https://github.com/numba/numba/blob/master/numba/caching.py#L114
There are a variety of caching schemes there. I don't see one that fits, but perhaps one could be contributed: a cache locator that lets you set the directory name based on MPI rank, so you'd just set a prefix like /tmp/numba-
Otherwise you have to manage the race condition yourself, and the best answer for that is probably to compile ahead of time (AOT) anyway.
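For illustration, here is a minimal sketch of the per-rank directory idea. It does not implement a new cache locator. Instead it swaps in a simpler mechanism: pointing Numba's documented NUMBA_CACHE_DIR environment variable at a rank-specific directory. The helper names are hypothetical, and the launcher variables consulted for the rank (SLURM_PROCID, OMPI_COMM_WORLD_RANK, PMI_RANK) are assumptions about how the job is started.

```python
import os

# Environment variables that common launchers use to expose the process rank.
# Which one (if any) is set depends on the launcher; this list is an assumption.
RANK_ENV_VARS = ("SLURM_PROCID", "OMPI_COMM_WORLD_RANK", "PMI_RANK")


def per_rank_cache_dir(prefix="/tmp/numba-", environ=None):
    """Return a cache directory name unique to this rank (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    for name in RANK_ENV_VARS:
        if name in environ:
            return f"{prefix}{environ[name]}"
    # No launcher variable found: fall back to the process ID so that
    # concurrent processes on one node still get distinct directories.
    return f"{prefix}pid{os.getpid()}"


def use_per_rank_cache(prefix="/tmp/numba-", environ=None):
    """Set NUMBA_CACHE_DIR to the per-rank directory and return the path."""
    environ = os.environ if environ is None else environ
    path = per_rank_cache_dir(prefix, environ)
    environ["NUMBA_CACHE_DIR"] = path
    return path
```

As far as I know, Numba reads this variable when it is first imported, so use_per_rank_cache() would need to run before any `import numba` (including indirect imports through specter). The trade-off is that every rank compiles and caches its own copy, which avoids the shared-file race at the cost of repeated compilation.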
Yes, AOT is probably the way to go. When I looked into it at some point, I found it was about the same speed as the JIT version. I will work on this.
(Sent by email in reply to R. C. Thomas's comment of Thu, Jul 19, 2018 at 9:32 PM.)
When running pixsim_nights_mpi from desisim, I'm getting a traceback that ends with
I'm running
The MPI communicator gets split into 3 communicators of 10 nodes each, and each of those communicators processes 1 exposure at a time. Those exposure communicators are further split into 10 frame communicators (1 per node) to process one frame at a time.
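To make that layout concrete, here is a rough sketch of how the split colors could be computed. This is not the actual desisim code: the function name is hypothetical, and it assumes ranks are numbered contiguously by node with a fixed number of ranks per node (the value 32 in the comments is only an example, not something stated in this issue).

```python
def communicator_colors(rank, ranks_per_node, nodes_per_exposure=10):
    """Map a world rank to (exposure color, frame color).

    Assumes ranks are laid out contiguously by node (hypothetical layout).
    """
    node = rank // ranks_per_node          # which node this rank runs on
    exposure = node // nodes_per_exposure  # which 10-node exposure group
    frame = node % nodes_per_exposure      # which node within that group
    return exposure, frame


# With mpi4py, the colors could then drive the two-level split:
#
#   from mpi4py import MPI
#   comm = MPI.COMM_WORLD
#   exposure, frame = communicator_colors(comm.rank, ranks_per_node=32)
#   expcomm = comm.Split(color=exposure, key=comm.rank)
#   framecomm = expcomm.Split(color=frame, key=expcomm.rank)
```

Under these assumptions, 30 nodes produce three exposure colors (0, 1, 2), each containing ten frame colors (0 through 9).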
When I look in that /global/homes/s/sjbailey/.cache/numba/global/common/software/desi/cori/desiconda/20180709-1.2.6-spec/code/specter/0.8.6/lib/python3.6/site-packages/specter-0.8.6-py3.6.egg/specter/util/ directory, I see three legval_numba*.nbc files written within a minute of each other, perhaps one per exposure communicator. It appears that there may be some race condition with creating the .nbc files.
@lastephey or @rcthomas, have you seen an MPI+numba caching problem like this before? I see that we use
I'm wondering if cache=True is problematic with MPI.
Full traceback: