Open Patol75 opened 4 years ago
Hello,
Thank you for reaching out. Did you install Charm4Py from pip, or did you build Charm++ manually for use with Charm4Py? Also, are you distributing the entire 3 GB of data to each pool worker PE? Finally, are you using charm.pool.map or some other functionality of the pool?
Thanks for having a look, @ZwFink.
The error message provided results from a pip installation of Charm4Py. I have just tried with an MPI build of Charm++ (./build charm4py mpi-linux-x86_64 -j8 --with-production) and it yields a very similar error message:
Running on 17 processors: /usr/bin/python3.8 script.py HUGE.vtu --charm
charmrun> /usr/bin/setarch x86_64 -R mpirun -np 17 /usr/bin/python3.8 script.py HUGE.vtu --charm
Charm++> Running on MPI version: 3.1
Charm++> level of thread support used: -1 (desired: 0)
Charm++> Running in non-SMP mode: 17 processes (PEs)
Converse/Charm++ Commit ID: v6.11.0-beta1-29-gd35885331
Isomalloc> Synchronized global address space.
CharmLB> Load balancer assumes all CPUs are same.
Charm4py> Running Charm4py version 1.0 on Python 3.8.0 (CPython). Using 'cython' interface to access Charm++
Charm++> Running on 1 hosts (1 sockets x 10 cores x 2 PUs = 20-way SMP)
Charm++> cpu topology info is gathered in 0.009 seconds.
Initializing charm.pool with 16 worker PEs. Warning: charm.pool is experimental (API and performance is subject to change)
----------------- Python Stack Traceback PE 1 -----------------
File "charm4py/charmlib/charmlib_cython.pyx", line 863, in charm4py.charmlib.charmlib_cython.recvGroupMsg
File "/home/thomas/.local/lib/python3.8/site-packages/charm4py/charm.py", line 253, in recvGroupMsg
header, args = self.unpackMsg(msg, dcopy_start, obj)
File "charm4py/charmlib/charmlib_cython.pyx", line 739, in charm4py.charmlib.charmlib_cython.CharmLib.unpackMsg
------------- Processor 1 Exiting: Called CmiAbort ------------
Reason: UnpicklingError: pickle data was truncated
[1] Stack Traceback:
[1:0] libcharm.so 0x7fff576fb6ec CmiAbortHelper(char const*, char const*, char const*, int, int)
[1:1] libcharm.so 0x7fff576fb801
[1:2] charmlib_cython.cpython-38-x86_64-linux-gnu.so 0x7fff57a25dfe
[1:3] python3.8 0x5fff6f _PyObject_MakeTpCall
[1:4] python3.8 0x4ffbbf
[1:5] python3.8 0x57dbb0 _PyEval_EvalFrameDefault
[1:6] python3.8 0x602b2c _PyFunction_Vectorcall
[1:7] python3.8 0x57904d _PyEval_EvalFrameDefault
[1:8] charmlib_cython.cpython-38-x86_64-linux-gnu.so 0x7fff57a1ecc0
[1:9] charmlib_cython.cpython-38-x86_64-linux-gnu.so 0x7fff57a20464
[1:10] charmlib_cython.cpython-38-x86_64-linux-gnu.so 0x7fff57a23847
[1:11] charmlib_cython.cpython-38-x86_64-linux-gnu.so 0x7fff57a4d5f1
[1:12] libcharm.so 0x7fff576ba77b GroupExt::__entryMethod(void*, void*)
[1:13] libcharm.so 0x7fff576256d4 CkDeliverMessageFree
[1:14] libcharm.so 0x7fff5762e6d6 _processHandler(void*, CkCoreState*)
[1:15] libcharm.so 0x7fff576bf491 CsdScheduleForever
[1:16] libcharm.so 0x7fff576bf6fd CsdScheduler
[1:17] libcharm.so 0x7fff576fdfaa ConverseInit
[1:18] libcharm.so 0x7fff5762b91c StartCharmExt
[1:19] charmlib_cython.cpython-38-x86_64-linux-gnu.so 0x7fff57a4a7ae
[1:20] python3.8 0x5fff6f _PyObject_MakeTpCall
[1:21] python3.8 0x4ffbbf
[1:22] python3.8 0x57dbb0 _PyEval_EvalFrameDefault
[1:23] python3.8 0x5765ec _PyEval_EvalCodeWithName
[1:24] python3.8 0x602dd2 _PyFunction_Vectorcall
[1:25] python3.8 0x57904d _PyEval_EvalFrameDefault
[1:26] python3.8 0x5765ec _PyEval_EvalCodeWithName
[1:27] python3.8 0x662c2e
[1:28] python3.8 0x662d07 PyRun_FileExFlags
[1:29] python3.8 0x663a1f PyRun_SimpleFileExFlags
[1:30] python3.8 0x687dbe Py_RunMain
[1:31] python3.8 0x688149 Py_BytesMain
[1:32] libc.so.6 0x7ffff7a03bf7 __libc_start_main
[1:33] python3.8 0x607daa _start
Regarding data distribution, I am not entirely sure, and it is quite possible that I am not doing something ideal. Data from the VTU file is read in the main function (the one given to charm.start()). It is then used to create 3-D SciPy interpolator objects using, for example, NearestNDInterpolator. These objects are then passed to each function execution through charm.pool.
I am using the multi_future argument of map_async, making sure that at most 60,000 futures are created, and I provide a function foo wrapped with partial to pass a dictionary holding variables that are accessed by each process. These variables are never modified; they are only read or, in the case of the interpolator objects mentioned above, called. I paste the relevant snippet below.
from functools import partial
import numpy as np
from charm4py import charm

if inputArgs.charm:  # Charm4Py
    # Number of batches of at most 60,000 unprocessed nodes
    nBatch = np.ceil(np.sum(varDict['indArr'] == 0) / 6e4).astype(int)
    for batch in range(nBatch):
        # Indices of the nodes that have not been processed yet
        nodes2do = np.asarray(varDict['indArr'] == 0).nonzero()
        futures = charm.pool.map_async(
            partial(foo, dictGlobals=dictGlobals),
            list(zip(*[nodes[:60_000] for nodes in nodes2do])),
            multi_future=True)
        for future in charm.iwait(futures):
            output = future.get()
            for i, var in enumerate(outVar):
                varDict[var][output[0]] = output[i + 1]
            varDict['indArr'][output[0][::-1]] = 1
            varDict['nodesComplete'] += 1
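For context, the surrounding setup is structured roughly as follows; this is only a minimal sketch, where the reader, the contents of dictGlobals (the keys interp and coords), and the body of foo are simplified placeholders rather than the real code:

import numpy as np
from functools import partial
from charm4py import charm
from scipy.interpolate import NearestNDInterpolator


def foo(node, dictGlobals=None):
    # Placeholder worker: evaluate the shared interpolator at one grid node
    # and return the node index together with the interpolated value.
    return node, dictGlobals['interp'](dictGlobals['coords'][node])


def main(args):
    # In the real script the points, values and grid coordinates come from
    # the VTU file; random arrays stand in for them here.
    points, values = np.random.rand(1000, 3), np.random.rand(1000)
    dictGlobals = {'interp': NearestNDInterpolator(points, values),
                   'coords': np.random.rand(500, 3)}
    # Everything captured by partial (i.e. dictGlobals) is pickled and sent
    # to the pool workers together with the tasks.
    results = charm.pool.map(partial(foo, dictGlobals=dictGlobals),
                             list(range(500)))
    print(len(results), 'nodes processed')
    charm.exit()


charm.start(main)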
Hi, I have a program written in Python which post-processes an output VTU file from a fluid dynamics framework. The program is massively parallel in the sense that it repeats the exact same calculations at different locations in space, distributed on a grid. I have used Charm4Py to parallelise the workload on a large university cluster, in particular the pool functionality, as it seemed the most appropriate choice. Everything works properly on "regular" VTU files, and I obtain results that compare really well to multiprocessing on a single node and Ray in a multi-node environment.
However, I am encountering an issue when I provide a "huge" VTU, by which I mean 3 GB, whereas the other files are on the order of a few hundred MB. I am pasting below the output generated by Charm4Py and the traceback for the first PE; the computer I have run this on has 64 GB of RAM. I would be really grateful if anyone could help me with this error and explain what it means so that I can attempt to fix it. I am more than happy to provide additional information about the program itself. Thank you for any help.