devitocodes / devito

DSL and compiler framework for automated finite-differences and stencil computation
http://www.devitoproject.org
MIT License

AlltoAll for large problem #1095

Open mloubout opened 4 years ago

mloubout commented 4 years ago

The AlltoAll calls for MPI make Devito crash for large problems.

FabioLuporini commented 4 years ago

Where (which Python line)? And do you have a reproducer?

FabioLuporini commented 4 years ago

An error trace would be nice too.

mloubout commented 4 years ago

  File "/usr/local/lib/python3.6/dist-packages/devito/operator/operator.py", line 520, in arguments
    args = self._prepare_arguments(**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/devito/operator/operator.py", line 419, in _prepare_arguments
    args.update(p._arg_values(**kwargs))
  File "/usr/local/lib/python3.6/dist-packages/devito/types/sparse.py", line 287, in _arg_values
    values = new._arg_defaults(alias=self).reduce_all()
  File "/usr/local/lib/python3.6/dist-packages/devito/tools/memoization.py", line 91, in __call__
    res = cache[key] = self.func(*args, **kw)
  File "/usr/local/lib/python3.6/dist-packages/devito/types/sparse.py", line 267, in _arg_defaults
    for k, v in self._dist_scatter().items():
  File "/usr/local/lib/python3.6/dist-packages/devito/types/sparse.py", line 821, in _dist_scatter
    [scattered, rcount, rdisp, mpitype])
  File "mpi4py/MPI/Comm.pyx", line 676, in mpi4py.MPI.Comm.Alltoallv
  File "mpi4py/MPI/msgbuffer.pxi", line 592, in mpi4py.MPI._p_msg_cco.for_alltoall
  File "mpi4py/MPI/msgbuffer.pxi", line 456, in mpi4py.MPI._p_msg_cco.for_cco_recv
  File "mpi4py/MPI/msgbuffer.pxi", line 300, in mpi4py.MPI.message_vector
  File "mpi4py/MPI/asarray.pxi", line 22, in mpi4py.MPI.chkarray
  File "mpi4py/MPI/asarray.pxi", line 15, in mpi4py.MPI.getarray
OverflowError: value too large to convert to int

FabioLuporini commented 4 years ago

Command line to reproduce? Can you write an MFE? This seems to be due to the data distribution of SparseFunctions.
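
For context, the classic MPI_Alltoallv interface takes its per-rank counts and displacements as C `int` arrays, and the traceback above ends where mpi4py converts such an array (`getarray`). A minimal sketch of how that limit can be hit, assuming hypothetical per-rank element counts (the helper names below are illustrative and are not part of Devito or mpi4py):

```python
INT_MAX = 2**31 - 1  # largest value a 32-bit signed C int can hold


def displacements(counts):
    """Exclusive prefix sum: offset of each rank's block in the buffer."""
    displs, offset = [], 0
    for c in counts:
        displs.append(offset)
        offset += c
    return displs


def fits_c_int(counts):
    """True if every count and displacement fits in a 32-bit signed int."""
    return all(0 <= v <= INT_MAX for v in counts + displacements(counts))


# Hypothetical per-rank element counts for a 4-rank exchange.
small = [1_000, 2_000, 1_500, 500]
large = [900_000_000] * 4  # each count fits, but the last displacement is 2.7e9

print(fits_c_int(small))  # True
print(fits_c_int(large))  # False
```

Note that in the `large` case no single count exceeds the limit; it is the accumulated displacement that does, so the error can appear well before any one rank holds two billion elements.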

mloubout commented 4 years ago

Command line to reproduce? Can you write an MFE?

Not really, just add a massive number of receivers in any example and at some point it will crash like that. All examples are set up for a tiny number of receivers, so it wouldn't pop up.

FabioLuporini commented 4 years ago

Not really, just add a massive number of receivers in any example

so we should be able to write a 5-6 line MFE. I'll try to reproduce.

FabioLuporini commented 4 years ago

can we close this? @mloubout

mloubout commented 4 years ago

No, the PRs improved the set-up time for larger numbers of receivers (there are still issues with full-size 3D that I am trying to track down), but this error is not related. It is due to the message size, so it will still happen. I'm trying to find a fix for that too.
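
One common workaround for message-size limits of this kind, sketched here as an assumption and not necessarily the fix adopted in Devito, is to express counts and displacements in units of a larger block (for example one receiver trace) instead of single elements. The helper `to_block_units` and all the numbers below are hypothetical:

```python
INT_MAX = 2**31 - 1  # largest value a 32-bit signed C int can hold


def to_block_units(elem_counts, block_len):
    """Convert per-rank element counts into counts and displacements
    measured in blocks of `block_len` elements each."""
    if any(c % block_len for c in elem_counts):
        raise ValueError("every count must be a multiple of block_len")
    counts = [c // block_len for c in elem_counts]
    displs, offset = [], 0
    for c in counts:
        displs.append(offset)
        offset += c
    return counts, displs


# Hypothetical: 4 ranks, 300,000 receivers each, 3,000 samples per receiver.
block_len = 3_000
elem_counts = [300_000 * block_len] * 4  # 9e8 elements per rank

counts, displs = to_block_units(elem_counts, block_len)
print(counts)  # [300000, 300000, 300000, 300000]
print(displs)  # [0, 300000, 600000, 900000]
```

With mpi4py, the matching block datatype could presumably be built with `Datatype.Create_contiguous(block_len)` followed by `Commit()`. This only pushes the limit out by a factor of `block_len`; it does not remove it.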

ggorman commented 4 years ago

@mloubout - you should not expect to see an integer OverflowError unless you are running in the order of a couple of billion dof's. How large is your problem? If it really is that big, then we have to ensure our indexing supports int64.

mloubout commented 4 years ago

unless you are running in the order of a couple of billion dof's

You don't need to go that big to be way over that. 3D receivers, OBN setup with reciprocity:

And you have a couple of tens of billions.
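
As a rough illustration of that scale, with survey dimensions that are purely hypothetical and not taken from this thread, the total number of receiver samples can be compared against the 32-bit limit:

```python
INT_MAX = 2**31 - 1  # largest value a 32-bit signed C int can hold

# Hypothetical survey dimensions, chosen only to illustrate the scale.
n_receivers = 5_000_000  # receiver positions after applying reciprocity
n_samples = 4_000        # time samples recorded per receiver

total = n_receivers * n_samples
print(f"{total:,} samples")  # 20,000,000,000 samples
print(f"{total / INT_MAX:.1f}x the 32-bit signed limit")
```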