devitocodes / devito

DSL and compiler framework for automated finite-differences and stencil computation
http://www.devitoproject.org
MIT License
562 stars 228 forks source link

ConditionalDimension leads to kernel segfault #1829

Closed mwcvitkovic closed 2 years ago

mwcvitkovic commented 2 years ago

The following minimal reproducible example kills the kernel in a jupyter notebook when nsnaps equals 1, 2, 4, 5, 50, 53, or 54. It runs when nsnaps equals 3, 51, or 52.

from devito import Grid, TimeFunction, Eq, Operator, solve, ConditionalDimension

num_timesteps=500
nsnaps=54

grid = Grid(shape=(100, 100))

# For snapshotting
factor = round(num_timesteps / nsnaps)
time_subsampled = ConditionalDimension('t_sub', parent=grid.time_dim, factor=factor)
psave = TimeFunction(name='psave', grid=grid, save=nsnaps, time_dim=time_subsampled)

# Base simulation
p = TimeFunction(name='p', grid=grid, space_order=2, time_order=2)

# Source
p.data[0, 0, 0] = 1.

weq = Eq(0.5**2 * p.laplace - p.dt2)
step = Eq(p.forward, solve(weq, p.forward))
op = Operator([step] + [Eq(psave, p)])
summary = op.apply(time=num_timesteps, dt=1e-2)

Is there an obvious reason for this?

System info

Python env

absl-py 1.0.0 pypi_0 pypi appnope 0.1.2 py38h50d1736_1 conda-forge argon2-cffi 20.1.0 py38h5406a74_2 conda-forge async_generator 1.10 py_0 conda-forge attrs 20.3.0 pyhd3deb0d_0 conda-forge backcall 0.2.0 pyh9f0ad1d_0 conda-forge backports 1.0 py_2 conda-forge backports.functools_lru_cache 1.6.1 py_0 conda-forge bleach 3.2.3 pyh44b312d_0 conda-forge boto 2.49.0 pypi_0 pypi boto3 1.16.58 pypi_0 pypi botocore 1.19.58 pypi_0 pypi brotlipy 0.7.0 py38h9ed2024_1003
bzip2 1.0.8 h0d85af4_4 conda-forge ca-certificates 2021.10.8 h033912b_0 conda-forge certifi 2021.10.8 py38h50d1736_1 conda-forge cffi 1.14.3 py38h2125817_2
chardet 3.0.4 py38hecd8cb5_1003
cirpy 1.0.2 pypi_0 pypi conda 4.11.0 py38h50d1736_0 conda-forge conda-package-handling 1.7.2 py38h22f3db7_0
cryptography 3.2.1 py38hbcfaee0_1
cycler 0.10.0 py_2 conda-forge dbus 1.13.18 h18a8e69_0
decorator 4.4.2 py_0 conda-forge defusedxml 0.6.0 py_0 conda-forge entrypoints 0.3 pyhd8ed1ab_1003 conda-forge expat 2.2.10 hb1e8313_2
ffmpeg 4.2.2 h97e5cf8_0
flatbuffers 2.0 pypi_0 pypi freetype 2.10.4 h4cff582_1 conda-forge gettext 0.19.8.1 h1f1d5ed_1 conda-forge glib 2.55.0 0 conda-forge gmp 6.1.2 h0a44026_1000 conda-forge gnutls 3.6.13 hc269f14_0 conda-forge hashids 1.3.1 pypi_0 pypi icu 64.2 h6de7cb9_1 conda-forge idna 2.10 py_0
importlib-metadata 3.4.0 py38h50d1736_0 conda-forge importlib_metadata 3.4.0 hd8ed1ab_0 conda-forge ipykernel 5.4.3 py38h9bb44b7_0 conda-forge ipython 7.21.0 py38h9bb44b7_0 conda-forge ipython_genutils 0.2.0 py_1 conda-forge ipywidgets 7.6.3 pyhd3deb0d_0 conda-forge jax 0.2.25 pypi_0 pypi jaxdf 0.1.0 pypi_0 pypi jaxlib 0.1.74 pypi_0 pypi jedi 0.18.0 py38h50d1736_2 conda-forge jinja2 2.11.2 pyh9f0ad1d_0 conda-forge jmespath 0.10.0 pypi_0 pypi jpeg 9d hbcb3906_0 conda-forge jsonschema 3.2.0 py_2 conda-forge jupyter 1.0.0 py38h50d1736_5 conda-forge jupyter_client 6.1.11 pyhd8ed1ab_1 conda-forge jupyter_console 6.2.0 py_0 conda-forge jupyter_core 4.7.0 py38h50d1736_1 conda-forge jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge jupyterlab_widgets 1.0.0 pyhd8ed1ab_1 conda-forge kiwisolver 1.3.1 py38h23ab428_0
lame 3.100 h35c211d_1001 conda-forge lcms2 2.12 h577c468_0 conda-forge libblas 3.9.0 7_openblas conda-forge libcblas 3.9.0 7_openblas conda-forge libcxx 10.0.0 1
libedit 3.1.20191231 h1de35cc_1
libffi 3.3 hb1e8313_2
libgfortran 5.0.0 9_3_0_h6c81a4c_16 conda-forge libgfortran5 9.3.0 h6c81a4c_16 conda-forge libiconv 1.15 h0b31af3_1006 conda-forge liblapack 3.9.0 7_openblas conda-forge libopenblas 0.3.12 openmp_h54245bb_1 conda-forge libopus 1.3.1 hc929b4f_1 conda-forge libpng 1.6.37 h7cec526_2 conda-forge libsodium 1.0.18 hbcb3906_1 conda-forge libtiff 4.2.0 h9da4c3f_0
libvpx 1.7.0 h378b8a2_0
libwebp-base 1.2.0 hbcf498f_0 conda-forge llvm-openmp 11.0.1 h7c73e74_0 conda-forge lz4-c 1.9.2 h4a8c4bd_1 conda-forge markupsafe 1.1.1 py38h5406a74_3 conda-forge matplotlib 3.3.4 py38h50d1736_0 conda-forge matplotlib-base 3.3.4 py38h8b3ea08_0
memory-profiler 0.58.0 pypi_0 pypi mistune 0.8.4 py38h5406a74_1003 conda-forge nbclient 0.5.1 py_0 conda-forge nbconvert 6.0.7 py38h50d1736_3 conda-forge nbformat 5.1.2 pyhd8ed1ab_1 conda-forge ncurses 6.2 h0a44026_1
nest-asyncio 1.4.3 pyhd8ed1ab_0 conda-forge nettle 3.4.1 h3efe00b_1002 conda-forge notebook 6.2.0 py38h50d1736_0 conda-forge numpy 1.19.5 py38h6ced74f_1 conda-forge olefile 0.46 pyh9f0ad1d_1 conda-forge openh264 2.1.1 hd174df1_0 conda-forge openssl 1.1.1l h0d85af4_0 conda-forge opt-einsum 3.3.0 pypi_0 pypi packaging 20.8 pyhd3deb0d_0 conda-forge pandas 1.0.1 py38h4f17bb1_0 conda-forge pandoc 2.11.4 h35c211d_0 conda-forge pandocfilters 1.4.2 py_1 conda-forge parso 0.8.1 pyhd8ed1ab_0 conda-forge pcre 8.44 h4a8c4bd_0 conda-forge pexpect 4.8.0 pyh9f0ad1d_2 conda-forge pickleshare 0.7.5 py_1003 conda-forge pillow 8.1.2 py38h83525de_0 conda-forge pip 21.3.1 pypi_0 pypi prometheus_client 0.9.0 pyhd3deb0d_0 conda-forge prompt-toolkit 3.0.11 pyha770c72_0 conda-forge prompt_toolkit 3.0.11 hd8ed1ab_0 conda-forge psutil 5.8.0 pypi_0 pypi ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge pycosat 0.6.3 py38h1de35cc_1
pycparser 2.20 py_2
pygments 2.7.4 pyhd8ed1ab_0 conda-forge pyopenssl 19.1.0 pyhd3eb1b0_1
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge pyqt 5.9.2 py38h655552a_2
pyrsistent 0.17.3 py38h5406a74_2 conda-forge pysocks 1.7.1 py38_1
python 3.8.5 h26836e1_1
python-dateutil 2.8.1 py_0 conda-forge python.app 2 py38_10
python_abi 3.8 1_cp38 conda-forge pytz 2020.5 pyhd8ed1ab_0 conda-forge pyzmq 20.0.0 py38h23ab428_1
qt 5.9.7 h8cf7e54_3 conda-forge qtconsole 5.0.2 pyhd8ed1ab_0 conda-forge qtpy 1.9.0 py_0 conda-forge readline 8.0 h1de35cc_0
requests 2.24.0 py_0
ruamel_yaml 0.15.87 py38haf1e3a3_1
s3transfer 0.3.4 pypi_0 pypi scipy 1.7.3 pypi_0 pypi selenium 3.141.0 pypi_0 pypi selenium-wire 3.0.5 pypi_0 pypi send2trash 1.5.0 py_0 conda-forge setuptools 50.3.1 py38hecd8cb5_1
sip 4.19.8 py38h0a44026_0
six 1.15.0 py38hecd8cb5_0
sqlite 3.33.0 hffcf06c_0
stem 1.8.0 pypi_0 pypi tbselenium 0.5.3 pypi_0 pypi terminado 0.9.2 py38h50d1736_0 conda-forge testpath 0.4.4 py_0 conda-forge tk 8.6.10 hb0a8c7a_0
toripchanger 1.1.4 pypi_0 pypi tornado 6.1 py38h5406a74_1 conda-forge tqdm 4.51.0 pyhd3eb1b0_0
traitlets 5.0.5 py_0 conda-forge typing-extensions 4.0.1 pypi_0 pypi urllib3 1.25.11 py_0
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge webencodings 0.5.1 py_1 conda-forge wheel 0.35.1 pyhd3eb1b0_0
widgetsnbextension 3.5.1 py38h50d1736_4 conda-forge x264 1!157.20191217 h1de35cc_0
xz 5.2.5 h1de35cc_0
yaml 0.2.5 haf1e3a3_0
zeromq 4.3.3 hb1e8313_3
zipp 3.4.0 py_0 conda-forge zlib 1.2.11 h1de35cc_3
zstd 1.4.5 h41d2c2f_0

Hardware

MacBook Pro (2021) Apple M1 Max 32 GB memory

FabioLuporini commented 2 years ago

This is a segfault due to an out-of-bounds (OOB) access to psave

The fact that you'll go OOB with the given arguments should have been detected in the Python layer by Devito, and a suitable exception should have been raised, rather than letting the kernel die in that horrible way once in C-land. However we aren't unfortunately doing this kind of checks, yet. This method probably needs to be overridden in ConditionalDimension (external contributions always welcome!). There should be an open issue about this.

Anyway, explanation.

Take nsnaps=54.

You then that

In [1]: psave.shape
Out[1]: (54, 100, 100)

You will save one snapshot every factor time iterations. Also, the computation starts at time_m=1 and runs until time_M=500. For confirmation:

In [3]: op.arguments(time=num_timesteps, dt=1e-2)['time_m']
Out[3]: 1

In [4]: op.arguments(time=num_timesteps, dt=1e-2)['time_M']
Out[4]: 500

note that op.apply(time=X ...) is syntactically equivalent to op.apply(time_M=X ...), and in your case X=num_timesteps=500.

And you see that 500/factor = 500/9 = 55.55...

So before reaching the end of the computation, you'll attempt to write to e.g. psave[498 / 9][...] = psave[55] ... that is an OOB access, which causes a segfault

Hope this helps.

I think I'll add an FAQ and keep the issue open until this behaviour gets properly caught in Python

FabioLuporini commented 2 years ago

Updated FAQ, see here: https://github.com/devitocodes/devito/wiki/FAQ#why-does-my-operator-kernel-die-suddenly

I'm updating this issue's title accordingly

mwcvitkovic commented 2 years ago

Understood - thanks!

mwcvitkovic commented 2 years ago

Not sure your preferences for closing issues vs. leaving them open, so feel free to close it if you want. I'm good on my end.

mloubout commented 2 years ago

If you are happy with the resolution, closing it yourself is fine. Glad your issue is answered.

FabioLuporini commented 2 years ago

Actually, I suggest to leave it open, or open a new one that describes the underlying issue, as I thought there was one already, but I can't find it, so I guess I was wrong