Closed: LTMeyer closed this issue 2 months ago
I downgraded the scipy version from 1.14.0 to 1.12.0 and the error disappeared.
The changelog of Dedalus version 3.0.2, which mentions the lower scipy version, gave me the idea to downgrade. I still don't know what causes the issue though.
Did numpy also downgrade to be <2.0? I think the Cython changes in numpy 2.0 may cause the issues you've seen.
Yes, the downgrade of scipy also forces the downgrade of numpy to version 1.26.4.
Would you mind giving me some pointers to educate myself and better understand the origin of the error? How do the Cython changes in numpy and scipy result in the buffer error?
Is this error to be fixed in Dedalus, or is it dependent on a fix from scipy/numpy? If so, should the version of scipy be specified in the dependencies?
I think it's related to this. I guess it's not really Cython itself, but how numpy interfaces with Cython.
The most up-to-date Dedalus master version on github works with the newest version of scipy. However, as far as I'm aware it's not yet updated to work with numpy 2.0, and this would require changes to Dedalus.
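To make the buffer error more concrete: the Cython memoryviews in Dedalus are presumably declared for a fixed C integer width, and the buffer protocol reports int32 and int64 arrays with different format codes, so handing over the wider type raises a buffer dtype mismatch. A minimal sketch in plain Python/NumPy (illustrative only, not Dedalus code):

```python
import numpy as np

# The buffer protocol advertises each array's element type via a format code;
# a Cython memoryview typed for a C int only accepts the matching code.
print(memoryview(np.zeros(3, dtype=np.int32)).format)  # 'i'  -> C int
print(memoryview(np.zeros(3, dtype=np.int64)).format)  # 'l' (or 'q' on Windows) -> C long

# A memoryview declared as int[:] in a .pyx file accepts the first array but
# rejects the second with "ValueError: Buffer dtype mismatch".
```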
From SciPy's pyproject.toml: although it works well with NumPy 1, it requires NumPy 2 by default.
> I think it's related to this. I guess it's not really Cython itself, but how numpy interfaces with Cython.
If I understand correctly, a fix for Dedalus to support NumPy 2 would be to update the types of the arrays in `transposes.pyx`. Changing the types of the arrays does indeed seem to remove the error. However, if all the arrays are declared as `long`, it may break backward compatibility with NumPy 1.
Just to add a little to this, I've tried a few things and think I have isolated the source of the problem. In `transposes.pyx`, `chunk_shape` is a tuple with `dtype=np.int64`. This eventually causes `B2*ranks` on line 85 to become an `int64` as well, due to how NumPy 2.0 now promotes data types. With `numpy<2` the promotion is different and `B2*ranks` becomes an `int32` (hence no error here).

Whilst setting all arrays to `long` would fix the issue as you say, perhaps something like casting `chunk_shapes` to `int32`s at the top of `transposes.pyx` is easier and maintains the original intended dtypes?
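For context on that promotion change: under NumPy 1.x's value-based casting, a small int64 scalar combined with a lower-precision array keeps the array's dtype, whereas NumPy 2 (NEP 50) keeps the scalar's dtype and promotes the result. A minimal sketch of the difference (illustrative only, not the actual `transposes.pyx` code):

```python
import numpy as np

# Minimal illustration of the NEP 50 promotion change (not Dedalus code).
a = np.zeros(3, dtype=np.int32)   # stand-in for an int32 quantity
s = np.int64(4)                   # stand-in for a value taken from chunk_shape

print((a + s).dtype)
# NumPy 1.x (value-based casting): int32
# NumPy 2.x (NEP 50):              int64
```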
Hmm I'm a little confused because the conda-forge feedstock for Dedalus is currently successfully creating and testing builds with numpy > 2, with scipy pinned < 1.14.
The tests all pass for me too. I think this is because the problem only occurs in parallel, for example running the Rayleigh-Bénard example with four processors causes it.
I confirm the issue only occurs when running in parallel. Sequential invocation of the Rayleigh-Bénard example works fine; with MPI, however, it fails with the error described above.
> Whilst setting all arrays to `long` would fix the issue as you say, perhaps something like casting `chunk_shapes` to `int32`s at the top of `transposes.pyx` is easier and maintains the original intended dtypes?
I think casting the problematic data to the correct data type is indeed a good idea.
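A minimal sketch of what that cast could look like, assuming `chunk_shape` arrives as a tuple of `np.int64` values (illustrative only; the actual change in 02cdaec may differ):

```python
import numpy as np

# Illustrative sketch only (not the actual commit): coerce the chunk shape to
# a fixed 32-bit integer dtype up front, so NumPy 2's promotion rules cannot
# upcast downstream products such as B2*ranks to int64 and trip the Cython
# buffer dtype check.
chunk_shape = (np.int64(8), np.int64(16))       # stand-in for the incoming tuple
chunk_shape = np.asarray(chunk_shape, dtype=np.int32)

ranks = 4                                       # stand-in for the MPI rank count
B2 = chunk_shape[1]
print((B2 * ranks).dtype)                       # int32 under both NumPy 1 and 2
```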
Thank you both for digging in to this. I just pushed a fix in 02cdaec that I think should take care of it.
Thank you. I've tried again with your commit, numpy 2.0.1, and scipy 1.14. There was no longer any issue when running the Rayleigh-Bénard example in parallel.
I'm thus closing the issue as you've fixed it.
Context
I installed Dedalus version 3.0.2 via the suggested pip command after having installed MPI and FFTW3 manually.
Installation process
```bash
# OpenMPI 5.0.5 install
./configure
make -j
sudo make install

# FFTW3 3.3.10 install
./configure CC=mpicc CXX=mpicxx F77=mpif90 MPICC=mpicc MPICXX=mpicxx --enable-shared --enable-mpi --enable-threads --enable-openmp
make -j
sudo make install

# Install Dedalus
CC=mpicc pip3 install --no-cache --no-build-isolation dedalus
```
I took the rayleigh_benard.py example matching the Dedalus version I have installed and tried to run it in parallel using the suggested command. I got the error below. Note that running the script without MPI, directly with Python, terminates successfully.
Error
Full Error Log
```
hwloc/linux: Ignoring PCI device with non-16bit domain. Pass --enable-32bits-pci-domain to configure to support such devices (warning: it would break the library ABI, don't enable unless really needed).
PMIx was unable to find a usable compression library on the system. We will therefore be unable to compress large data streams. This may result in longer-than-normal startup times and larger memory footprints. We will continue, but strongly recommend installing zlib or a comparable compression library for better user experience.
You can suppress this warning by adding "pcompress_base_silence_warning=1" to your PMIx MCA default parameter file, or by adding "PMIX_MCA_pcompress_base_silence_warning=1" to your environment.
2024-08-08 15:56:13,930 subsystems 0/4 INFO :: Building subproblem matrices 1/32 (~3%) Elapsed: 0s, Remaining: 1s, Rate: 2.4e+01/s
2024-08-08 15:56:13,988 subsystems 0/4 INFO :: Building subproblem matrices 4/32 (~12%) Elapsed: 0s, Remaining: 1s, Rate: 4.0e+01/s
2024-08-08 15:56:14,065 subsystems 0/4 INFO :: Building subproblem matrices 8/32 (~25%) Elapsed: 0s, Remaining: 1s, Rate: 4.5e+01/s
2024-08-08 15:56:14,142 subsystems 0/4 INFO :: Building subproblem matrices 12/32 (~38%) Elapsed: 0s, Remaining: 0s, Rate: 4.7e+01/s
2024-08-08 15:56:14,220 subsystems 0/4 INFO :: Building subproblem matrices 16/32 (~50%) Elapsed: 0s, Remaining: 0s, Rate: 4.8e+01/s
2024-08-08 15:56:14,302 subsystems 0/4 INFO :: Building subproblem matrices 20/32 (~62%) Elapsed: 0s, Remaining: 0s, Rate: 4.8e+01/s
2024-08-08 15:56:14,379 subsystems 0/4 INFO :: Building subproblem matrices 24/32 (~75%) Elapsed: 0s, Remaining: 0s, Rate: 4.9e+01/s
2024-08-08 15:56:14,457 subsystems 0/4 INFO :: Building subproblem matrices 28/32 (~88%) Elapsed: 1s, Remaining: 0s, Rate: 4.9e+01/s
2024-08-08 15:56:14,535 subsystems 0/4 INFO :: Building subproblem matrices 32/32 (~100%) Elapsed: 1s, Remaining: 0s, Rate: 4.9e+01/s
2024-08-08 15:56:14,544 __main__ 0/4 INFO :: Starting main loop
2024-08-08 15:56:14,579 __main__ 3/4 ERROR :: Exception raised, triggering end of main loop.
2024-08-08 15:56:14,579 __main__ 2/4 ERROR :: Exception raised, triggering end of main loop.
2024-08-08 15:56:14,579 __main__ 1/4 ERROR :: Exception raised, triggering end of main loop.
2024-08-08 15:56:14,579 __main__ 0/4 ERROR :: Exception raised, triggering end of main loop.
2024-08-08 15:56:14,580 solvers 0/4 INFO :: Final iteration: 0
2024-08-08 15:56:14,580 solvers 0/4 INFO :: Final sim time: 0.0
Traceback (most recent call last):
  File "/home/Documents/dedalus/rayleigh_benard.py", line 122, in
```
How can I fix this error? I am wondering whether it is an incorrect installation of the libraries (either OpenMPI or FFTW3 has been compiled with missing or improper options).