Closed Paklgit closed 5 months ago
can you clarify what you mean by "The segmentation fault does not occur when the line 191 (v[start_for_rank+steps[step]:start_for_rank+steps[step+1],:] = values[steps[step]:steps[step+1],:]) where the values are written into the netCDF variable is passed instead."?
I mean literally using pass inside of the loop instead. I just wanted to make clear that the fault is happening inside this line of my script and not elsewhere.
netcdf4-python (and netcdf-c) uses size_t for array indices. I wonder if size_t is 32 bits on your platform?
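A quick way to check this from Python, plus a sketch of what a 32-bit signed index would do with the sizes reported in this issue. The helper wrap_int32 is a hypothetical name introduced only for illustration. It is not part of netcdf4-python or netcdf-c, and the snippet says nothing about where an overflow might actually occur.

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2147483647

def wrap_int32(n: int) -> int:
    """Return n as a signed 32-bit integer would store it (two's complement)."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

# Width of size_t as seen by the C ABI that this Python build uses.
size_t_bits = ctypes.sizeof(ctypes.c_size_t) * 8
print(f"size_t is {size_t_bits} bits on this platform")

# Total element counts for the two configurations mentioned in the report.
for rows, cols in [(100000, 20000), (100000, 30000)]:
    total = rows * cols
    print(f"{rows} x {cols} = {total}, "
          f"exceeds INT32_MAX: {total > INT32_MAX}, "
          f"as int32: {wrap_int32(total)}")
```

If size_t turns out to be 64 bits, the index type itself is not the limiting factor, and any overflow would have to come from an intermediate value stored in a narrower type.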
Well, this could be possible, but it seems strange to me that the behavior differs depending on the number of MPI processes. Nonetheless, it still might be an issue on my platform/setup. The output is performed for testing purposes, so it is noncritical as it is not a common use case.
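To see how the process count changes the size of each write, here is a minimal sketch that assumes the rows are split evenly across ranks. The function rows_for_rank is hypothetical and only stands in for whatever the script uses to compute start_for_rank. The actual partitioning and the effect of the step size are not shown in this thread.

```python
INT32_MAX = 2**31 - 1  # 2147483647

def rows_for_rank(total_rows: int, nprocs: int, rank: int) -> tuple[int, int]:
    """Hypothetical even split: return the (start, stop) row range of a rank."""
    base, extra = divmod(total_rows, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

rows, cols = 100000, 100000
for nprocs in (1, 4, 32):
    start, stop = rows_for_rank(rows, nprocs, rank=0)
    elements = (stop - start) * cols  # elements written by rank 0
    print(f"-n {nprocs}: rank 0 owns rows [{start}, {stop}), "
          f"{elements} elements, exceeds INT32_MAX: {elements > INT32_MAX}")
```

Under this assumption the per-rank element count crosses the 32-bit limit for -n 1 and -n 4 but not for -n 32, so the per-rank size alone does not explain why -n 1 runs cleanly.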
I could reproduce the error in C, same behavior. The segmentation fault only occurs for certain process numbers. size_t is stored in 64 bits on my system. Whatever is happening, it is not about netcdf4-python though.
ok thanks for confirming that @Paklgit. Closing now, but feel free to reopen if need be.
I encounter a segmentation fault when writing into a 100000x100000 variable of a file in memory (/tmp).
version: py-netcdf4-1.6.2
python: python-3.9.9
MPI: py-mpi4py-3.1.2
The code was executed via
srun -n 4 python script.py -s 1 --rows 100000 --cols 100000 -p z -m -f /file.nc
It depends on how many processes are used, e.g. -n 1 or -n 32 raise no issues.

When I tried to reproduce the output in C, I encountered a similar issue when creating the data. I could solve this by ensuring that the datatype of the indexing variable supports large enough numbers (unsigned long or something like that). As the code can be executed error-free with --rows 100000 --cols 20000 (2,000,000,000 elements, <2147483647) but not with --rows 100000 --cols 30000 (3,000,000,000 elements, >2147483647), I suspect an integer overflow is happening here.

The segmentation fault does not occur when the line 191 (v[start_for_rank+steps[step]:start_for_rank+steps[step+1],:] = values[steps[step]:steps[step+1],:]) where the values are written into the netCDF variable is passed instead.

Script: