clarkpede opened this issue 7 years ago
I will take a look, but it may be a day or two until I can get you a proper response. Please let me know if you find out anything new in the meantime.
On Dec 13, 2016 10:26 AM, "clarkpede" notifications@github.com wrote:
I have encountered a problem writing a large array using a single processor. I'm running a serial code that works with a large array. When I work with smaller arrays (64x64x64, for example) the following example works fine: my .h5 files contain 1e-6 in every position, as they should. But when I bump up the size, my .h5 file output just contains 0's.
Here's my minimal working example:
program ESIO_test
  use, intrinsic :: iso_c_binding
  use mpi
  use esio
  implicit none
  integer :: myrank, nprocs, ierr
  real(C_DOUBLE) :: Udata(2,1024,1024,512,3)
  type(esio_handle) :: h
  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, myrank, ierr)
  call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)
  call esio_handle_initialize(h, MPI_COMM_WORLD)
  call esio_file_create(h, "/work/04114/clarkp/lonestar/fields/512/PS/restart00000000.h5", .true.)
  Udata = 1e-6
  call esio_field_establish(h, 1024, 1, 1024, 1024, 1, 1024, 512, 1, 512, ierr)
  call esio_field_writev_double(h, "u", Udata(:,:,:,:,1), 2)
  call esio_field_writev_double(h, "v", Udata(:,:,:,:,2), 2)
  call esio_field_writev_double(h, "w", Udata(:,:,:,:,3), 2)
  call mpi_barrier(MPI_COMM_WORLD, ierr)
  call esio_file_close(h)
  call esio_handle_finalize(h)
  call mpi_finalize(ierr)
end program ESIO_test
I also modified this example to check the "ierr" flags at each step, but they remained 0.
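The checks were of roughly this form after each call that returns ierr (a sketch, not my exact code):

call esio_field_establish(h, 1024, 1, 1024, 1024, 1, 1024, 512, 1, 512, ierr)
if (ierr /= 0) then
  print *, "esio_field_establish reported ierr = ", ierr
  call mpi_abort(MPI_COMM_WORLD, 1, ierr)   ! bail out early, just for illustration
end if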
Yes, I know that ESIO is really meant for parallel reading/writing, and yes, I know that parallelizing the rest of my code would fix the problem. But up until now, the serial portion of the code has worked fine, and even with large arrays it only takes a minute to run. While switching to generic HDF5 library calls and/or making the code parallel might be better, both would require time to rewrite code and/or add extra code complexity. I'd prefer to use a serial code if I can get away with it.
System Information: My compiler is ifort (IFORT) 16.0.1 20151021 and I'm using the -fopenmp flag. I'm using the 0.1.9 release branch of ESIO. I'm using Cray MPICH 7.3.0. I'm running this on an interactive session on Lonestar 5 at TACC, with 1 node and 16 tasks allocated. I'm only running the above example with 1 MPI task.
Any change in behavior if you try...
a) Adding the TARGET attribute to Udata?
b) Breaking Udata into separate Udata, Vdata, and Wdata arrays (therefore dropping the last "3" dimension)?
c) Writing scalar-valued data instead of 2-vectors (therefore dropping the first "2" dimension)?
Hunch (a) is that somehow you're spilling into a different memory layout based on the size of the array. Because there's no TARGET attribute, I think the compiler is free to do whatever it wants. Hunches (b) and (c) are just wild guesses about funkiness in the dope vector, or attempts to reduce the problem to a smaller test case.
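Here is an untested sketch of what (b) would look like, reusing your example (the filename is shortened for illustration); (a) would just add the TARGET attribute to the declarations, and (c) would additionally drop the leading "2" and pass an ncomponents of 1:

program ESIO_test_b
  use, intrinsic :: iso_c_binding
  use mpi
  use esio
  implicit none
  integer :: ierr
  ! Hunch (b): one buffer per component, no trailing "3" dimension.
  ! Hunch (a) would add ", target" to these declarations instead.
  real(C_DOUBLE) :: Udata(2,1024,1024,512), Vdata(2,1024,1024,512), Wdata(2,1024,1024,512)
  type(esio_handle) :: h
  call mpi_init(ierr)
  call esio_handle_initialize(h, MPI_COMM_WORLD)
  call esio_file_create(h, "restart00000000.h5", .true.)   ! illustrative filename
  Udata = 1e-6
  Vdata = 1e-6
  Wdata = 1e-6
  call esio_field_establish(h, 1024, 1, 1024, 1024, 1, 1024, 512, 1, 512, ierr)
  call esio_field_writev_double(h, "u", Udata, 2)
  call esio_field_writev_double(h, "v", Vdata, 2)
  call esio_field_writev_double(h, "w", Wdata, 2)
  call mpi_barrier(MPI_COMM_WORLD, ierr)
  call esio_file_close(h)
  call esio_handle_finalize(h)
  call mpi_finalize(ierr)
end program ESIO_test_b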
Let me know what you find.
- Rhys
There's no change if I apply a, b, or c. Sorry.
After some experimentation, I've found that this happens when I cross the threshold from 512x512x512 to 1024x1024x512. Therefore, if I break down the array into smaller blocks (such as 512x512x512 blocks) and write them individually, the code works.
Any chance you can isolate the behavior to the particular compiler you are using? That edge in sizes is bizarre.
I just tried it with gcc 4.9.3 and cray_mpich 7.3.0. I got the exact same result. ESIO stores all zeros for arrays that are 1024x1024x512, but stores the arrays properly for arrays that are 64x64x512.
I've also tried using the development branch and releases 0.1.7 and 0.1.9 (all with the intel compiler). This problem doesn't appear to be version-dependent.
Thanks. I will see what I can do. It may be a few days.
Ok. I can work around this issue by splitting the third index (the 512 in the examples) into suitably small chunks, and the speed is only slightly slower when I do that. So there's not really a rush.
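For reference, the workaround is roughly the following, dropped in place of the establish/write calls in the original example (a sketch rather than my exact code; the 128-plane block size is arbitrary, anything suitably small works):

! Chunked workaround: write the slowest (third) index in blocks rather than
! all 512 planes at once. Declarations belong with the others at the top.
integer, parameter :: cglobal = 512, clocal = 128   ! block size is arbitrary
integer :: cstart

do cstart = 1, cglobal, clocal
  call esio_field_establish(h, 1024, 1, 1024, 1024, 1, 1024, &
                            cglobal, cstart, clocal, ierr)
  call esio_field_writev_double(h, "u", Udata(:,:,:,cstart:cstart+clocal-1,1), 2)
  call esio_field_writev_double(h, "v", Udata(:,:,:,cstart:cstart+clocal-1,2), 2)
  call esio_field_writev_double(h, "w", Udata(:,:,:,cstart:cstart+clocal-1,3), 2)
end do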
Any updates on this issue?
No news here. Can you reproduce with some MPI besides cray_mpich 7.3.0?
I tested a modified example program on my desktop. A 512x512x512 array works fine, but a 1024x512x512 array gave the following error message:
HDF5-DIAG: Error detected in HDF5 (1.8.15-patch1) MPI-process 0:
#000: H5Dio.c line 271 in H5Dwrite(): can't prepare for writing data
major: Dataset
minor: Write failed
#001: H5Dio.c line 352 in H5D__pre_write(): can't write data
major: Dataset
minor: Write failed
#002: H5Dio.c line 789 in H5D__write(): can't write data
major: Dataset
minor: Write failed
#003: H5Dmpio.c line 529 in H5D__contig_collective_write(): couldn't finish shared collective MPI-IO
major: Low-level I/O
minor: Write failed
#004: H5Dmpio.c line 1399 in H5D__inter_collective_io(): couldn't finish collective MPI-IO
major: Low-level I/O
minor: Can't get value
#005: H5Dmpio.c line 1443 in H5D__final_collective_io(): optimized write failed
major: Dataset
minor: Write failed
#006: H5Dmpio.c line 297 in H5D__mpio_select_write(): can't finish collective parallel write
major: Low-level I/O
minor: Write failed
#007: H5Fio.c line 171 in H5F_block_write(): write through metadata accumulator failed
major: Low-level I/O
minor: Write failed
#008: H5Faccum.c line 825 in H5F__accum_write(): file write failed
major: Low-level I/O
minor: Write failed
#009: H5FDint.c line 256 in H5FD_write(): addr overflow, addr = 2144, size=18446744071562067968, eoa=2147485792
major: Invalid arguments to routine
minor: Address overflowed
esio: x-layout0.c:93: ERROR: Operation failed
Default esio error handler invoked.
I got the same error with the Intel 16.0.0 and GCC 5.2.0 compilers. I also tested both MPICH2 3.1.4 and OpenMPI 1.10.0.
The modified Fortran program is:
program ESIO_test
  use, intrinsic :: iso_c_binding
  use mpi
  use esio
  implicit none
  integer :: ierr
  real(C_DOUBLE) :: Udata(1024,512,512)
  type(esio_handle) :: h
  call mpi_init(ierr)
  call esio_handle_initialize(h, MPI_COMM_WORLD)
  call esio_file_create(h, "output.h5", .true.)
  Udata = 1e-6
  call esio_field_establish(h, 1024, 1, 1024, 512, 1, 512, 512, 1, 512, ierr)
  call esio_field_writev_double(h, "u", Udata(:,:,:), 1)
  call mpi_barrier(MPI_COMM_WORLD, ierr)
  call esio_file_close(h)
  call esio_handle_finalize(h)
  call mpi_finalize(ierr)
end program ESIO_test
Have you been able to reproduce any of these problems yourself?
Additional information: the resulting output.h5 file contains pure 0's. That's even with the compiler options -check all -fp-stack-check -traceback turned on.

I am able to compile/run your recreate against the develop branch. I also see...
#009: H5FDint.c line 256 in H5FD_write(): addr overflow, addr = 2144, size=18446744071562067968, eoa=2147485792
major: Invalid arguments to routine
minor: Address overflowed
...but see sensible data coming in on the stack...
Thread 1 "a.out" hit Breakpoint 1, esio_field_layout0_field_writer (plist_id=167772176,
dset_id=83886080, field=0x555555756040 <udata>, cglobal=512, cstart=0, clocal=512,
cstride=524288, bglobal=512, bstart=0, blocal=512, bstride=1024, aglobal=1024, astart=0,
alocal=1024, astride=1, type_id=50331741) at ../../esio/esio/x-layout0.c:43
...by which I mean the strides/sizes all seem to check out. At the entry to OPFUNC at x-layout0.c:88 I see...
(gdb) info local
nelems = 268435456
lies = 1
memspace = 67108866
filespace = 67108867
start = {0, 0, 0}
count = {512, 512, 1024}
status = 21845
...which feels sane, as status has yet to be overwritten and 268435456 == 512*512*1024. I'm having a miserable time getting a breakpoint on H5Dwrite (i.e. OPFUNC) with source information available. Based upon
#009: H5FDint.c line 254 in H5FD_write(): addr overflow, addr = 2144, size=18446744071562067968, eoa=2147485792
I think the trick will be understanding why that absurd size=184467.... is appearing.
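One observation, offered strictly as a hypothesis: 512*512*1024 doubles is 2^31 bytes exactly, addr = 2144 plus 2^31 is exactly the reported eoa = 2147485792, and that absurd size is exactly what 2^31 becomes if it passes through a signed 32-bit value somewhere and is widened back to 64 bits. A quick arithmetic check of that claim:

program overflow_hypothesis
  ! Hypothesis only: a 2 GiB payload wrapped through a signed 32-bit integer
  ! reproduces the size reported by H5FD_write() above.
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer(int64) :: nbytes, wrapped
  nbytes  = 512_int64 * 512_int64 * 1024_int64 * 8_int64   ! 2147483648 bytes = 2**31
  wrapped = nbytes - 2_int64**32                            ! value a wrapping signed 32-bit int would hold
  print *, nbytes    ! 2147483648
  print *, wrapped   ! -2147483648; its 64-bit two's-complement bit pattern,
                     ! read as unsigned, is 18446744071562067968 -- the size in the trace
end program overflow_hypothesis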
For posterity, my setup:
$ ldd a.out
linux-vdso.so.1 => (0x00007ffe2aa80000)
libesiof-0.2.0.so => /home/rhys/lib/libesiof-0.2.0.so (0x00002b2d034e4000)
libmpi_mpifh.so.20 => /usr/lib/x86_64-linux-gnu/libmpi_mpifh.so.20 (0x00002b2d03749000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002b2d039a0000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b2d03cd1000)
libhdf5_hl.so.10 => /home/rhys/lib/libhdf5_hl.so.10 (0x00002b2d04098000)
libhdf5.so.10 => /home/rhys/lib/libhdf5.so.10 (0x00002b2d042bc000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002b2d048aa000)
libmpi.so.20 => /usr/lib/x86_64-linux-gnu/libmpi.so.20 (0x00002b2d04bb3000)
libopen-pal.so.20 => /usr/lib/x86_64-linux-gnu/libopen-pal.so.20 (0x00002b2d04ea2000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00002b2d0514f000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002b2d0536d000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002b2d055af000)
/lib64/ld-linux-x86-64.so.2 (0x000056492a0ff000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00002b2d057c6000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00002b2d059e2000)
libopen-rte.so.20 => /usr/lib/x86_64-linux-gnu/libopen-rte.so.20 (0x00002b2d05be6000)
libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00002b2d05e6f000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00002b2d060aa000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00002b2d062b4000)
libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00002b2d064b7000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00002b2d066c2000)
Realistically, I'm not going to have time to track this down. I'm sorry. Valgrind shows the ESIO layer clean at 512x512x512; where Valgrind does complain about things, it smells fishy at the HDF5 level.
One thing I did not check was that the values passed in from Fortran are arriving at esio_field_establish correctly on the C side. I'll peek at that when I'm able.

esio_field_establish looks sane as far as I can tell.
Have you been able to confirm/deny that the sizes/parameters going into HDF5 are sane on your install?
No, I haven't been able to confirm that.