RhysU / ESIO

The ExaScale IO (ESIO) library provides simple, high throughput input and output of structured data sets using parallel HDF5. ESIO is designed to support reading and writing turbulence simulation restart files within C, C++, and modern Fortran applications.
https://rhysu.github.io/ESIO/
GNU Lesser General Public License v2.1

Issue Writing Large Files #3

Open clarkpede opened 7 years ago

clarkpede commented 7 years ago

I have encountered a problem writing a large array using a single processor. I'm running a serial code that works with a large array. When I work with smaller arrays (64x64x64, for example), the following example works fine: my .h5 files contain 1e-6 in every position, as they should. But when I bump up the size, the .h5 file output just contains zeros.

Here's my minimal working example:

program ESIO_test
    use, intrinsic :: iso_c_binding
    use mpi
    use esio
    implicit none

    integer :: myrank, nprocs, ierr
    real(C_DOUBLE) :: Udata(2,1024,1024,512,3)
    type(esio_handle) :: h

    call mpi_init(ierr)
    call mpi_comm_rank(MPI_COMM_WORLD, myrank, ierr)
    call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)

    call esio_handle_initialize(h, MPI_COMM_WORLD)
    call esio_file_create(h,"/work/04114/clarkp/lonestar/fields/512/PS/restart00000000.h5",.true.)

    Udata = 1e-6
    ! Each triple is (global, start, local); this single rank owns the whole 1024 x 1024 x 512 field
    call esio_field_establish(h, 1024, 1, 1024, 1024, 1, 1024, 512, 1, 512, ierr)
    call esio_field_writev_double(h, "u", Udata(:,:,:,:,1), 2)
    call esio_field_writev_double(h, "v", Udata(:,:,:,:,2), 2)
    call esio_field_writev_double(h, "w", Udata(:,:,:,:,3), 2)
    call mpi_barrier(MPI_COMM_WORLD, ierr)

    call esio_file_close(h)
    call esio_handle_finalize(h)

    call mpi_finalize(ierr)

end program ESIO_test

I also modified this example to check the "ierr" flags at each step, but they remained 0.

Yes, I know that ESIO is really meant for parallel reading/writing, and yes, I know that parallelizing the rest of my code would fix the problem. But up until now the serial portion of the code has worked fine, and even with large arrays it only takes a minute to run. Switching to plain HDF5 library calls and/or making the code parallel might be better, but both would take time to rewrite code and add complexity. I'd prefer to stick with serial code if I can get away with it.

System Information: My compiler is ifort (IFORT) 16.0.1 20151021 and I'm using the -fopenmp flag. I'm using the 0.1.9 release of ESIO and Cray MPICH 7.3.0. I'm running this in an interactive session on Lonestar 5 at TACC, with 1 node and 16 tasks allocated, but I'm only running the above example with 1 MPI task.

RhysU commented 7 years ago

I will take a look, but it may be a day or two until I can get you a proper response. Please let me know if you find out anything new in the meantime.

RhysU commented 7 years ago

Any change in behavior if you try...

  a) Adding the TARGET attribute to Udata?
  b) Breaking Udata into separate Udata, Vdata, and Wdata arrays (thereby dropping the last "3" dimension)?
  c) Writing scalar-valued data instead of 2-vectors (thereby dropping the first "2" dimension)?

Hunch (a) is that somehow you're spilling into a different memory layout based on the size of the array. Because there's no TARGET attribute, I think the compiler is free to do whatever it wants. Hunches (b) and (c) are just wild guesses about funkiness in the dope vector, or attempts to reduce the problem to a smaller test case.
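
Concretely, hunch (a) would be a one-line change to the declaration in your example, roughly:

    ! Hunch (a): with TARGET the compiler must assume Udata can be aliased,
    ! which limits how freely it can rearrange the array's storage.
    real(C_DOUBLE), target :: Udata(2,1024,1024,512,3)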

Let me know what you find.

- Rhys

clarkpede commented 7 years ago

There's no change if I apply a, b, or c. Sorry.

After some experimentation, I've found that this happens when I cross the threshold from 512x512x512 to 1024x1024x512. Consequently, if I break the array down into smaller blocks (such as 512x512x512) and write them individually, the code works.

RhysU commented 7 years ago

Any chance you can isolate the behavior to the particular compiler you are using? That edge in sizes is bizarre.

clarkpede commented 7 years ago

I just tried it with gcc 4.9.3 and cray_mpich 7.3.0 and got the exact same result: ESIO stores all zeros for 1024x1024x512 arrays but stores 64x64x512 arrays properly.

I've also tried the development branch and releases 0.1.7 and 0.1.9 (all with the Intel compiler). The problem doesn't appear to be version-dependent.

RhysU commented 7 years ago

Thanks. I will see what I can do. May be a few days.

clarkpede commented 7 years ago

OK. I can work around this issue by splitting the third index (the 512 in the examples) into suitably small chunks, and it's only slightly slower when I do that. So there's not really a rush.
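
For reference, the chunked version looks roughly like the sketch below (chunk size picked by trial, names from my earlier example). It re-establishes the decomposition before each slab of the third index, which ESIO seems happy to accept:

    ! Replace the single "u" write in the original example with slab-by-slab
    ! writes over the third (slowest) field index.  Each establish call claims
    ! only the c-slab [k, k+chunk-1]; the a and b directions are unchanged.
    integer, parameter :: chunk = 64   ! suitably small; must divide 512 evenly
    integer :: k

    do k = 1, 512, chunk
        call esio_field_establish(h, 1024, 1, 1024, 1024, 1, 1024, &
                                  512, k, chunk, ierr)
        call esio_field_writev_double(h, "u", Udata(:,:,:,k:k+chunk-1,1), 2)
    end do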

clarkpede commented 7 years ago

Any updates on this issue?

RhysU commented 7 years ago

No news here. Can you reproduce with some MPI besides cray_mpich 7.3.0?

clarkpede commented 7 years ago

I tested a modified example program on my desktop. A 512x512x512 array works fine, but a 1024x512x512 array gave the following error message:

HDF5-DIAG: Error detected in HDF5 (1.8.15-patch1) MPI-process 0:
  #000: H5Dio.c line 271 in H5Dwrite(): can't prepare for writing data
    major: Dataset
    minor: Write failed
  #001: H5Dio.c line 352 in H5D__pre_write(): can't write data
    major: Dataset
    minor: Write failed
  #002: H5Dio.c line 789 in H5D__write(): can't write data
    major: Dataset
    minor: Write failed
  #003: H5Dmpio.c line 529 in H5D__contig_collective_write(): couldn't finish shared collective MPI-IO
    major: Low-level I/O
    minor: Write failed
  #004: H5Dmpio.c line 1399 in H5D__inter_collective_io(): couldn't finish collective MPI-IO
    major: Low-level I/O
    minor: Can't get value
  #005: H5Dmpio.c line 1443 in H5D__final_collective_io(): optimized write failed
    major: Dataset
    minor: Write failed
  #006: H5Dmpio.c line 297 in H5D__mpio_select_write(): can't finish collective parallel write
    major: Low-level I/O
    minor: Write failed
  #007: H5Fio.c line 171 in H5F_block_write(): write through metadata accumulator failed
    major: Low-level I/O
    minor: Write failed
  #008: H5Faccum.c line 825 in H5F__accum_write(): file write failed
    major: Low-level I/O
    minor: Write failed
  #009: H5FDint.c line 256 in H5FD_write(): addr overflow, addr = 2144, size=18446744071562067968, eoa=2147485792
    major: Invalid arguments to routine
    minor: Address overflowed
esio: x-layout0.c:93: ERROR: Operation failed
Default esio error handler invoked.

I got the same error with the Intel 16.0.0 and GCC 5.2.0 compilers. I also tested both MPICH2 3.1.4 and OpenMPI 1.10.0.

The modified Fortran program is:

program ESIO_test
    use, intrinsic :: iso_c_binding
    use mpi
    use esio
    implicit none

    integer :: ierr
    real(C_DOUBLE) :: Udata(1024,512,512)
    type(esio_handle) :: h

    call mpi_init(ierr)

    call esio_handle_initialize(h, MPI_COMM_WORLD)
    call esio_file_create(h,"output.h5", .true.)

    Udata = 1e-6
    call esio_field_establish(h, 1024, 1, 1024, 512, 1, 512, 512, 1, 512, ierr)
    call esio_field_writev_double(h, "u", Udata(:,:,:), 1)
    call mpi_barrier(MPI_COMM_WORLD, ierr)

    call esio_file_close(h)
    call esio_handle_finalize(h)

    call mpi_finalize(ierr)

end program ESIO_test

Have you been able to reproduce any of these problems yourself?

clarkpede commented 7 years ago

Additional information:

  1. If I bump the array size up to 1024x1024x1024, the code runs quietly, reproducing the exact problem I was having before: no errors, just an output.h5 file full of zeros. That's even with the compiler options -check all -fp-stack-check -traceback enabled.
  2. Valgrind gives me a lot of "Conditional jump or move depends on uninitialized value(s)" messages when I try to run it against anything bigger than 512x512x512. You might want to check it out.

RhysU commented 7 years ago

I am able to compile and run your reproducer against the develop branch. I also see...

  #009: H5FDint.c line 256 in H5FD_write(): addr overflow, addr = 2144, size=18446744071562067968, eoa=2147485792
    major: Invalid arguments to routine
    minor: Address overflowed

...but see sensible data coming in on the stack...

Thread 1 "a.out" hit Breakpoint 1, esio_field_layout0_field_writer (plist_id=167772176, 
    dset_id=83886080, field=0x555555756040 <udata>, cglobal=512, cstart=0, clocal=512, 
    cstride=524288, bglobal=512, bstart=0, blocal=512, bstride=1024, aglobal=1024, astart=0, 
    alocal=1024, astride=1, type_id=50331741) at ../../esio/esio/x-layout0.c:43

...by which I mean the strides/sizes all seem to check out. At the entry to OPFUNC at x-layout0.c:88 I see...

(gdb) info local
nelems = 268435456
lies = 1
memspace = 67108866
filespace = 67108867
start = {0, 0, 0}
count = {512, 512, 1024}
status = 21845

...which feels sane, as status has yet to be overwritten and 268435456 == 512*512*1024. I'm having a miserable time getting a breakpoint on H5Dwrite (i.e. OPFUNC) with source information available. Based upon

#009: H5FDint.c line 254 in H5FD_write(): addr overflow, addr = 2144, size=18446744071562067968, eoa=2147485792

I think the trick will be understanding why that absurd size=184467.... is appearing.
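
One observation about that number, for whatever it is worth: 18446744071562067968 is exactly 2**64 - 2**31, i.e. the bit pattern of a 32-bit signed value of -2**31 read back as an unsigned 64-bit size. And 512x512x1024 doubles is exactly 2**31 bytes, one more than a signed 32-bit int can hold, so it smells like a byte count is being squeezed through a 32-bit integer somewhere below ESIO (MPI-IO counts are plain C ints, for instance). A standalone sketch of the arithmetic, nothing ESIO-specific:

program size_overflow_sketch
    use, intrinsic :: iso_c_binding
    implicit none
    ! 512 x 512 x 1024 doubles at 8 bytes apiece is exactly 2 GiB.
    integer(C_INT64_T), parameter :: nbytes = 512_C_INT64_T * 512 * 1024 * 8

    print *, nbytes                   ! 2147483648 = 2**31
    print *, huge(1_C_INT32_T)        ! 2147483647, so nbytes no longer fits in 32 bits

    ! A signed 32-bit counter wraps 2**31 around to -2**31; widened back to an
    ! unsigned 64-bit byte count, that bit pattern reads as
    ! 2**64 - 2**31 = 18446744071562067968, the "size=" in the trace above.
    print *, int(nbytes - 2_C_INT64_T**32, C_INT32_T)   ! prints -2147483648
end program size_overflow_sketch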

For posterity, my setup:

$ ldd a.out 
    linux-vdso.so.1 =>  (0x00007ffe2aa80000)
    libesiof-0.2.0.so => /home/rhys/lib/libesiof-0.2.0.so (0x00002b2d034e4000)
    libmpi_mpifh.so.20 => /usr/lib/x86_64-linux-gnu/libmpi_mpifh.so.20 (0x00002b2d03749000)
    libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00002b2d039a0000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b2d03cd1000)
    libhdf5_hl.so.10 => /home/rhys/lib/libhdf5_hl.so.10 (0x00002b2d04098000)
    libhdf5.so.10 => /home/rhys/lib/libhdf5.so.10 (0x00002b2d042bc000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00002b2d048aa000)
    libmpi.so.20 => /usr/lib/x86_64-linux-gnu/libmpi.so.20 (0x00002b2d04bb3000)
    libopen-pal.so.20 => /usr/lib/x86_64-linux-gnu/libopen-pal.so.20 (0x00002b2d04ea2000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00002b2d0514f000)
    libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00002b2d0536d000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00002b2d055af000)
    /lib64/ld-linux-x86-64.so.2 (0x000056492a0ff000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00002b2d057c6000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00002b2d059e2000)
    libopen-rte.so.20 => /usr/lib/x86_64-linux-gnu/libopen-rte.so.20 (0x00002b2d05be6000)
    libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00002b2d05e6f000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00002b2d060aa000)
    libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00002b2d062b4000)
    libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00002b2d064b7000)
    libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00002b2d066c2000)

RhysU commented 7 years ago

Realistically, I'm not going to have time to track this down. I'm sorry. Valgrind shows the ESIO layer clean at 512x512x512. Where is Valgrind complaining about things? This smells fishy at the HDF5 level.

RhysU commented 7 years ago

One thing I did not check is whether the values passed in from Fortran are arriving at esio_field_establish correctly on the C side. I will peek at that when I'm able.

RhysU commented 7 years ago

esio_field_establish looks sane as far as I can tell.

RhysU commented 7 years ago

Have you been able to confirm/deny that the sizes/parameters going into HDF5 are sane on your install?

clarkpede commented 7 years ago

No, I haven't been able to confirm that.