SmileiPIC / Smilei

Particle-in-cell code for plasma simulation
https://smileipic.github.io/Smilei

Problem with IO #31

Closed dsbertini closed 6 years ago

dsbertini commented 6 years ago

Hi, I tried to run the SMILEI benchmark examples with openMPI 3.0.0, and all examples doing track diagnostics (the DiagTrackParticles section in the python script) crash because of a specific HDF5 routine that internally uses MPI-IO collective buffering, i.e. H5Pset_dxpl_mpio( transfer, H5FD_MPIO_COLLECTIVE ). When the independent flag is used instead of the collective one, i.e. H5Pset_dxpl_mpio( transfer, H5FD_MPIO_INDEPENDENT ), everything works fine.
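
For reference, here is a minimal sketch of the two transfer modes being compared (illustrative only, not Smilei's actual code; the function name and `collective` flag are invented for the example):

    #include <mpi.h>
    #include <hdf5.h>

    /* Sketch of the two MPI-IO transfer modes discussed above (not Smilei's code). */
    hid_t make_transfer_plist(int collective)
    {
        hid_t transfer = H5Pcreate(H5P_DATASET_XFER);
        if( collective )
            H5Pset_dxpl_mpio(transfer, H5FD_MPIO_COLLECTIVE);   /* setting that triggers the crash reported here */
        else
            H5Pset_dxpl_mpio(transfer, H5FD_MPIO_INDEPENDENT);  /* workaround that runs fine */
        return transfer;   /* later passed as the dxpl argument of H5Dwrite() */
    }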

I notice the same crash with both HDF5 1.10.1 (the newest) and the version recommended for SMILEI (1.8.16). I think it is deeply linked to the new ROMIO implementation in MPI, since an older openMPI version (1.10.7) works fine.

Here is the coredump in both cases:

Running diags at time t = 0

All three MPI ranks (12471, 12472 and 12473 on lxbk0341) crashed with identical backtraces. The trace for rank 12473, deinterleaved:

    [lxbk0341:12473] Process received signal
    [lxbk0341:12473] Signal: Segmentation fault (11)
    [lxbk0341:12473] Signal code: Address not mapped (1)
    [lxbk0341:12473] Failing at address: (nil)
    [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f8e91034890]
    [ 1] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(ADIOI_Flatten+0x1577)[0x7f8e77ed8657]
    [ 2] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(ADIOI_Flatten_datatype+0xe3)[0x7f8e77ed9363]
    [ 3] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(ADIO_Set_view+0x1fd)[0x7f8e77ecef5d]
    [ 4] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(mca_io_romio314_dist_MPI_File_set_view+0x2f6)[0x7f8e77eb5e06]
    [ 5] /lustre/hebe/rz/dbertini/plasma/softw/lib/openmpi/mca_io_romio314.so(mca_io_romio314_file_set_view+0x83)[0x7f8e77eaf863]
    [ 6] /lustre/hebe/rz/dbertini/plasma/softw/lib/libmpi.so.40(MPI_File_set_view+0xdd)[0x7f8e9047fb2d]
    [ 7] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(+0x30cc77)[0x7f8e91ad3c77]
    [ 8] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(H5FD_write+0xe8)[0x7f8e918d8638]
    [ 9] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(H5F__accum_write+0x2ec)[0x7f8e918bd9bc]
    [10] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(H5PB_write+0x960)[0x7f8e919bc780]
    [11] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(H5F_block_write+0xfb)[0x7f8e918c115b]
    [12] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(H5D__chunk_allocate+0x1c5c)[0x7f8e9187016c]
    [13] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(+0xb9910)[0x7f8e91880910]
    [14] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(H5D__alloc_storage+0x21f)[0x7f8e9188584f]
    [15] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(H5D__layout_oh_create+0x4a9)[0x7f8e9188c149]
    [16] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(H5D__create+0x8f6)[0x7f8e91881d56]
    [17] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(+0xc63dc)[0x7f8e9188d3dc]
    [18] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(H5O_obj_create+0xa4)[0x7f8e91951c14]
    [19] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(+0x173591)[0x7f8e9193a591]
    [20] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(+0x1466c6)[0x7f8e9190d6c6]
    [21] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(H5G_traverse+0xef)[0x7f8e9190dbaf]
    [22] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(H5L_link_object+0xaf)[0x7f8e9193badf]
    [23] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(H5D__create_named+0x65)[0x7f8e918813f5]
    [24] /lustre/hebe/rz/dbertini/plasma/softw/lib/libhdf5.so.101(H5Dcreate2+0x217)[0x7f8e9185c9c7]
    [25] smilei(_ZN15DiagnosticTrack3runEP9SmileiMPIR11VectorPatchiP9SimWindow+0xcc7)[0x475467]
    [26] smilei(_ZN11VectorPatch11runAllDiagsER6ParamsP9SmileiMPIjR6TimersP9SimWindow+0x205)[0x4ff505]
    [27] smilei(main+0x17bf)[0x434faf]
    [28] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f8e8f8a2b45]
    [lxbk0341:12473] End of error message

mccoys commented 6 years ago

I am a little surprised. I have successfully used openmpi 3.0.0 and hdf5 1.10.1. The difference I see is gcc: I have used 4.8 and 5 successfully, but I never tried 6.3. Could you try another version of gcc?

We had other problems with HDF5 1.10.1, seemingly due to actual bugs in that version. We ended up avoiding the faulty HDF5 functions, and I have since used the latest HDF5 in smilei without problems.

I guess the problem could also come from Lustre, but then I am not sure how to approach this.

Solving the problem by changing H5FD_MPIO_COLLECTIVE to H5FD_MPIO_INDEPENDENT is not a good option. It may exert significant pressure on the file system for large simulations.

Instead, I suggest trying other versions of gcc or openmpi.

dsbertini commented 6 years ago

Hi, Smilei works with gcc 6.3 and the older 1.10.x versions of openMPI. The problem seems to be linked to the new implementation of ROMIO, as you can see from the coredump; I expect a lot of changes in the collective-buffering I/O implementation between 1.10.x and 3.0.0. About Lustre: I noticed the same crash with two different Lustre versions (2.6 and 2.10). I will recompile with an older version of gcc and let you know.

mccoys commented 6 years ago

@jderouillat, @beck-llr

Did you try running openmpi 3 and hdf5 1.10.1 on a Lustre filesystem before?

beck-llr commented 6 years ago

I didn't have the chance to. We did notice similar problems with openMPI 2.x, but only on certain NFS file systems. It has a form of randomness, in the sense that the same run can either work or fail, and we have not been able to understand where this comes from for the moment. The HDF5 independent or collective flag does not affect that issue though, so it is probably different. Thanks a bunch for reporting this to us! As far as I am concerned, I stick to openMPI 1.6.5 at home and intelmpi on supercomputers.

mccoys commented 6 years ago

Another possibility worth checking: disable openMP (either at compilation time or by setting OMP_NUM_THREADS=1). There might be some issue with threads while writing.

dsbertini commented 6 years ago

Hi. So I have now used gcc 4.9 to compile all the external libraries and the SMILEI code (gcc version 4.9.2, Debian 4.9.2-10).

mccoys commented 6 years ago

It is unfortunate that you tested gcc 4.9. It turns out I tested 4.8 and 5.0!

Anyway, I don't think that should really matter. Why don't you stick with an older version of openmpi? Do you expect worse performance?

It is difficult for us to test smilei on a Lustre filesystem, as we don't have access to one. Ideally, we should make a minimal case that mimics the DiagTrackParticles scenario and submit it to the openmpi and/or hdf5 forums. However, that is a difficult piece of work.

dsbertini commented 6 years ago

I am not sure the problem comes from MPI; it rather looks like HDF5. Let me first run some tests, and then we will see more clearly.

dsbertini commented 6 years ago

As I was expecting, the problem is not linked to a particular version of MPI but to the new 1.10.x versions of the HDF5 library. With the combination of openMPI 3.0.0 and HDF5 1.8.16 (the one you recommend), everything works properly. So for now I will stick to this combination of packages, but I will still investigate the problem with the new version. Your proposal to make a simple example and send it to the HDF5 group is of course relevant here.

beck-llr commented 6 years ago

This is very valuable feedback. Thanks for letting us know.

dsbertini commented 6 years ago

Hi, the problem is not directly linked to the function H5Pset_dxpl_mpio() used with the H5FD_MPIO_COLLECTIVE flag, since I was able to use it, independently of SMILEI, in a small I/O test program. The problem occurs when a dataset is created and linked in the templated function DiagnosticTrack::write_scalar using H5Dcreate2(). Only the call to this function is responsible for the crash... It looks like a bug in HDF5 1.10.x.

mccoys commented 6 years ago

Indeed, it looks like an HDF5 bug. Also, as it works on my machine, it looks related to Lustre. Did you manage to get your test program to reproduce the failure? This could be communicated to the hdf5 people.

dsbertini commented 6 years ago

Hmm, difficult, since the parallel HDF5 examples using the same features seem to work without any problem... It seems to be linked to the specific implementation used in SMILEI; it could also be some side effect coming from openMP.

mccoys commented 6 years ago

I am having a similar problem on my machine (not lustre). It occurs when some MPI ranks have no particles to write and thus use H5Sselect_none to tell HDF5 that these ranks should not write.
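
For context, a rough sketch of that pattern (hypothetical and simplified; not the actual Smilei code path, and the function and argument names are invented for the example):

    #include <hdf5.h>

    /* Collective write of a 1D slab where some ranks contribute nothing. */
    void write_slab(hid_t dataset, hid_t transfer, hsize_t offset,
                    hsize_t local_count, const double *data)
    {
        hid_t filespace = H5Dget_space(dataset);
        hid_t memspace  = H5Screate_simple(1, &local_count, NULL); /* zero-sized dims are allowed */
        if( local_count > 0 ) {
            H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL, &local_count, NULL);
        } else {
            /* This rank owns no particles: select nothing, but still join the collective call */
            H5Sselect_none(filespace);
            H5Sselect_none(memspace);
        }
        H5Dwrite(dataset, H5T_NATIVE_DOUBLE, memspace, filespace, transfer, data);
        H5Sclose(memspace);
        H5Sclose(filespace);
    }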

I will try to make a test case.

dsbertini commented 6 years ago

Hi. So, after detailed investigation I found the reason for the crash I experienced on Lustre. The problem is linked to the I/O optimisation implemented in the class DiagnosticTrack.cpp at line 237, where data chunking is used with a chunk size exactly equal to the size of the dataset. This should of course be allowed, but the HDF5 documentation mentions it as not recommended for performance reasons. Here is the link: https://support.hdfgroup.org/HDF5/doc1.8/Advanced/Chunking/index.html I was able to write a very simple program which crashes immediately when trying to collectively create the dataset. I attached the program "mytest.c" as a text file. I also sent a post about this problem, with the test-case program, to the HDF forum. Let's see what they say. In any case, using a chunk size smaller than the dataset size (for example, dividing by mpi_size) solves the problem. mytest_C.txt
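
For reference, a hedged sketch of a reproducer along those lines (this is not the attached mytest.c, whose content is not reproduced here; the file name, dataset name and size are illustrative). It collectively creates a chunked dataset whose single chunk spans the whole dataset, the configuration described above:

    #include <mpi.h>
    #include <hdf5.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Open a file for parallel access */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
        hid_t file = H5Fcreate("mytest.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* One chunk of exactly the dataset size: the pattern reported to trigger the crash */
        hsize_t dims = 1000000;
        hid_t space  = H5Screate_simple(1, &dims, NULL);
        hid_t dcpl   = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_layout(dcpl, H5D_CHUNKED);
        H5Pset_chunk(dcpl, 1, &dims);          /* chunk size == dataset size */

        /* Collective dataset creation, where the reported segfault occurred */
        hid_t dset = H5Dcreate2(file, "x", H5T_NATIVE_DOUBLE, space,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);

        H5Dclose(dset);
        H5Pclose(dcpl);
        H5Sclose(space);
        H5Fclose(file);
        H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }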

mccoys commented 6 years ago

Great find !!

I believe we could actually drop the chunked data layout when the datasets are not too big. Currently, the piece of code reads:

        if( nParticles_global>0 ){
            H5Pset_layout(plist, H5D_CHUNKED);

            // Set the chunk size
            unsigned int maximum_chunk_size = 100000000;
            unsigned int number_of_chunks = nParticles_global/maximum_chunk_size;
            if( nParticles_global%maximum_chunk_size != 0 ) number_of_chunks++;
            if( number_of_chunks==0 ) number_of_chunks = 1;
            unsigned int chunk_size = nParticles_global/number_of_chunks;
            if( nParticles_global%number_of_chunks != 0 ) chunk_size++;
            hsize_t chunk_dims = chunk_size;
            H5Pset_chunk(plist, 1, &chunk_dims);
        }

but we could change it to

        if( nParticles_global>0 ){
            // Set the chunk size
            unsigned int maximum_chunk_size = 100000000;
            unsigned int number_of_chunks = nParticles_global/maximum_chunk_size;
            if( nParticles_global%maximum_chunk_size != 0 ) number_of_chunks++;
            if( number_of_chunks==0 ) number_of_chunks = 1;
            unsigned int chunk_size = nParticles_global/number_of_chunks;
            if( nParticles_global%number_of_chunks != 0 ) chunk_size++;
            hsize_t chunk_dims = chunk_size;
            if( number_of_chunks > 1 ) {
                H5Pset_layout(plist, H5D_CHUNKED);
                H5Pset_chunk(plist, 1, &chunk_dims);
            }
        }

I have not checked yet, but will try soon. If you can also try on Lustre that would be great.

We still want to keep chunking available in case the number of particles is very large. We introduced it because some simulations required writing more than 4 GB of particles in one dataset, which is too much for HDF5.

dsbertini commented 6 years ago

Well it works now with HDF5 1.10.x and Lustre 2.10!

mccoys commented 6 years ago

Did you apply exactly the changes I suggested above?

And which version of openmpi? 3.0.0?

dsbertini commented 6 years ago

I have just copy-pasted your changes, and it works with the following configuration/packages:

I also tried with and without threading (by changing the OMP_NUM_THREADS environment variable).

dsbertini commented 6 years ago

As far as I understood, you use data chunking only when the data written exceeds the severe 4 GB size limit of HDF5? In your changes you simply disable it when the dataset does not reach this limit. But in principle one can also use data chunking to improve I/O performance. Do you plan to use it that way in the future?

mccoys commented 6 years ago

Does chunking really help performance in this case?

Every proc writes a slab of the array with a different size, so we cannot adapt the chunk size to each proc. Maybe I don't have a good picture of how chunking improves performance.

mccoys commented 6 years ago

I am closing this issue for now. Thank you very much for your investigation!

Concerning chunking, we can definitely consider implementing something. If some performance improvement can be proven, then it is not difficult to add a parameter.

dsbertini commented 6 years ago

For example, the dimensions of each chunk can be chosen so that the subset of the dataset that each parallel process accesses maps exactly to one chunk in the file. Furthermore, the HDF5 library allows the application to request alignment of all objects in a file over a particular size threshold, with the H5Pset_alignment API call. This makes it possible to align the chunks of chunked datasets to a favoured block boundary for the file system (see the sketch below).
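
A minimal sketch of those two knobs, under stated assumptions: the alignment threshold and boundary values are examples only (not Smilei or Lustre settings), and the chunk sizing assumes the particles split evenly across ranks:

    #include <mpi.h>
    #include <hdf5.h>

    /* File-access property list requesting alignment of large objects. */
    hid_t make_aligned_fapl(void)
    {
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
        /* Align every object larger than 1 MiB on a 4 MiB boundary (e.g. a stripe size) */
        H5Pset_alignment(fapl, 1048576, 4194304);
        return fapl;
    }

    /* Dataset-creation property list with one chunk per rank (assumes an even split). */
    hid_t make_chunked_dcpl(hsize_t nParticles_global, int mpi_size)
    {
        hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
        hsize_t chunk_dims = nParticles_global / mpi_size;
        H5Pset_chunk(dcpl, 1, &chunk_dims);   /* also sets the layout to chunked */
        return dcpl;
    }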

mccoys commented 6 years ago

Does this mean that each chunk may have a size distinct from the other chunks? If so, I was not aware of this capability. Is it compatible with older versions of hdf5?

dsbertini commented 6 years ago

No, a chunked dataset's elements are stored in equal-sized chunks within the file. But suppose you have an 8×8 matrix as a contiguous dataset: if 4 processes were involved in performing the I/O, the HDF5 dataset could be divided into 4×4 chunks of elements, with each chunk aligned in the file. This would minimize lock contention in the parallel file system and maximize file-system throughput. I think this is one possible way to improve I/O. Just a guess, though...
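
As a concrete sketch of that 8×8 example (illustrative only; the function name and dataset name are invented, and the file handle and rank are assumed to come from elsewhere):

    #include <hdf5.h>

    /* 8x8 dataset split into 4x4 chunks, one per rank, so each rank's quadrant
       maps exactly onto one chunk in the file. */
    hid_t create_quadrant_dataset(hid_t file, int rank)
    {
        hsize_t dims[2]  = { 8, 8 };
        hsize_t chunk[2] = { 4, 4 };
        hid_t space = H5Screate_simple(2, dims, NULL);
        hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 2, chunk);   /* chunk dims match each rank's 4x4 block */
        hid_t dset  = H5Dcreate2(file, "matrix", H5T_NATIVE_DOUBLE, space,
                                 H5P_DEFAULT, dcpl, H5P_DEFAULT);

        /* Rank r owns the quadrant starting at (4*(r/2), 4*(r%2)): exactly one chunk */
        hsize_t start[2] = { (hsize_t)(4 * (rank / 2)), (hsize_t)(4 * (rank % 2)) };
        hsize_t count[2] = { 4, 4 };
        hid_t filespace = H5Dget_space(dset);
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
        /* ... each rank would then H5Dwrite its 4x4 block using this selection ... */
        H5Sclose(filespace);
        H5Pclose(dcpl);
        H5Sclose(space);
        return dset;
    }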

mccoys commented 6 years ago

Ok, I understand. But the problem in this case is that each proc owns a different number of particles, which means we cannot choose a chunk size that isolates one proc from the others.