ComputationalRadiationPhysics / picongpu

Performance-Portable Particle-in-Cell Simulations for the Exascale Era
https://picongpu.readthedocs.io

Corrupted .h5 files with particles #5222

Status: open (opened by weqoll 4 days ago)

weqoll commented 4 days ago

Hello everyone!

I'm trying to reproduce a setup with a pre-ionized foil (tilted at 45 degrees) and a femtosecond pulse. The fs pulse propagates along the x-axis and reflects from the foil, generating additional radiation through the redistribution of electrons. The computation itself runs without problems; however, there are issues with the openPMD output and how it is written to .h5 files.

Field output works fine, but particle output is broken. I found this issue with an almost standard probe-particle setup: everything follows your workflow documentation except for the probe distribution, which I set up as a line along x with a thickness of about 1-2 grid steps. The setup looks like this: [image]

The MacroParticlesCounter reports that the probes are initialized correctly: their number is roughly equal to the linear size of the simulation area (in cells). However, when I try to read the probe dataset from this file with the MATLAB function h5read, I get an error:

Error using h5readc
The HDF5 library encountered an error and produced the following stack trace information:

    H5HL__hdr_deserialize    bad local heap signature

Error in h5read (line 93)
[data,var_class] = h5readc(Filename,Dataset,start,count,stride);

The function h5disp() gives the following warning:

>> h5disp("./picongpu_files/run-16nov24-try.new.1/simOutput/openPMD/simData_000005.h5","/data")
Warning: Unable to read '/e' from the file. A portion of the file may be corrupt. 
> In h5info (line 125)
In matlab.io.internal.imagesci.HDF5DisplayUtils.displayHDF5 (line 34)
In h5disp (line 123) 
HDF5 simData_000005.h5 
Group '/data' 

This is strange, because when I wrote .h5 files earlier everything was fine, and the electron and ion particle distributions were written without any errors.

Could you help me with this issue, or at least suggest some ways to debug it? I have recompiled this probe setup many times, even with the standard EveryNthCell density profile, but the issue reappears every time. Thank you!

I installed the current PIConGPU version from the dev branch as of November 15th. My OS version is:

Operating System: Debian GNU/Linux 12 (bookworm)
          Kernel: Linux 6.1.0-27-amd64
    Architecture: x86-64

density.param snippet:

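        struct ProbeXLineParam {
            HDINLINE float_64 operator()(const floatD_64& position_SI, const float3_64& cellSize_SI) {
                // convert SI positions [m] to micrometres
                const float_64 x(position_SI.x() * 1e6);
                const float_64 y(position_SI.y() * 1e6);

                constexpr float_64 wlen(3.9);            // wavelength [um]
                constexpr float_64 y0(0.5*wlen);         // centre of the probe line in y [um]
                constexpr float_64 lthk(0.003907*wlen);  // line thickness [um]
                constexpr float_64 yb(y0-lthk/2);        // lower y bound of the line
                constexpr float_64 ye(y0+lthk/2);        // upper y bound of the line
                float_64 s(0.);

                // density 1 inside the thin strip along x, 0 elsewhere
                if(x > 0.125*wlen && x < 6.875*wlen){
                    if(y > yb && y < ye) {
                        s = 1;
                    }
                }
                s *= float_X(s >= 0.0); // no-op here: s is already 0 or 1
                return s;
            }
        };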

        using ProbeXLine = FreeFormulaImpl<ProbeXLineParam>;

particle.param snippet:

            /** Configuration of initial in-cell particle position
             *
             * Here, macro-particles sit at the centre of the cell in x and y
             * (in-cell offset 0.5, 0.5).
             */
            struct OnePositionParameter
            {
                /** Maximum number of macro-particles per cell during density profile evaluation.
                 *
                 * Determines the weighting of a macro particle as well as the number of
                 * macro-particles which sample the evolution of the particle distribution
                 * function in phase space.
                 *
                 * unit: none
                 */
                static constexpr uint32_t numParticlesPerCell = 1u;

                /** each x, y, z in-cell position component in range [0.0, 1.0)
                 *
                 * @details in 2D the last component is ignored
                 */
                static constexpr auto inCellOffset = float3_X(0.5, 0.5, 0.);
            };
            /** Definition of OnePosition start position functor that
             * places macro-particles at the initial in-cell position defined above.
             */
            using OnePosition = OnePositionImpl<OnePositionParameter>;
        } 

speciesDefinition.param snippet:

    /*---------------------------- probes -----------------------------------------------*/

    using ParticleFlagsProbes = MakeSeq_t<
        particlePusher< particles::pusher::Probe >,
        shape< UsedParticleShape >,
        interpolation< UsedField2Particle >
    >;

    using ProbeX = Particles<
        PMACC_CSTRING( "probe" ),
        ParticleFlagsProbes,
        MakeSeq_t<
            position< position_pic >,
            probeB,
            probeE
        >
    >;

speciesInitialization.param snippet:

       using InitPipeline = pmacc::mp_list<
            CreateDensity<densityProfiles::Tilt45FoilWithARamp, startPosition::Random, PIC_Ions>,
            Manipulate<manipulators::SetOnceIonized, PIC_Ions>,
            Derive<PIC_Ions, PIC_Electrons>,
            CreateDensity<densityProfiles::ProbeXLine, startPosition::OnePosition, ProbeX>
        >;

fileOutput.param snippet:

    using FileOutputParticles = MakeSeq_t<ProbeX,PIC_Electrons,PIC_Ions>;

psychocoderHPC commented 3 days ago

Could you please check with h5dump --contents=1 FILENAME whether the HDF5 group structure is readable and the species are named as expected? With h5dump you can read the file in the terminal to check whether it is corrupted. In many cases this happens when the file was not closed correctly during writing.

weqoll commented 3 days ago

With h5dump --contents ./openPMD/simData_000005.h5 I get the following error:

h5dump error: internal error (file ../../../../../tools/src/h5dump/h5dump.c:line 1430)

With h5dump --enable-error-stack ./openPMD/simData_000005.h5 I get this stack:


HDF5-DIAG: Error detected in HDF5 (1.10.8) thread 1:
  #000: ../../../src/H5O.c line 510 in H5Oget_info_by_name2(): can't get info for object: 'data/5/fields'
    major: Object header
    minor: Can't get value
  #001: ../../../src/H5Gloc.c line 702 in H5G_loc_info(): can't find object
    major: Symbol table
    minor: Object not found
  #002: ../../../src/H5Gtraverse.c line 832 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #003: ../../../src/H5Gtraverse.c line 608 in H5G__traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #004: ../../../src/H5Gloc.c line 660 in H5G__loc_info_cb(): can't get object info
    major: Symbol table
    minor: Can't get value
  #005: ../../../src/H5Oint.c line 2184 in H5O_get_info(): unable to determine object class
    major: Object header
    minor: Can't get value
  #006: ../../../src/H5Oint.c line 1765 in H5O__obj_class_real(): unable to determine object type
    major: Object header
    minor: Unable to initialize object
HDF5-DIAG: Error detected in HDF5 (1.10.8) thread 1:
  #000: ../../../src/H5L.c line 1292 in H5Lvisit_by_name(): link visitation failed
    major: Links
    minor: Iteration failed
  #001: ../../../src/H5Gint.c line 1150 in H5G_visit(): can't visit links
    major: Symbol table
    minor: Iteration failed
  #002: ../../../src/H5Gobj.c line 674 in H5G__obj_iterate(): can't iterate over symbol table
    major: Symbol table
    minor: Iteration failed
  #003: ../../../src/H5Gstab.c line 537 in H5G__stab_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
  #004: ../../../src/H5B.c line 1195 in H5B_iterate(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #005: ../../../src/H5B.c line 1154 in H5B__iterate_helper(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #006: ../../../src/H5Gnode.c line 977 in H5G__node_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
  #007: ../../../src/H5Gobj.c line 674 in H5G__obj_iterate(): can't iterate over symbol table
    major: Symbol table
    minor: Iteration failed
  #008: ../../../src/H5Gstab.c line 537 in H5G__stab_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
  #009: ../../../src/H5B.c line 1195 in H5B_iterate(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #010: ../../../src/H5B.c line 1154 in H5B__iterate_helper(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #011: ../../../src/H5Gnode.c line 977 in H5G__node_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
  #012: ../../../src/H5Gobj.c line 674 in H5G__obj_iterate(): can't iterate over symbol table
    major: Symbol table
    minor: Iteration failed
  #013: ../../../src/H5Gstab.c line 537 in H5G__stab_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
  #014: ../../../src/H5B.c line 1195 in H5B_iterate(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #015: ../../../src/H5B.c line 1154 in H5B__iterate_helper(): B-tree iteration failed
    major: B-Tree node
    minor: Iteration failed
  #016: ../../../src/H5Gnode.c line 977 in H5G__node_iterate(): iteration operator failed
    major: Symbol table
    minor: Can't move to next iterator location
  #017: ../../../src/H5O.c line 510 in H5Oget_info_by_name2(): can't get info for object: 'data/5/fields'
    major: Object header
    minor: Can't get value
  #018: ../../../src/H5Gloc.c line 702 in H5G_loc_info(): can't find object
    major: Symbol table
    minor: Object not found
  #019: ../../../src/H5Gtraverse.c line 832 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #020: ../../../src/H5Gtraverse.c line 608 in H5G__traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #021: ../../../src/H5Gloc.c line 660 in H5G__loc_info_cb(): can't get object info
    major: Symbol table
    minor: Can't get value
  #022: ../../../src/H5Oint.c line 2184 in H5O_get_info(): unable to determine object class
    major: Object header
    minor: Can't get value
  #023: ../../../src/H5Oint.c line 1765 in H5O__obj_class_real(): unable to determine object type
    major: Object header
    minor: Unable to initialize object
h5dump error: internal error (file ../../../../../tools/src/h5dump/h5dump.c:line 1430)
H5tools-DIAG: Error detected in HDF5:tools (1.10.8) thread 1:
  #000: ../../../../tools/lib/h5tools_utils.c line 618 in init_objs(): finding shared objects failed
    major: Failure in tools library
    minor: error in function
  #001: ../../../../tools/lib/h5trav.c line 1040 in h5trav_visit(): traverse failed
    major: Failure in tools library
    minor: error in function
  #002: ../../../../tools/lib/h5trav.c line 286 in traverse(): H5Lvisit_by_name failed
    major: Failure in tools library
    minor: error in function
weqoll commented 3 days ago

Here is the output for 5 simulation steps with OPENPMD_VERBOSE=1: simulationOutput.txt. Also, here is a snippet from the .cfg file:

TBG_openPMD_sp="--openPMD.period 0:60000:5 --openPMD.file simData --openPMD.ext h5 --openPMD.source species_all --openPMD.json='{\"hdf5\": {\"dataset\": {\"chunks\":\"auto\"}}}'"

Without chunking, I cannot write openPMD output that includes species at all.