ComputationalRadiationPhysics / picongpu

Performance-Portable Particle-in-Cell Simulations for the Exascale Era :sparkles:
https://picongpu.readthedocs.io

cuSTL Error on -G #1951

Open ax3l opened 7 years ago

ax3l commented 7 years ago

Compiling with -DCUDA_NVCC_FLAGS_DEBUG="-g;-G" triggers the following compile error:

ptxas error   : Entry function
  '_ZN5PMacc6nvidia16gpuEntryFunctionINS_9algorithm6kernel6detail18KernelForeachBlockEJNS4_13SphericMapperILi3ENS_4math2CT6VectorIN4mpl_10integral_cIiLi8EEESC_NSB_IiLi4EEEEENSA_5void_EEENS_6cursor6CursorINSH_14MarkerAccessorINS7_3IntILi3EEEEENSH_19MultiIndexNavigatorILi3EEESL_EEN8picongpu12FunctorBlockINSQ_9ParticlesIN5boost3mpl6stringILi101ELi0ELi0ELi0ELi0ELi0ELi0ELi0EEENSU_6vectorINSQ_24placeholder_definition2814particlePusherINSQ_9particles6pusher5BorisENS_24placeholder_definition1513pmacc_isAliasEEENSQ_24placeholder_definition275shapeINS10_6shapes3TSCES14_EENSQ_24placeholder_definition3413interpolationINSQ_28FieldToParticleInterpolationIS19_NSQ_30AssignedTrilinearInterpolationEEES14_EENSQ_24placeholder_definition357currentINSQ_13currentSolver9EsirkepovIS19_Lj3EEES14_EENSQ_24placeholder_definition389massRatioINSQ_25placeholder_definition11918MassRatioElectronsES14_EENSQ_24placeholder_definition3911chargeRatioINSQ_25placeholder_definition12020ChargeRatioElectronsES14_EENSA_2naES1X_S1X_S1X_S1X_S1X_S1X_S1X_S1X_S1X_S1X_S1X_S1X_S1X_EENSU_6v_itemINSQ_24placeholder_definition239weightingENS1Z_INSQ_24placeholder_definition218momentumENS1Z_INSQ_24placeholder_definition188positionINSQ_24placeholder_definition2012position_picES14_EENSU_7vector0IS1X_EELi0EEELi0EEELi0EEEEESE_fLj1024ELj2EEEEEEvT_DpT0_'
  uses too much shared data (0xcf10 bytes, 0xc000 max)

ptxas info    : 92194 bytes gmem, 1312 bytes cmem[2], 1360 bytes cmem[14]
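For reference, the two hexadecimal sizes in the ptxas message can be converted to decimal as below. This is only a small compile-time sketch: the constant names are made up for illustration, and only the two hex values are taken from the log above.

```cpp
#include <cstdint>

// Values copied from the ptxas error message; the names are illustrative only.
constexpr std::uint32_t usedSharedBytes = 0xcf10; // shared data requested by the kernel
constexpr std::uint32_t maxSharedBytes = 0xc000;  // limit reported by ptxas

// Number of bytes by which the kernel exceeds the limit.
constexpr std::uint32_t overshootBytes = usedSharedBytes - maxSharedBytes;

static_assert(usedSharedBytes == 53008, "0xcf10 is 53008 bytes");
static_assert(maxSharedBytes == 48 * 1024, "0xc000 is exactly 48 KiB");
static_assert(overshootBytes == 3856, "the kernel is 3856 bytes over the limit");
```

So the kernel misses the 48 KiB limit by a little under 4 KiB.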

Demangled, the entry function (from the default LWFA example) is:

void PMacc::nvidia::gpuEntryFunction<
  PMacc::algorithm::kernel::detail::KernelForeachBlock,
  PMacc::algorithm::kernel::detail::SphericMapper<
    3,
    PMacc::math::CT::Vector<
      mpl_::integral_c<int, 8>,
      mpl_::integral_c<int, 8>,
      mpl_::integral_c<int, 4>
    >,
    mpl_::void_
  >,
  PMacc::cursor::Cursor<
    PMacc::cursor::MarkerAccessor<PMacc::math::Int<3> >,
    PMacc::cursor::MultiIndexNavigator<3>,
    PMacc::math::Int<3>
  >,
  picongpu::FunctorBlock<
    picongpu::Particles<
      boost::mpl::string<101, 0, 0, 0, 0, 0, 0, 0>,
      boost::mpl::vector<
        picongpu::placeholder_definition28::particlePusher<picongpu::particles::pusher::Boris, PMacc::placeholder_definition15::pmacc_isAlias>,
        picongpu::placeholder_definition27::shape<picongpu::particles::shapes::TSC, PMacc::placeholder_definition15::pmacc_isAlias>,
        picongpu::placeholder_definition34::interpolation<picongpu::FieldToParticleInterpolation<picongpu::particles::shapes::TSC, picongpu::AssignedTrilinearInterpolation>, PMacc::placeholder_definition15::pmacc_isAlias>,
        picongpu::placeholder_definition35::current<picongpu::currentSolver::Esirkepov<picongpu::particles::shapes::TSC, 3u>, PMacc::placeholder_definition15::pmacc_isAlias>,
        picongpu::placeholder_definition38::massRatio<picongpu::placeholder_definition119::MassRatioElectrons, PMacc::placeholder_definition15::pmacc_isAlias>,
        picongpu::placeholder_definition39::chargeRatio<picongpu::placeholder_definition120::ChargeRatioElectrons, PMacc::placeholder_definition15::pmacc_isAlias>,
        mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
      boost::mpl::v_item<picongpu::placeholder_definition23::weighting,
      boost::mpl::v_item<picongpu::placeholder_definition21::momentum,
      boost::mpl::v_item<picongpu::placeholder_definition18::position<picongpu::placeholder_definition20::position_pic, PMacc::placeholder_definition15::pmacc_isAlias>,
      boost::mpl::vector0<mpl_::na>, 0>, 0>, 0> >,
    PMacc::math::CT::Vector<
      mpl_::integral_c<int, 8>,
      mpl_::integral_c<int, 8>,
      mpl_::integral_c<int, 4>
    >,
    float,
    1024u,
    2u
  >
>(
  PMacc::algorithm::kernel::detail::KernelForeachBlock,
  PMacc::algorithm::kernel::detail::SphericMapper<3, PMacc::math::CT::Vector<mpl_::integral_c<int, 8>, mpl_::integral_c<int, 8>, mpl_::integral_c<int, 4> >, mpl_::void_>,
  PMacc::cursor::Cursor<PMacc::cursor::MarkerAccessor<PMacc::math::Int<3> >, PMacc::cursor::MultiIndexNavigator<3>, PMacc::math::Int<3> >, picongpu::FunctorBlock<picongpu::Particles<boost::mpl::string<101, 0, 0, 0, 0, 0, 0, 0>, boost::mpl::vector<picongpu::placeholder_definition28::particlePusher<picongpu::particles::pusher::Boris, PMacc::placeholder_definition15::pmacc_isAlias>, picongpu::placeholder_definition27::shape<picongpu::particles::shapes::TSC, PMacc::placeholder_definition15::pmacc_isAlias>, picongpu::placeholder_definition34::interpolation<picongpu::FieldToParticleInterpolation<picongpu::particles::shapes::TSC, picongpu::AssignedTrilinearInterpolation>, PMacc::placeholder_definition15::pmacc_isAlias>, picongpu::placeholder_definition35::current<picongpu::currentSolver::Esirkepov<picongpu::particles::shapes::TSC, 3u>, PMacc::placeholder_definition15::pmacc_isAlias>, picongpu::placeholder_definition38::massRatio<picongpu::placeholder_definition119::MassRatioElectrons, PMacc::placeholder_definition15::pmacc_isAlias>, picongpu::placeholder_definition39::chargeRatio<picongpu::placeholder_definition120::ChargeRatioElectrons, PMacc::placeholder_definition15::pmacc_isAlias>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, boost::mpl::v_item<picongpu::placeholder_definition23::weighting, boost::mpl::v_item<picongpu::placeholder_definition21::momentum, boost::mpl::v_item<picongpu::placeholder_definition18::position<picongpu::placeholder_definition20::position_pic, PMacc::placeholder_definition15::pmacc_isAlias>, boost::mpl::vector0<mpl_::na>, 0>, 0>, 0> >, PMacc::math::CT::Vector<mpl_::integral_c<int, 8>, mpl_::integral_c<int, 8>, mpl_::integral_c<int, 4> >, float, 1024u, 2u>
)

ax3l commented 7 years ago

Building with CUDA_KEEP_FILES=ON and CUDA_SHOW_CODELINES=ON shows that the error is triggered somewhere around struct FunctorBlock in src/picongpu/include/plugins/PhaseSpace/PhaseSpaceFunctors.hpp:131. Careful: the PTX lines below that point may also be involved, see snippet.txt.

The snippet was extracted with:

grep -B 200 -A 500 "_ZN5PMacc6nvidia16gpuEntryFunctionINS_[... see above ...]_" build_picongpu/nvcc_tmp/main.ptx > cut.txt

Result: snippet.txt

ax3l commented 7 years ago

Quick hack: reduce the shared memory of the phase space functor to 16 KB via maxShared = 16*1024. This should halve the momentum (p) resolution for the user-selected range, from 1024 to 512 bins.
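The arithmetic behind the halving can be sketched as follows. The layout is an assumption that the thread does not confirm: one float per momentum bin for each of the 8 cells along one supercell axis (the first entry of CT::Vector<8, 8, 4>), which would put the previous maxShared at 32 KiB. The helper name numMomentumBins is hypothetical; the values 1024, 512, 0xcf10 and 0xc000 are taken from the kernel signature and the ptxas message.

```cpp
#include <cstdint>

// Assumed layout (not confirmed in this thread): one float per momentum bin
// for each of the 8 cells along one supercell axis.
constexpr std::uint32_t cellsPerAxis = 8;           // first entry of CT::Vector<8, 8, 4>
constexpr std::uint32_t bytesPerBin = sizeof(float);

// Hypothetical helper: number of momentum bins that fit into maxShared bytes.
constexpr std::uint32_t numMomentumBins(std::uint32_t maxShared)
{
    return maxShared / (bytesPerBin * cellsPerAxis);
}

static_assert(sizeof(float) == 4, "the sketch assumes 4-byte floats");
static_assert(numMomentumBins(32 * 1024) == 1024, "matches the 1024u template argument");
static_assert(numMomentumBins(16 * 1024) == 512, "the quick hack halves the bin count");

// If the rest of the kernel's shared data stays unchanged, the hack frees 16 KiB.
constexpr std::uint32_t usedBefore = 0xcf10;                // from the ptxas error
constexpr std::uint32_t usedAfter = usedBefore - 16 * 1024; // 36624 bytes
static_assert(usedAfter < 0xc000, "fits below the 48 KiB limit under this assumption");
```

Under these assumptions the histogram accounts for 32 KiB of the 53008 bytes, so the remaining shared data must come from elsewhere in the -G build.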

ax3l commented 7 years ago

@PrometheusPi reports that the Bremsstrahlung example compiles with -G even without the above hack.

ax3l commented 7 years ago

We have two ways to mitigate this issue:

a) find the underlying cause, i.e. which part of cuSTL can no longer be optimized with -G
b) reduce the phase space size in momentum in -G device-side debug mode (somewhat related to #469)
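Option b) could look roughly like the sketch below. It relies on `__CUDACC_DEBUG__`, which nvcc defines when compiling in device-debug mode (-G). The helper name phaseSpaceMaxShared and the 32 KiB default are assumptions for illustration (the default is inferred from the 1024 bins in the failing kernel); this is not the actual PIConGPU code.

```cpp
#include <cstdint>

// nvcc defines __CUDACC_DEBUG__ when device code is compiled with -G.
#ifdef __CUDACC_DEBUG__
constexpr bool deviceDebugBuild = true;
#else
constexpr bool deviceDebugBuild = false;
#endif

// Hypothetical helper: shared-memory budget for the phase space histogram.
// 32 KiB is an inferred default, 16 KiB is the quick hack from this thread.
constexpr std::uint32_t phaseSpaceMaxShared(bool deviceDebug)
{
    return deviceDebug ? 16u * 1024u : 32u * 1024u;
}

// Budget selected for the current build.
constexpr std::uint32_t maxShared = phaseSpaceMaxShared(deviceDebugBuild);

static_assert(phaseSpaceMaxShared(true) == 16u * 1024u, "debug builds use the reduced budget");
static_assert(phaseSpaceMaxShared(false) == 32u * 1024u, "regular builds keep the full budget");
```

Regular builds would keep the full momentum resolution, and only -G builds would pay with fewer bins.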