arporter / psycloned_nemo

PSyclone-processed NEMO source files for collaboration
BSD 3-Clause "New" or "Revised" License

Version with profiling #1

Open arporter opened 5 years ago

arporter commented 5 years ago

I've created the profiling_new_api branch, which has profiling regions inserted. To compile this you'll need to build the PSyclone profiling wrapper library, which can be found in PSyclone/lib/profiling/nvidia/. Once you've built it, you need to tweak the NEMO build system so that it ignores the associated module use statements and links with both the wrapper library and the NVTX library itself. I've added the following to my arch file (edit as appropriate for your system):

%PROFILE_HOME        ${HOME}/PSyclone/lib/profiling/nvidia
%PROFILE_LIB         -L%PROFILE_HOME -lnvtx_prof -L${CUDA_DIR}/lib64 -lnvToolsExt
%PROFILE_INC         -I%PROFILE_HOME

and then extended the list of include and link flags:

%USER_INC            %PROFILE_INC %XIOS_INC %OASIS_INC %NCDF_INC
%USER_LIB            %PROFILE_LIB %XIOS_LIB %OASIS_LIB %NCDF_LIB

I also edited dev_r10037_GPU/mk/bldxag.cfg and added the following line:

bld::excl_dep        use::profile_psy_data_mod

(This tells FCM that it doesn't need to try to build the profile_psy_data_mod module.)

arporter commented 5 years ago

I've altered my script so that it permits single-line IF statements inside KERNELS regions. I've also done some experimenting and identified some files that I can now process that I couldn't previously. This makes the profile more informative if nothing else:

[Image: nemo_icestp_profiled]

Most of the remaining white space is due to either global sums (especially in stp_ctl in stpctl.f90) or the packed halo exchanges (lbc_lnk_ptr in lbclnk.f90).

arporter commented 5 years ago

An update: the support for the NVTX profiling API is now on master in PSyclone.

arporter commented 4 years ago

[Image: nemo_prof_kernels_inside_tracers_tranxtvvl]

I've (manually) tweaked traldf_iso and tra_nxt_vvl to put KERNELS in more sensible/performant locations. They've now disappeared from the profile :-)

arporter commented 4 years ago

[Image: nemo_prof_gsum]

I've (manually) optimised the global sums in stp_ctl, which were the source of the white space on the RHS of the profiles before this one. I've also introduced a heuristic that puts KERNELS inside loops over levels when they contain 2 or more loops. The latter is essential in a couple of the big kernels but the overall performance benefit is questionable. Still, it's only a small change to the script :-)
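
For anyone wanting to see the shape of that heuristic, here is a minimal sketch in plain Python. It is not the actual script and does not use PSyclone's API: the Loop class, the kernels_regions function and the use of "jk" to identify a loop over levels are all hypothetical, and it assumes that "contain 2 or more loops" means direct children and that a single KERNELS region wraps the whole body of the level loop.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Loop:
    """Hypothetical stand-in for a loop node (not a PSyclone class)."""
    variable: str                                   # loop variable name
    body: List["Loop"] = field(default_factory=list)  # directly nested loops


# Assumption: loops over levels are identified by the NEMO-style name 'jk'.
LEVEL_VAR = "jk"


def kernels_regions(loop: Loop) -> List[List[Loop]]:
    """Return the groups of loops that would each get one KERNELS region."""
    if loop.variable == LEVEL_VAR and len(loop.body) >= 2:
        # Level loop with two or more child loops: put KERNELS inside it,
        # around its body, instead of around the level loop itself.
        return [loop.body]
    # Otherwise enclose the loop as a whole.
    return [[loop]]


if __name__ == "__main__":
    big = Loop("jk", [Loop("jj", [Loop("ji")]), Loop("jj", [Loop("ji")])])
    small = Loop("jk", [Loop("jj", [Loop("ji")])])
    print(len(kernels_regions(big)[0]))    # number of loops inside the region
    print(kernels_regions(small)[0][0].variable)
```

The only decision being made is where the region boundary goes, which is why it amounts to a small change to a script.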

arporter commented 4 years ago

Realised I had a bug in the script that meant KERNELS were not being put in the lower branches of CASE statements. Also realised that PSyclone can now process icetab.f90; however, the resulting code is slower...
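
The details of the bug aren't given here, so the following is only an illustration of the kind of traversal that reaches every branch of a CASE construct. It is plain Python with hypothetical Loop and Case classes and a hypothetical find_loops function, not the real script or PSyclone's API.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Loop:
    """Hypothetical loop node, identified only by a name."""
    name: str


@dataclass
class Case:
    """Hypothetical CASE construct: one list of nodes per branch."""
    branches: List[List["Node"]] = field(default_factory=list)


Node = Union[Loop, Case]


def find_loops(nodes: List[Node]) -> List[str]:
    """Collect the names of all loops, descending into every CASE branch."""
    found: List[str] = []
    for node in nodes:
        if isinstance(node, Loop):
            found.append(node.name)
        elif isinstance(node, Case):
            # Visit all branches, not just the first one.
            for branch in node.branches:
                found.extend(find_loops(branch))
    return found


if __name__ == "__main__":
    tree = [Loop("a"), Case([[Loop("b")], [Loop("c")]])]
    print(find_loops(tree))
```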

arporter commented 4 years ago

[Image: nemo_prof_icetab]

arporter commented 4 years ago

Have got NEMO compiling with the latest version of PSyclone and PGI 19.10. Since the profiling API has changed, I've created a new branch (profiling_new_api) in this repo. You will need to build the latest version of the nvidia wrapper library distributed with PSyclone. (See the description of this Issue.) The resulting code is the fastest yet:

[Image: nemo_prof_220520]