charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0

TLS-based AMPI variable privatization source-to-source translation tool #283

Open PhilMiller opened 11 years ago

PhilMiller commented 11 years ago

Original issue: https://charm.cs.illinois.edu/redmine/issues/283


We have constant-time TLS variable swapping support for our user-space threads that we run as AMPI ranks. A simple tool can make usage of that technique much more broadly applicable.

When compiling a source file with ampicc/ampicxx with an appropriate option, it should invoke a tool that does the following

This could be done with either Clang or ROSE. My guess is that the difficulty should be about equal, so the decision should be based on who has the necessary knowledge/skills and on which libraries are available on target systems. It may even be worthwhile to simply build both.

The thread-private declarations could be done using either the __thread specifier or an OpenMP #pragma omp threadprivate(var). Which is preferable may depend on the back-end compiler, what it supports, what options it takes, etc.
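As a rough illustration of the two declaration styles mentioned above, here is a minimal sketch, assuming a GCC- or Clang-compatible backend. The variable and function names (iteration_count, residual_checks, bump_iteration_count, bump_on_new_thread) are made up for the example, and std::thread stands in for the user-space threads that AMPI ranks actually run on.

```cpp
#include <cassert>
#include <thread>

// Variant A: GCC/Clang storage-class extension. Each thread gets its own copy.
__thread int iteration_count = 0;

// Variant B: an ordinary global plus an OpenMP directive. The directive only
// takes effect when the backend compiler has OpenMP enabled (e.g. g++ -fopenmp),
// so it is guarded here to keep non-OpenMP builds warning-free.
int residual_checks = 0;
#ifdef _OPENMP
#pragma omp threadprivate(residual_checks)
#endif

// Increment the calling thread's copy n times and return the value it sees.
int bump_iteration_count(int n) {
    for (int i = 0; i < n; ++i) ++iteration_count;
    return iteration_count;
}

// Run bump_iteration_count on a fresh thread and report what that thread saw.
// Build with -pthread on toolchains that require it.
int bump_on_new_thread(int n) {
    int seen = -1;
    std::thread worker([&] { seen = bump_iteration_count(n); });
    worker.join();
    return seen;
}
```

Calling bump_iteration_count(3) on the main thread and then bump_on_new_thread(5) should leave the main thread's iteration_count at 3, because the new thread starts from its own zero-initialized copy.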

1 This constraint appears since the compiled library would fail to link with the differently-referenced variables. Where the library's members actually do need to be private to each rank, we may have to recompile those libraries and include/link our own version as well.

PhilMiller commented 5 years ago

Original date: 2013-08-29 19:22:07


When using IBM's XL Fortran as the backend, it may also be useful to use THREADLOCAL common blocks: http://publib.boulder.ibm.com/infocenter/comphelp/v101v121/index.jsp?topic=/com.ibm.xlf121.aix.doc/proguide/thrd_com_blocks.html

PhilMiller commented 5 years ago

Original date: 2013-08-30 03:08:07


One reason to lean toward ROSE is its support for Fortran, via OpenFortranParser. Clang as yet has no Fortran frontend.

PhilMiller commented 5 years ago

Original date: 2013-09-01 18:18:47


In the OpenMP approach, the backend compiler must have its OpenMP support enabled. Since we don't want existing OpenMP directives to cause unexpected behavior, the tool should strip existing OpenMP directives before inserting its own. That may include OpenMP RTS function calls, like omp_get_num_threads(), which would have to be replaced.

PhilMiller commented 5 years ago

Original date: 2013-09-03 06:00:57


I've implemented a basic version of the ROSE/OpenMP variant of this tool in charmgit:users/phil/ampitls with support for C and C++. It doesn't touch any header code, so global variables used across modules, or just declared in a header for whatever reason, won't be affected, for better or for worse.

I've tested it with examples/ampi/CJacobi3D. The modified code and resulting object file seem to be as intended: global variables are marked OMP 'threadprivate', and are emitted as TLS when the backend g++ compiler is passed -fopenmp.

I think the next logical step is to make it support Fortran as well, given how many target codes, especially ones we care about, are written in Fortran.

PhilMiller commented 5 years ago

Original date: 2013-09-03 07:54:07


I just tested with a class-scoped static variable in C++, and found that ROSE is apparently misbehaving. I sent a message to their mailing list to try to get that cleared up.

PhilMiller commented 5 years ago

Original date: 2013-09-06 05:16:25


Fortran module-level variables are now handled. Next up, save variables.

PhilMiller commented 5 years ago

Original date: 2013-09-06 05:43:55


Variables declared explicitly with the Fortran save attribute are now handled. Variables declared after a standalone save directive aren't, and ISAM actually uses such in vegnstore_module.F90. I suspect there are variables that are implicitly save by virtue of being initialized at the point of declaration, too, and I need to make sure those work as well.

PhilMiller commented 5 years ago

Original date: 2013-09-06 06:22:37


Here's the feature compatibility matrix that drove the choice to do a ROSE/OpenMP implementation first:

         | Clang | ROSE | C/C++ | Fortran
---------|-------|------|-------|--------
__thread |   y   |  y   |   y   |   n
OpenMP   |   y   |  y   |   y   |   y
C/C++    |   y   |  y   |       |
Fortran  |   n   |  y   |       |

Basically, wanting to broadly support Fortran rules out Clang and __thread.

PhilMiller commented 5 years ago

Original date: 2013-09-06 21:59:08


Fortran's block-level SAVE statement (save all local variables declared in that subroutine) and implicit save (variables initialized at their point of declaration) are now handled. There are no COMMON blocks in ISAM, so this should suffice to AMPI-ize it entirely.

PhilMiller commented 5 years ago

Original date: 2013-09-09 23:01:03


Currently trying to test with ISAM on Hopper. Challenging due to the dependence on Intel's Fortran and ROSE's baroque build process.

PhilMiller commented 5 years ago

Original date: 2013-09-09 23:51:12


ROSE's Fortran parser is apparently less permissive than Intel's Fortran compiler about various constructs. There's also an issue with Fortran module files from external libraries being incompatible:

/omp-rose -rose:relax_syntax_check isam_ncdio_module.f90 -o isam_ncdio_module.o -c   -I/global/u2/p/pmiller/ISAM/build/src/assert -I/global/u2/p/pmiller/ISAM/build/src/isam_offline  -I. -I/opt/cray/netcdf/4.2.0/intel/120/include -I/opt/cray/parallel-netcdf/1.3.1/intel/120/include
WARNING: the input directory does not exist : /global/u2/p/pmiller/ISAM/build/src/assert
WARNING: the input directory does not exist : /global/u2/p/pmiller/ISAM/build/src/isam_offline
/global/u2/p/pmiller/ISAM/build/isam_ncdio_module.f90:3.12:

  use netcdf
            1
Fatal Error: File 'netcdf.mod' opened at (1) is not a GFORTRAN module file
Syntax errors detected in input fortran program ... 
make[1]: *** [isam_ncdio_module.o] Error 1

Now that the tool is 'built', it may be a long slog to actually get it to work usefully. The work by Atul's group to make the code more standards-compliant will certainly help here.

PhilMiller commented 5 years ago

Original date: 2013-09-10 17:21:53


The issue with C++ class-scoped static variables was not so much a bug in ROSE as a quirk of how it represents various sorts of declarations. That limitation of the tool is now fixed.

That implies that the only remaining language feature of potential concern is Fortran COMMON blocks.

PhilMiller commented 5 years ago

Original date: 2013-09-10 20:30:53


OpenMP removal is now fully implemented. Tested primarily by round-tripping the tool's output back into it, and seeing that it's identical.
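The round-trip check described above amounts to testing that the transformation is idempotent. Here is a toy sketch of that idea, not the ROSE-based tool itself: strip_omp_pragmas is a hypothetical line-based stand-in that only drops lines beginning with "#pragma omp", and round_trip_is_stable feeds its output back in and compares.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy stand-in for the removal pass: drop every line whose first
// non-blank text is "#pragma omp" and keep all other lines verbatim.
std::string strip_omp_pragmas(const std::string& source) {
    std::istringstream in(source);
    std::string line, out;
    while (std::getline(in, line)) {
        std::size_t first = line.find_first_not_of(" \t");
        bool is_omp = first != std::string::npos &&
                      line.compare(first, 11, "#pragma omp") == 0;
        if (!is_omp) {
            out += line;
            out += '\n';
        }
    }
    return out;
}

// Round-trip check: running the pass on its own output must change nothing.
bool round_trip_is_stable(const std::string& source) {
    std::string once = strip_omp_pragmas(source);
    return strip_omp_pragmas(once) == once;
}
```

A real source-to-source tool works on the AST rather than on lines, so this only shows the shape of the test, not how the removal is implemented.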

PhilMiller commented 5 years ago

Original date: 2013-09-12 18:22:34


A better approach to existing OpenMP code might be to produce a modified version of the existing 'OpenMP lowering' transformation, to initially treat all such entities as effective no-ops - discard parallel regions, emit no-ops for locks and settings functions, replace omp_get_num_threads() with a constant 1, etc.
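To make the "effective no-op" idea concrete, here is a minimal sketch of what the lowered code could call instead of the OpenMP runtime. Every ampi_stub_ name is hypothetical and chosen to avoid clashing with a real omp.h. The use of a thread-number query is also an assumption, since the comment above only names omp_get_num_threads(), locks, and settings functions.

```cpp
#include <cassert>

// Hypothetical single-threaded replacements for OpenMP runtime calls.
struct ampi_stub_lock {};                                  // nothing to protect

inline int  ampi_stub_get_num_threads() { return 1; }      // constant 1
inline int  ampi_stub_get_thread_num()  { return 0; }      // only "thread" is 0
inline void ampi_stub_set_num_threads(int) {}              // setting is ignored
inline void ampi_stub_set_lock(ampi_stub_lock*) {}         // lock: no-op
inline void ampi_stub_unset_lock(ampi_stub_lock*) {}       // unlock: no-op

// Body of a former parallel region after the region itself is discarded.
// The cyclic work split degenerates to a plain serial loop.
long sum_chunk(const int* data, int n) {
    int nthreads = ampi_stub_get_num_threads();
    int tid = ampi_stub_get_thread_num();
    ampi_stub_lock guard;
    long total = 0;
    for (int i = tid; i < n; i += nthreads) {
        ampi_stub_set_lock(&guard);
        total += data[i];
        ampi_stub_unset_lock(&guard);
    }
    return total;
}
```

With one thread and thread number 0, sum_chunk visits every element, so code written for OpenMP keeps its serial meaning within each rank.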

PhilMiller commented 5 years ago

Original date: 2013-09-12 20:03:58


Integrated support for the tool directly in charmc. Users need to build the tool and set an environment variable $CMK_ROSE_OMP_TOOL to its path, then pass the command line argument -roseomptlsglobals to charmc.

Documentation will be added after the various bits have been tested to work on some more application code than just Cjacobi3D.

PhilMiller commented 5 years ago

Original date: 2013-09-12 22:45:30


Have now successfully processed NPB's 'IS' C-language benchmark with just the setting MPICC = $(CHARMDIR)/ampicc -roseomptlsglobals in the Makefile, run it with virtualization, and obtained successfully verified results.

Had to fix attempts to privatize function parameters along the way.
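The distinction behind that fix can be sketched as follows: only declarations with static storage duration are candidates for privatization, while parameters and automatic locals already live on each rank's own stack. The names below (rank_total, Solver, next_id) are invented for illustration, and __thread is used only because it is compact to show in C++.

```cpp
#include <cassert>

__thread int rank_total = 0;            // namespace-scope global: privatize

struct Solver {
    static __thread int steps;          // class-scoped static: privatize
};
__thread int Solver::steps = 0;

int next_id(int stride) {               // 'stride' is a parameter: leave alone
    static __thread int last = 0;       // function-local static, the C/C++
                                        // analogue of a Fortran SAVE variable:
                                        // privatize
    int scratch = last + stride;        // automatic local: leave alone
    last = scratch;
    rank_total += stride;
    ++Solver::steps;
    return last;
}
```

Calling next_id(2) and then next_id(3) on one thread returns 2 and then 5, with rank_total at 5 and Solver::steps at 2. Another thread would see all three counters start again from zero.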

PhilMiller commented 5 years ago

Original date: 2013-09-12 23:20:59


Tried to work with the Phloem MPI benchmark suite (found via the ASC Sequoia acceptance tests page). Ran into the problem of handling variables declared in headers. Will actually have to address that now.

PhilMiller commented 5 years ago

Original date: 2013-09-13 03:54:27


Really nasty hacks to unparse the included headers, and then exclude various junk that comes with them that makes the backend compiler unhappy. Now left with a few things that get privatized in one compilation unit but not the other.

PhilMiller commented 5 years ago

Original date: 2013-09-17 00:09:56


I can now compile the entire 'phloem' suite of C language MPI benchmarks from the Sequoia acceptance tests, and run the 'presta' benchmark contained therein.

As described on the mailing list, I had to use libthread-pthreads.o instead of one of the libthreads-*-tls.o variants to avoid crashes in C library routines.

PhilMiller commented 5 years ago

Original date: 2013-09-17 20:21:23


Will not be production-ready within the target timeframe.

nikhil-jain commented 5 years ago

Original date: 2015-09-14 01:36:27


What is the current status of this tool?

PhilMiller commented 5 years ago

Original date: 2017-12-29 17:51:53


Mass re-assign AMPI-related issues on my plate to Sam, for subsequent redistribution.

PhilMiller commented 5 years ago

Original date: 2017-12-29 17:52:14


Mass re-assign AMPI-related issues on my plate to Sam, for subsequent redistribution.

For real this time.