Open PhilMiller opened 11 years ago
Original date: 2013-08-29 19:22:07
When using IBM's XL Fortran as backend, it may also be useful to use "THREADLOCAL common blocks":http://publib.boulder.ibm.com/infocenter/comphelp/v101v121/index.jsp?topic=/com.ibm.xlf121.aix.doc/proguide/thrd_com_blocks.html
Original date: 2013-08-30 03:08:07
One reason to lean toward ROSE is its support for Fortran, via OpenFortranParser. Clang as yet has no Fortran frontend.
Original date: 2013-09-01 18:18:47
In the OpenMP approach, the backend compiler will have to have its support for OpenMP enabled. Since we don't want existing OpenMP directives to lead to funny behavior, the tool should strip existing OpenMP directives before inserting its own. That may include OpenMP RTS function calls, like omp_get_num_threads(), which would have to be replaced.
Original date: 2013-09-03 06:00:57
I've implemented a basic version of the ROSE/OpenMP variant of this tool in charmgit:users/phil/ampitls with support for C and C++. It doesn't touch any header code, so global variables used across modules, or just declared in a header for whatever reason, won't be affected, for better or for worse.
I've tested it with examples/ampi/CJacobi3D. The modified code and resulting object file seem to be as intended: global variables are marked OMP 'threadprivate', and are emitted as TLS when the backend g++ compiler is passed -fopenmp.
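As a rough illustration of the result described above, the privatized source might look like the sketch below. The global `iteration_count` and the function `bump_iteration` are hypothetical names chosen for this example; they are not taken from CJacobi3D or from the tool's actual output. The directive only takes effect when the backend compiler is run with -fopenmp; without that flag the pragma is ignored and the variable stays an ordinary shared global.

```c
/* Hypothetical application global. Before the transformation there is a
   single copy shared by every rank running in the process. */
int iteration_count = 0;

/* Directive of the kind the tool inserts: with -fopenmp each thread gets
   its own copy, which the backend compiler emits as TLS. */
#pragma omp threadprivate(iteration_count)

/* Ordinary code keeps referring to the variable by name; no call sites
   need to change. */
int bump_iteration(void)
{
    return ++iteration_count;
}
```

Note that the directive has to follow the variable's declaration and precede its first use in the file.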
I think the next logical step is to make it support Fortran as well, given how many target codes, especially ones we care about, are written in Fortran.
Original date: 2013-09-03 07:54:07
I just tested with a class-scoped static variable in C++, and found that ROSE is apparently misbehaving. I sent a message to their mailing list to try to get that cleared up.
Original date: 2013-09-06 05:16:25
Fortran module-level variables are now handled. Next up, save variables.
Original date: 2013-09-06 05:43:55
Variables declared explicitly with the Fortran save attribute are now handled. Variables declared after a standalone save directive aren't, and ISAM actually uses such in vegnstore_module.F90. I suspect there are variables that are implicitly save by virtue of being initialized at the point of declaration, too, and I need to make sure those work as well.
Original date: 2013-09-06 06:22:37
Here's the feature compatibility matrix that drove the choice to do a ROSE/OpenMP implementation first:
| |Clang|ROSE|C/C++|FTN|
|__thread|y|y|y|n|
|OpenMP|y|y|y|y|
|C/C++|y|y| | |
|Fortran|n|y| | |
Basically, wanting to broadly support Fortran rules out Clang and __thread.
Original date: 2013-09-06 21:59:08
Fortran's block-level SAVE statement (save all local variables declared in that subroutine) and implicit save (variables initialized at their point of declaration) are now handled. There are no COMMON blocks in ISAM, so this should suffice to AMPI-ize it entirely.
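For readers more familiar with C, the closest analogue of a Fortran saved local is a function-scope static variable, which persists across calls and so needs the same privatization as a global. The sketch below is a C illustration only, not the tool's Fortran output, and `next_call_number` and `calls` are hypothetical names.

```c
/* A function-scope static keeps its value between calls, much like a
   Fortran local with the save attribute, so it is hidden global state. */
int next_call_number(void)
{
    static int calls = 0;

    /* For a block-scope static, the directive sits in the same scope,
       after the declaration and before the first use. It only has an
       effect when the compiler is run with OpenMP enabled. */
    #pragma omp threadprivate(calls)

    return ++calls;
}
```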
Original date: 2013-09-09 23:01:03
Currently trying to test with ISAM on Hopper. Challenging due to the dependence on Intel's Fortran and ROSE's baroque build process.
Original date: 2013-09-09 23:51:12
ROSE's Fortran parsing support is apparently less accepting of various constructs than Intel's Fortran compiler is. There's also an issue with Fortran module files from external libraries being incompatible:
/omp-rose -rose:relax_syntax_check isam_ncdio_module.f90 -o isam_ncdio_module.o -c -I/global/u2/p/pmiller/ISAM/build/src/assert -I/global/u2/p/pmiller/ISAM/build/src/isam_offline -I. -I/opt/cray/netcdf/4.2.0/intel/120/include -I/opt/cray/parallel-netcdf/1.3.1/intel/120/include
WARNING: the input directory does not exist : /global/u2/p/pmiller/ISAM/build/src/assert
WARNING: the input directory does not exist : /global/u2/p/pmiller/ISAM/build/src/isam_offline
/global/u2/p/pmiller/ISAM/build/isam_ncdio_module.f90:3.12:
use netcdf
1
Fatal Error: File 'netcdf.mod' opened at (1) is not a GFORTRAN module file
Syntax errors detected in input fortran program ...
make[1]: *** [isam_ncdio_module.o] Error 1
Now that the tool is 'built', it may be a long slog to actually get it to work usefully. The work by Atul's group to make the code more standards-compliant will certainly help here.
Original date: 2013-09-10 17:21:53
The issue with C++ class-scoped static variables was not so much a bug in ROSE as a quirk of how it represents various sorts of declarations. That limitation of the tool is now fixed.
That implies that the only remaining language feature of potential concern is Fortran COMMON blocks.
Original date: 2013-09-10 20:30:53
OpenMP removal is now fully implemented. Tested primarily by round-tripping the tool's output back into it, and seeing that it's identical.
Original date: 2013-09-12 18:22:34
A better approach to existing OpenMP code might be to produce a modified version of the existing 'OpenMP lowering' transformation, to initially treat all such entities as effective no-ops - discard parallel regions, emit no-ops for locks and settings functions, replace omp_get_num_threads() with a constant 1, etc.
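A minimal sketch of what such no-op replacements could look like is given below. This is not the ROSE lowering transformation itself; the `stub_` prefix and the `stub_omp_lock_t` type are assumptions made here so that the example does not clash with a real `<omp.h>`. The return values mirror what the OpenMP runtime reports outside any parallel region: a team of one thread whose thread number is 0.

```c
/* Hypothetical single-threaded stand-ins for OpenMP runtime calls. */
typedef int stub_omp_lock_t;   /* placeholder; holds no meaningful state */

/* Query functions: answer as if running in a team of exactly one thread. */
int stub_omp_get_num_threads(void) { return 1; }
int stub_omp_get_thread_num(void)  { return 0; }
int stub_omp_get_max_threads(void) { return 1; }

/* Settings functions: accept the request and do nothing. */
void stub_omp_set_num_threads(int n) { (void)n; }

/* Locks: with one thread per rank there is nothing to contend with, so
   every operation is a no-op. */
void stub_omp_init_lock(stub_omp_lock_t *lock)    { *lock = 0; }
void stub_omp_set_lock(stub_omp_lock_t *lock)     { (void)lock; }
void stub_omp_unset_lock(stub_omp_lock_t *lock)   { (void)lock; }
void stub_omp_destroy_lock(stub_omp_lock_t *lock) { (void)lock; }
```

With parallel regions discarded, code that sizes buffers or partitions loops by thread count then sees a consistent one-thread view.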
Original date: 2013-09-12 20:03:58
Integrated support for the tool directly in charmc. Users need to build the tool and set an environment variable $CMK_ROSE_OMP_TOOL to its path, then pass the command line argument -roseomptlsglobals to charmc.
Documentation will be added after the various bits have been tested to work on some more application code than just CJacobi3D.
Original date: 2013-09-12 22:45:30
Have now successfully processed NPB's 'IS' C-language benchmark with just the setting MPICC = $(CHARMDIR)/ampicc -roseomptlsglobals in the Makefile, run it with virtualization, and obtained successfully verified results.
Had to fix attempts to privatize function parameters along the way.
Original date: 2013-09-12 23:20:59
Tried to work with the Phloem MPI benchmark suite (found via the ASC Sequoia acceptance tests page). Ran into the problem of handling variables declared in headers. Will actually have to address that now.
Original date: 2013-09-13 03:54:27
Really nasty hacks to unparse the included headers, and then exclude various junk that comes with them that makes the backend compiler unhappy. Now left with a few things that get privatized in one compilation unit but not the other.
Original date: 2013-09-17 00:09:56
I can now compile the entire 'phloem' suite of C language MPI benchmarks from the Sequoia acceptance tests, and run the 'presta' benchmark contained therein.
As described on the mailing list, I had to use libthread-pthreads.o instead of one of the libthreads-*-tls.o variants to avoid crashes in C library routines.
Original date: 2013-09-17 20:21:23
Will not be production-ready within the target timeframe.
Original date: 2015-09-14 01:36:27
What is the current status of this tool?
Original date: 2017-12-29 17:51:53
Mass re-assign AMPI-related issues on my plate to Sam, for subsequent redistribution.
Original date: 2017-12-29 17:52:14
Mass re-assign AMPI-related issues on my plate to Sam, for subsequent redistribution.
For real this time.
Original issue: https://charm.cs.illinois.edu/redmine/issues/283
We have constant-time TLS variable swapping support for our user-space threads that we run as AMPI ranks. A simple tool can make usage of that technique much more broadly applicable.
When compiling a source file with ampicc/ampicxx and an appropriate option, it should invoke a tool that does the following:
This could be done with either Clang or ROSE. My guess is that the difficulty should be about equal, and so the decision should be made based on availability of the necessary knowledge/skills and the libraries for each on target systems. It may even be worthwhile to simply build both.
The thread-private declarations could be made using either the __thread attribute or an OpenMP #pragma omp threadprivate(var) directive. Which is preferable may depend on the back-end compiler, what it supports, what options it takes, etc.
[1] This constraint appears because the compiled library would fail to link with the differently-referenced variables. Where the library's members actually do need to be private to each rank, we may have to recompile those libraries and include/link our own version as well.
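To make the `__thread` option concrete, here is a small sketch showing that a variable declared this way has one independent copy per thread. It uses plain pthreads as a stand-in, not AMPI's user-space threads, and `rank_local_counter`, `count_up`, and `run_two_workers` are hypothetical names invented for this example. `__thread` is a GCC/Clang extension, so other back-end compilers may need a different spelling.

```c
#include <pthread.h>
#include <stddef.h>

/* Each thread that touches this variable gets its own zero-initialized copy. */
static __thread int rank_local_counter = 0;

/* Reads an increment count from *arg, bumps this thread's private copy
   that many times, and writes the final value back to *arg. */
static void *count_up(void *arg)
{
    int *slot = (int *)arg;
    int n = *slot;
    for (int i = 0; i < n; ++i)
        ++rank_local_counter;
    *slot = rank_local_counter;
    return NULL;
}

/* Runs two threads that increment 3 and 5 times respectively. On success
   returns 0 and leaves each thread's final counter in results[]. */
int run_two_workers(int results[2])
{
    pthread_t threads[2];
    int started = 0;
    int status = 0;

    results[0] = 3;
    results[1] = 5;
    for (; started < 2; ++started) {
        if (pthread_create(&threads[started], NULL, count_up,
                           &results[started]) != 0) {
            status = -1;
            break;
        }
    }
    /* Join whatever was started, even if a later create failed. */
    for (int i = 0; i < started; ++i)
        pthread_join(threads[i], NULL);
    return status;
}
```

If the counter were an ordinary global, the two threads would race on a single copy and the results would depend on scheduling; with `__thread` each thread sees only its own increments.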