cyclops-community / ctf

Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays
Other
199 stars 53 forks source link

Removed barriers that can confuse critical path profilers #97

Closed huttered40 closed 4 years ago

huttered40 commented 4 years ago

I actually do not think these were confusing critter, because critter uses the -DCRITTER flag instead of the -DPROFILE flag. All of the barriers removed in this commit existed only if compiled with -DPROFILE.

Either way, since the barriers are used only when profiling and not for production, I don't see a negative to removing them (even if their presence somehow reports better performance for these per-process measurement profiling tests).

huttered40 commented 4 years ago

Again, not sure why Github is reporting 5 commits instead of 3. 2 were already merged, so I don't think they would create multiple commit copies upon merge. Not sure how else to "clean" a fork to prevent this besides just deleting my fork and re-forking.

solomonik commented 4 years ago

Thanks for the adjustments, but I am not going to merge this as is. I see three changes here, which ideally should be 3 pull requests. The propagation of includes fix looks fine, you can commit the one-line fix to master directly. But I think the barriers should be kept if DPROFILE is on with regular profiling, and removed if DCRITTER is turned on. The barriers also serve a purpose for tuning (note that one of them is surrounded in -DTUNe, also -DTUNE requires -DPROFILE), so that performance of subkernels is not mistimed due to load-balance (not sure which ones are critical for that, so would prefer to leave all of them for now). Its also not clear to me why you are removing the mpi.h header, I think there might have been a reason for it, and am worried it could cause errors if ctf.h included before mpi.h.

huttered40 commented 4 years ago

The propagation of Includes for python builds was already merged into master on May 21st. I'm just going to delete my fork and fork ctf again because I don't know how else to force GitHub to stop showing these.

Your suggestions on the Barrier changes make sense. I will modify the preprocessor check to keep the barriers if CRITTER is not defined and PROFILE is defined. Future changes for critter start/stop will require one to use both CRITTER and PROFILE if one wants critical path call profiling (critter_mode=2) and just CRITTER if one was the baseline critical path statistics (critter_mode=1). Defining just PROFILE will act as it always has and include the barriers.

I removed the mpi.h inclusion in common.h because the very next line includes a file that branches on whether or not to include critter.h or mpi.h depending on the DEFS. Usually I prefer to have a single mpi.h inclusion rather than multiple of them floating around the source code, especially with the way critter intercepts MPI calls. It just makes me nervous, but if you think this could cause issues, I'll forget about it.

solomonik commented 4 years ago

@huttered40 I see, one possible issue regarding mpi.h stuff. Does the current -DCRITTER choice in src/shared/model.h necessitate that applications that include ctf.h also use -DCRITTER when building/linking? Tests build via CTF would do so automatically, but external applications may not unless -DCRITTER is a standard flag which they also use (which might make sense). But in principle someone could build CTF with -DCRITTER and then use it without the flag. In that case, I guess the MPI calls in some of the files in src/interface/... would not get recorded by critter, but its a minority and things would at least still work. Should probably assume this didn't happen, but thought I should mention.

huttered40 commented 4 years ago

I'm not sure I completely follow. Let me break it down into two cases, with CTF_Connectivity as an example of the higher-level library. I'll denote CTF as A and CTF_Connectivity as B.

1) -DCRITTER is given to A's configure but not to B - critter will intercept the routines called from within A, but as critter::start()/critter::stop() are not called from inside B, critter would just call the PMPI routines inside A's library directly.

2) -DCRITTER is given to B's configure but not to A - I actually did this by accident this morning. A's library has no knowledge of critter, and all MPI communication proceeds as usual. All communication called directly from inside B is intercepted by critter subject to the arguments based to critter::start(),critter::stop().

solomonik commented 4 years ago

@huttered40 what I was mentioning is just that for (1) if the MPI call is inside a header file of A, it will not be intercepted if the header file includes an ifdef CRITTER.

huttered40 commented 4 years ago

Abandoning and replacing as instructed ^.