Closed GoogleCodeExporter closed 9 years ago
My opinion is that this code should not be legal, as I can't think of a case
where this serves a useful purpose.
As for the fix, my suggestion is an explicit statement in section 3.7 that
states implementations may presume all threads enter each collective operation
before any must leave. I think this fixes it for all collectives, not just
upc_notify. I would not object to a statement in upc_lock/upc_unlock that
notes that because of this a lock/unlock pair may not contain a collective
operation.
Any other views on this?
Original comment by w...@uuuuc.us
on 16 Mar 2012 at 10:30
I propose adding to the descriptions of upc_lock and upc_lock_attempt (7.2.4.5
and 7.2.4.6 in UPC 1.2):
7 The calling thread may not enter a collective operation while holding a lock.
I propose modifying the collectives description (7.3 in UPC 1.2):
5 Collective functions may not be called between upc_notify and the
corresponding upc_wait, or while holding a lock.
Original comment by johnson....@gmail.com
on 20 Mar 2012 at 7:54
Some of the comments in issue 12
(http://code.google.com/p/upc-specification/issues/detail?id=12) are relevant
here.
In general I think we need to "fix" this entirely within the definition of
upc_notify/upc_wait and the unspecified synchronization that may occur within
any operation our document specifies as "collective". Relying upon spec
language added to upc_lock and friends just means the same issue arises when
the user arranges for any other kind of inter-thread synchronization around a
collective op (eg using strict operations).
Original comment by danbonachea
on 15 Jun 2012 at 8:46
OK. Same problem as in issue 12. We don't control the people who write things
like non-blocking barriers, and therefore it is wise not to expose UPC to the
vagaries of any particular implementation of barrier.
With my IBM hat on, I would have liked to continue the crusade, and could have
used a stricter UPC language standard as a stick to beat up our PAMI guys. But
I accept that this will not happen everywhere.
Original comment by ga10...@gmail.com
on 15 Jun 2012 at 2:47
Hoping for 1.3. This issue keeps coming up for us.
Original comment by johnson....@gmail.com
on 15 Jun 2012 at 5:59
I believe this case is already covered by the definition of "collective",
although in order to make it clearer I propose the following clarification to
that definition:
--- upc-terms-and-defs.tex (revision 80)
+++ upc-terms-and-defs.tex (working copy)
@@ -87,14 +87,16 @@
\sterm{collective}%
constraint placed on some language
operations which requires evaluation of such operations to be
- matched\footnote{A collective operation need not provide any
+ matched across all threads.%
+ \footnote{A collective operation need not provide any
actual synchronization between threads, unless otherwise noted.
The collective requirement simply states a relative ordering property
of calls to collective operations that must be maintained in the
parallel execution trace for all executions of any legal program.
Some implementations may include unspecified synchronization between
threads within collective operations, but programs must not rely upon
- such unspecified synchronization for correctness.} across all threads.
+ the presence or absence of
+ such unspecified synchronization for correctness.}
The behavior of collective operations is undefined unless all threads
execute the same sequence of collective operations.
upc_notify is elsewhere defined as a "collective" operation, so the program in
comment 0 is invalid because its correctness relies upon "the absence of such
unspecified synchronization" in the upc_notify call, and hence the program may
deadlock on some implementations (and that deadlock is the programmer's fault).
As a closing note, the spec does NOT prohibit holding a lock while executing a
collective operation, because that would be too strong a condition. For
example, this code similar to comment 0 holds a lock during upc_notify:
upc_lock_t *shared_lock = upc_all_lock_alloc();
if (MYTHREAD == 0) { upc_lock( shared_lock ); }
upc_notify;
if (MYTHREAD == 0) { upc_unlock( shared_lock ); }
this code will not deadlock because it does not rely on the absence of
synchronization in upc_notify for correctness - only one thread is acquiring
the lock before entering the collective, and there is no cycle of dependencies
preventing other threads from reaching the collective call.
Original comment by danbonachea
on 13 Aug 2012 at 2:55
Here's a more realistic example of using a lock across a barrier, to
dynamically "elect" a single representative thread who arrives at the barrier
early to do some additional work:
while (1) {
int first_thread = upc_lock_attempt( shared_lock );
if (first_thread) {
// I reached the barrier first, do some serialized work to prepare for the next phase
// eg read in file data for the next iteration
}
upc_notify;
upc_wait;
if (first_thread) { upc_unlock( shared_lock ); }
// some work
upc_barrier;
// some more work
}
Original comment by danbonachea
on 13 Aug 2012 at 3:14
I am okay with the language in Comment #6 (as opposed to Comment #2) based on
the rationale in Comment #3.
Original comment by johnson....@gmail.com
on 13 Aug 2012 at 2:46
Mass-edit to mark all issues with an officially-announced resolution diff as
"PendingApproval"
Original comment by danbonachea
on 17 Aug 2012 at 3:25
May or may not be directly relevant, but during a meeting yesterday I
confronted Pavan Balaji (him of MPI3 fame) with this exact problem - MPI
formulation - and he was fairly sure that MPI3 would survive MPI barrier forced
through a critical section.
IIRC Thorsten Hoefler, and the IBM MPI guys, said the exact opposite :)
Original comment by ga10...@gmail.com
on 17 Aug 2012 at 5:56
Set Consensus:High on Pending Approval issues.
Original comment by gary.funck
on 19 Aug 2012 at 11:31
Change proposed in comment #6 has been applied in SVN r103.
Original comment by danbonachea
on 13 Sep 2012 at 10:25
For the record:
Public comment period for this change was 8/14/2012 - 9/14/2012
No substantial objections were raised or recorded during that period.
At the 9/7/2012 telecon, it was announced this change was imminent, feedback
was explicitly solicited, and none was received.
The change was integrated into a working draft distributed on 9/13/2012 for
consideration and draft ratification at the 9/14/2012 telecon.
Original comment by danbonachea
on 14 Sep 2012 at 7:26
Changes distributed as Draft 1.1 were ratified during the 9/14/2012
teleconference, for inclusion in the next public draft.
Original comment by danbonachea
on 14 Sep 2012 at 9:46
Original issue reported on code.google.com by
johnson....@gmail.com
on 13 Mar 2012 at 4:40