GoogleCodeExporter closed 9 years ago
Alternatively, we could simply make 7.4.2.10 and 7.4.2.11 apply to the existing
"blocking" routines as well.
Original comment by sdvor...@cray.com
on 8 Oct 2012 at 10:28
"My objection is merely that there are no equivalent statements to 7.4.2.10 and
7.4.2.11 for the "blocking" library routines...Alternatively, we could simply
make 7.4.2.10 and 7.4.2.11 apply to the existing "blocking" routines as well."
I don't object to adding some clarifying paragraphs to B.3.2.1, however I think
it's important that these properties are directly stated in the nb library
section. In the blocking case, there are by definition no conflicting accesses
from the initiating thread, which automatically eliminates the easiest way for
a programmer to "mess up". Programmers familiar with shared-memory programming
already understand that synchronization is required when multiple threads touch
the same data, so the data races that can arise when using the blocking library
should be less surprising. Non-blocking transfers introduce new ways you can
create a subtle data race and end up with indeterminate values, so I think it
makes sense to be very clear about when that occurs.
Also, as Jim rightly pointed out, it's worth clarifying that concurrent reads of
source memory are permitted, since MPI's NB transfers notably prohibit that.
Together these paragraphs neatly summarize the conditions under which
conflicting operations are permitted and when they lead to indeterminate
values. This provides all the information needed by the average user of this
library, who will not need to consult the memory model and puzzle out the
implications to decide if his program is correct.
Original comment by danbonachea
on 8 Oct 2012 at 11:49
I conferred with Pavan and he confirms my recollection that the MPI standard
changed w.r.t. ISEND buffer reads. In MPI-1 and MPI-2.0, the user was not
allowed to touch the ISEND buffer before the request was completed. However,
many users violated this prohibition, and no implementation changed the send
buffer before the request was completed, so MPI-2.1 and later standards no
longer have this prohibition. It is now instead a restriction on the
implementation that it not modify the ISEND buffer before the request is
completed.
Jim is, of course, still correct about users' expectations based upon MPI-1,
which is obviously the one that is most widely known.
Original comment by jeff.science@gmail.com
on 9 Oct 2012 at 2:35
Responding to comment 99, from Dan:
--quote--
"Pg. 6, #4: Suggest s/shall/must/ to strengthen this statement."
This sentence appears very early in the semantic descriptions while definitions
are still being established.
I intentionally prefaced the sentence with "Generally" and did not use "shall",
because it's not a binding restriction - specifically in the case when the
explicit-handle initiation returns UPC_COMPLETE_HANDLE, the operation is
already complete and no sync call is required. However this is an unusual
corner case and I wanted to provide a conceptual overview paragraph to
familiarize the reader with the broad form of the interface, unclouded by such
corner-cases, before getting into the actual nitty-gritty of requirements.
--quote--
I think we might not have lined up on the text to which I was referring. I was
looking at 7.4.3 #4 and I think you were looking at 7.4.2 #4. "Shall" is proper
legalese in 7.4.3 #4, so you should ignore my suggestion. 7.4.2 #4 looks fine.
Original comment by james.di...@gmail.com
on 9 Oct 2012 at 3:45
"I don't object to adding some clarifying paragraphs to B.3.2.1, however I
think it's important that these properties are directly stated in the nb
library section."
I just want it to be clear that the blocking and non-blocking routines have
exactly the same semantics regarding remote threads touching the buffers during
the transfer interval. Perhaps a footnote could be added to these paragraphs
indicating that this is a direct consequence of the memory model (B.3.2.1) that
also applies to the blocking routines, but is explicitly called out here
because of the split nature of the transfer interval?
"In the blocking case, there are by definition no conflicting accesses from the
initiating thread, which automatically eliminates the easiest way for a
programmer to "mess up"."
This is only true if there is no threading layer (OpenMP, OpenACC, pthreads,
etc) underneath UPC threads. While that is outside the scope of the UPC spec,
it is important to keep in mind as mixing programming models is quite common in
HPC.
Original comment by sdvor...@cray.com
on 9 Oct 2012 at 3:17
"This is only true if there is no threading layer (OpenMP, OpenACC, pthreads,
etc) underneath UPC threads. While that is outside the scope of the UPC spec,
it is important to keep in mind as mixing programming models is quite common in
HPC."
I agree with this. I think it's important to allow UPC threads, which may be
mapped to OS processes, to interoperate nicely with OS threads (e.g., pthreads)
whenever possible. We have several applications using UPC+OpenMP/Pthreads,
which is the most scalable way to use a NUMA multi-core cluster in our
experiments so far.
Original comment by yzh...@lbl.gov
on 9 Oct 2012 at 4:47
I've not heard anyone in HPC talk about OpenMP or OpenACC as compilation
targets, except perhaps from DSLs. However, I think more explicit APIs like
Pthreads and OpenCL are relevant. It may also be prudent to think about
user-level threads, e.g. Qthreads, as possible back-end components for UPC.
Does Kyle Wheeler follow the UPC spec discussion?
Original comment by jeff.science@gmail.com
on 11 Oct 2012 at 1:32
Just added the footnote suggested by Steve in comment 105, as SVN r174:
--- upc-lib-nb-mem-ops.tex (revision 173)
+++ upc-lib-nb-mem-ops.tex (working copy)
@@ -131,7 +131,10 @@
performed by a set of relaxed shared reads and relaxed shared writes of
unspecified size and order, issued at unspecified times anywhere within the transfer
interval by the initiating thread. Conflicting accesses {\em inside} the transfer interval
-have undefined results, as specified in the preceding paragraphs.
+have undefined results, as specified in the preceding paragraphs.~%
+\footnote{The restrictions described in the three preceding paragraphs are a direct consequence of
+[UPC Language Specifications, Section B.3.2.1], and also apply to the blocking \memstar functions.
+They are explicitly stated here for clarity.}
Here {\em inside} and {\em outside} are defined by the {\tt Precedes()} program order for
accesses issued by the initiating thread; accesses issued by other threads are considered {\em inside}
unless every possible and valid $<_{strict}$ relationship orders them outside the transfer interval.~%
Original comment by danbonachea
on 18 Oct 2012 at 9:57
FYI, in MPI-3 one-sided communication, non-blocking puts and gets do not pass
synchronization points.
Quoted from mpi30-report.pdf from the MPI Forum
(www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf)
Page 431, Lines 9-14:
"The end of the epoch, or explicit bulk synchronization using
MPI_WIN_FLUSH, MPI_WIN_FLUSH_ALL, MPI_WIN_FLUSH_LOCAL, or
MPI_WIN_FLUSH_LOCAL_ALL, also indicates completion of the RMA operations.
However, users must still wait or test on the request handle to allow the MPI
implementation to clean up any resources associated with these requests; in
such cases the wait operation will complete locally."
For comparison, MPI_Win_flush_all is roughly the same as upc_fence, and
MPI_Rput/MPI_Rget are the counterparts of upc_memput_nb/upc_memget_nb.
Original comment by yzh...@lbl.gov
on 29 Nov 2012 at 5:02
Re: Comment 109
It's worth adding that /all/ one-sided operations in MPI are non-blocking and
all outstanding operations are completed by passive target flush/lock
operations at the target that is synchronized. Request-generating operations
(added in MPI-3) are not an exception; however, the user is still required to
clean up the request object that was returned by MPI.
Original comment by james.di...@gmail.com
on 29 Nov 2012 at 5:42
"all outstanding operations are completed by passive target flush/lock
operations" should say "can be completed by passive...". Obviously, they can
also be completed by active target operations.
Original comment by jeff.science@gmail.com
on 29 Nov 2012 at 6:14
Jim and Jeff: thanks for the clarification.
This means that MPI_Put and MPI_Get actually behave like upc_memput_nbi and
upc_memget_nbi, which are non-blocking memcpy operations without explicit
handles.
Original comment by yzh...@lbl.gov
on 29 Nov 2012 at 6:39
This PendingApproval change appeared in the SC12 Draft 3 release.
It was officially ratified at the 11/29 telecon.
Original comment by danbonachea
on 29 Nov 2012 at 8:03
Original issue reported on code.google.com by
yzh...@lbl.gov
on 22 May 2012 at 11:41