As an implementer I loathe the idea of, for instance, making access to 64-bit
"double" atomic on a 32-bit platform. I also agree, however, that the C
signal-handling model, in which ONLY one specific type is atomic, is pretty much
useless for any concurrent programming, including not just UPC but also pthreads, etc.
So, I am fine with the proposal *IF* the first bullet is changed from
- a normal scalar access must resolve to a single memory operation.
to
- a scalar access up to an implementation-defined size and with implementation-defined alignment must resolve to a single memory operation.
Note 1: "implementation-defined" means the implementation is required to
DOCUMENT the size and alignment restrictions
Note 2: there are "broken" ABIs, such as PPC64 on AIX, where the CPU word size
is 64 bits, but 64-bit "double" and "long long" are given only 4-byte alignment!
This is a platform where the "implementation-defined alignment" would be used to
state what might otherwise not be obvious to the user.
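For illustration, a minimal C probe of the kind an implementation (or a curious
user) might run to see the documented size and alignment in practice; the struct
name and output format are just placeholders:

    #include <stdio.h>
    #include <stddef.h>

    /* The padding the ABI inserts before 'd' reveals the alignment it assigns to double. */
    struct probe { char c; double d; };

    int main(void) {
        /* On most 64-bit ABIs this prints 8; on a PPC64/AIX-style ABI it can print 4,
           so a double member may still straddle two natural words. */
        printf("sizeof(double) = %zu, ABI alignment of double = %zu\n",
               sizeof(double), offsetof(struct probe, d));
        return 0;
    }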
Original comment by phhargr...@lbl.gov
on 17 Jul 2012 at 7:36
Set default Consensus to "Low".
Original comment by gary.funck
on 19 Aug 2012 at 11:26
Change Status to New: Requires review.
Original comment by gary.funck
on 19 Aug 2012 at 11:37
I will retain ownership of this issue.
Original comment by gary.funck
on 19 Sep 2012 at 5:04
Note that bit-fields are technically scalars (like all integer types), so it'd
be nice to qualify that a bit more. I don't know that it's reasonable to
require bit-field updates to be tear-free.
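As a sketch of the concern (hypothetical struct; the usual compilation strategy
for bit-field stores is assumed):

    struct flags {
        unsigned a : 4;   /* a and b normally share one storage unit */
        unsigned b : 4;
    };

    struct flags f;

    void set_a(void) { f.a = 3; }   /* typically: load the unit, merge a's bits, store the unit */
    void set_b(void) { f.b = 5; }   /* same unit: if run concurrently, one writeback can        */
                                    /* silently discard the other update                         */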
Original comment by sdvor...@cray.com
on 21 Sep 2012 at 7:59
"The problem case is shared scalars whose size is greater than what the
underlying hardware supports."
The problem is actually worse than stated in comment 0. There are also
architectures that can data tear in the opposite direction. Specifically, when
performing a write of size SMALLER than the hardware word size, they do a
read-modify-write of a larger size (word or even cache line) and the writeback
can therefore clobber concurrent writes to the word data surrounding the small
write performed at the language level. This affects bit-field writes on almost
every architecture, but can also affect byte writes on certain systems. Most
architectures include a byte mask in the writeback so the memory controller
only writes the actual dirty bytes, but I'm not sure we should assume that's
universally available.
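A hedged sketch of that scenario, assuming hardware whose smallest store is a
full word and whose writeback carries no byte mask:

    /* buf[0] and buf[1] share one hardware word. */
    static char buf[2];

    void writer0(void) { buf[0] = 'x'; }   /* on such hardware each byte store becomes:         */
    void writer1(void) { buf[1] = 'y'; }   /* read word, modify one byte, write the word back;  */
                                           /* the writeback can clobber the other thread's byte */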
Because of these competing tensions, some architectures may support atomic,
tear-free writes of only a single data size, and only when aligned.
This is why C99 only requires implementations to provide tear-free updates of a
single type (sig_atomic_t sec 7.14). UPC technically inherits sig_atomic_t, but
C99 explicitly allows this type to be volatile-qualified (read "completely
unoptimized"). Also there is no guarantee on the range of values this type can
hold (read "portability problem"), and in any case it's definitely an integer
type, which rules out floating-point values. Overall, this is probably not a
type we should be teaching HPC users to use for their main data structures.
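For reference, the C99 facility in question looks roughly like this (the limits
come from <stdint.h>):

    #include <signal.h>
    #include <stdint.h>   /* SIG_ATOMIC_MIN / SIG_ATOMIC_MAX */

    /* The one object type C99 promises can be updated without tearing, and only
       with respect to asynchronous signal delivery. */
    static volatile sig_atomic_t done = 0;

    void handler(int sig) { (void)sig; done = 1; }

    /* SIG_ATOMIC_MAX is only guaranteed to be at least 127, and the type is an
       integer type, so it cannot portably hold doubles, pointers, or large counters. */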
I agree with Paul that we should not attempt to provide a universal guarantee
of tear-free memory operations - such a guarantee could make UPC unimplementable
on many architectures of interest. I think the best we can universally require
is a single "implementation-defined" type that will be tear-free - but this
basically brings us back to sig_atomic_t, which is already available.
Overall I prefer the model of encouraging users to write programs that are
properly synchronized (without data races that can expose tearing).
Alternatively, if they insist upon including data races in their program, then
encourage them to use the AMO interface, where the effects of tearing can be
prevented by handling concurrent accesses in a principled manner within the
library. This seems far preferable to specifying something about all concurrent
accesses anywhere in the program (even of a certain size), which seems likely
to imply new implementation headaches, subtle implementation bugs, and possibly
global negative performance impacts. I suspect the standardization and wide
availability of an AMO library will help to reduce the importance of this issue
for many users.
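To make that alternative concrete, here is a rough sketch against the AMO
interface as proposed (function and constant names are taken from the draft
atomics library and may not match whatever is finally standardized):

    #include <upc.h>
    #include <upc_atomic.h>   /* draft AMO library header */

    shared double total;

    int main(void) {
        /* Collectively allocate a domain supporting GET and SET on doubles. */
        upc_atomicdomain_t *dom =
            upc_all_atomicdomain_alloc(UPC_DOUBLE, UPC_GET | UPC_SET, 0);

        double v = 1.0 + MYTHREAD, fetched;
        /* Tear-free even where the hardware cannot write a double atomically:
           the library can fall back to locks or active messages internally. */
        upc_atomic_relaxed(dom, NULL, UPC_SET, &total, &v, NULL);
        upc_atomic_relaxed(dom, &fetched, UPC_GET, &total, NULL, NULL);

        upc_all_atomicdomain_free(dom);
        return 0;
    }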
I move that we postpone this issue to 1.4 or later, and reconsider it once the
standardized AMO library reaches widespread acceptance.
Original comment by danbonachea
on 25 Sep 2012 at 12:08
deferred to 1.4 at the 11/29 telecon
Original comment by danbonachea
on 29 Nov 2012 at 7:35
Original issue reported on code.google.com by
gary.funck
on 17 Jul 2012 at 5:52