Quuxplusone opened 9 years ago

| | |
| --- | --- |
| Bugzilla Link | PR25474 |
| Status | NEW |
| Importance | P normal |
| Reported by | hstong@ca.ibm.com |
| Reported on | 2015-11-10 09:57:45 -0800 |
| Last modified on | 2015-11-10 20:46:44 -0800 |
| Version | trunk |
| Hardware | All Linux |
| CC | dgregor@apple.com, llvm-bugs@lists.llvm.org, richard-llvm@metafoo.co.uk, rjmccall@apple.com, rnk@google.com, spatel+llvm@rotateright.com |
| Fixed by commit(s) | |
| Attachments | |
| Blocks | |
| Blocked by | |
| See also | |
I think the read of tail padding here is permitted (but I'd like John to
confirm); under our (implementation-defined) model for what constitutes a
volatile access, I believe this is not one.
However:
1) it seems really dumb that we're copying 4096 bytes more than we need to.
2) this causes miscompiles in related cases, such as (for x86_64):
```c++
struct A {
  ~A() {}
  void *p;
  int n;
};
struct B : A {
  int m = 1;
} b;
struct C : A {
  C() : A(b) {}
  int m;
};
struct D : C {};
D d = D();
int main() { return d.m; }
```
This is required to result in d.p = nullptr, d.n = 0, d.m = 0, because D()
performs zero-initialization prior to invoking the default constructor of D.
However, we actually initialize d.m to 1, because we memcpy the tail-padding of
the A object when constructing C's base class.
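For illustration, here is a minimal standalone sketch (not from the original report; the exact numbers assume a typical LP64 Itanium C++ ABI target) showing that C::m is placed inside A's tail padding, which is why a sizeof(A)-wide memcpy from b overwrites it:

```c++
#include <cstdio>

struct A {
  ~A() {}  // non-trivial, so derived classes may reuse A's tail padding
  void *p; // bytes 0..7
  int n;   // bytes 8..11; bytes 12..15 are tail padding in a complete A
};
struct C : A {
  int m;   // expected to land at offset 12, inside A's tail padding
};

int main() {
  C c;
  std::printf("sizeof(A) = %zu\n", sizeof(A)); // typically 16
  std::printf("offset of C::m = %td\n",
              reinterpret_cast<char *>(&c.m) -
                  reinterpret_cast<char *>(&c)); // typically 12
  // A memcpy of sizeof(A) == 16 bytes into the A base subobject of a C
  // therefore overwrites C::m; the copy must be limited to the base
  // subobject's data size (12 bytes here).
}
```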
So, we *must not* copy tail padding, at least when emitting a (trivial) base
subobject copy/move constructor. The !tbaa.struct metadata is not sufficient to
avoid the problem, because it is discardable (and indeed is ignored in this
case).
Also of note: the use of pgsz within the alignas causes us to assert (but that
is unrelated to this issue, and looks like bug#13986).
I certainly agree that Richard's example is a miscompile; when constructing a
base-class subobject, we need to not write more bytes than the base-subobject
size. (I'm quite surprised that we have bugs here, actually; I thought I
remembered fixing this. Maybe there was a case I missed, or which I failed to
test adequately and was broken later.)
The original example is interesting. Writing the complete-object size to the
destination is certainly legal, so the remaining question is whether it's legal
to perform a read from the source that might overlap a volatile object in a
subclass. I'm pretty sure it's legal according to the memory model; the read
doesn't interfere with concurrent accesses on any architecture I'm aware of,
and we don't care about it being interfered with by other concurrent accesses
because it's only used to initialize padding. Volatile requires us to perform
volatile accesses exactly as given, but it's arguable that that shouldn't
constrain us from performing additional accesses that aren't observable under
the standard and under our declared rules of implementation-defined volatile
behavior. Using mprotect() on random chunks of the heap is well outside of the
standard, and I agree that our implementation-defined behavior doesn't have to
honor memory-mapping tricks internal to an object.
Normally we're very conservative about optimizing around explicit uses of
volatile, but that's not the case here; here we're being asked to be more
conservative purely because there might be an unknown use of volatile.
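As a concrete shape for that concern (a hypothetical sketch, not from the original report; Base, Derived, and initFrom are illustrative names), the extra bytes read by a widened copy of a base subobject could overlap a volatile member that some unrelated derived class placed in the base's tail padding:

```c++
#include <cstring>

struct Base {
  ~Base() {}
  void *p;
  int n;             // leaves 4 bytes of tail padding in a complete Base (LP64)
};
struct Derived : Base {
  volatile int flag; // the ABI may place this inside Base's tail padding
};

Derived d;

// Emulates the widened copy: reading sizeof(Base) bytes from d's Base
// subobject also reads the storage underlying the volatile d.flag, even
// though only d.p and d.n are actually needed.
void initFrom(Base *dst) {
  std::memcpy(dst, static_cast<const Base *>(&d), sizeof(Base));
}
```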
And this is an important optimization. Well, no, in this example it's clearly
a pessimization, of course, but that's just an artifact of the enormous
alignment here; imagine e.g. copying a structure like this:
```c++
struct NonPOD { ~NonPOD(); void *ptr; char array[sizeof(void*) - 1]; };
```
Having to copy exactly 7 (or 15) bytes instead of 8 (or 16) just because the
source might be a base subobject of a class that begins with a volatile char
would be pretty awful.
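To make those byte counts concrete, a quick sketch (assuming a typical ILP32 or LP64 layout; not part of the original comment):

```c++
struct NonPOD {
  ~NonPOD();
  void *ptr;
  char array[sizeof(void *) - 1];
};

// On LP64: sizeof(NonPOD) == 16, but the members occupy only 15 bytes
// (8 for ptr + 7 for array); the last byte is tail padding.  A copy that
// must not touch tail padding has to move exactly 15 bytes (e.g. an
// 8-byte word plus 4-, 2-, and 1-byte remainders) rather than one
// 16-byte block.  On ILP32 the numbers are 8 vs. 7.
static_assert(sizeof(NonPOD) == 2 * sizeof(void *),
              "illustrative layout assumption (ILP32/LP64)");
```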
All that said, I would be happy to help come up with a rule that tries to
simultaneously address the pessimization and the miscompile for pages internal
to a structure. For example, we could copy

    min(<complete-object size>,
        roundUpToAlignment(<base-subobject size>,
                           min(<base-subobject alignment>,
                               max(<target max vector size>, <target pointer size>))))
This would generally not roll over into new pages allocated in subclasses, and
it would very strictly bound the number of extra bytes the copy would perform.
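A sketch of how such a rule might be computed (the function and parameter names here are placeholders, not existing Clang APIs; the caller would supply the layout and target values):

```c++
#include <algorithm>
#include <cstdint>

// Round Size up to the next multiple of Align (Align assumed a power of two).
static std::uint64_t roundUpToAlignment(std::uint64_t Size, std::uint64_t Align) {
  return (Size + Align - 1) & ~(Align - 1);
}

// Bound on the number of bytes a (trivial) base-subobject copy may move,
// per the rule above.
static std::uint64_t baseSubobjectCopySize(std::uint64_t CompleteObjectSize,
                                           std::uint64_t BaseSubobjectSize,
                                           std::uint64_t BaseSubobjectAlign,
                                           std::uint64_t MaxVectorSizeInBytes,
                                           std::uint64_t PointerSizeInBytes) {
  std::uint64_t Granularity = std::min(
      BaseSubobjectAlign, std::max(MaxVectorSizeInBytes, PointerSizeInBytes));
  return std::min(CompleteObjectSize,
                  roundUpToAlignment(BaseSubobjectSize, Granularity));
}
```

For the original page-aligned example, the max(vector size, pointer size) term caps the granularity at the target's widest ordinary copy width rather than at the 4096-byte alignment, so the round-up stays within a handful of extra bytes instead of spanning the whole page.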