Open cary-ilm opened 1 week ago
Pragmas and attributes don't protect against issues like optimizations or undefined behavior in certain compilers. Pragmas or attributes may not help with alignment-related performance on some architectures either.
A union would ensure contiguity and fix UB related to packing with no performance penalty, but doesn't address OOB.
union {
struct { T x, y; };
T data[2];
};
@meshula That also breaks modern type punning rules, but so far it has worked on every compiler.
The union approach is the one I tend to use (caveats notwithstanding).
@AlexMWells has a favourite idiom for dealing with subscripted access that is truly C++ compliant and generates great code, but is complicated and a bit daunting to a naive reader to read and make sense of.
The reason I suggested the union is that it doesn't introduce branching for basic access.
Is subscripting with no array access an option for Imath 4? Since EXRCore is C, it doesn't use Imath at all, so it's within the realm of possibility to simply leave a problematic pattern behind.
About the union approach. I don't think the solution to C++ undefined behavior is to rely on another C++ undefined behavior even if it "appears" to work for most compilers. https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c183-dont-use-a-union-for-type-punning
A nice summary of situation here: https://tttapa.github.io/Pages/Programming/Cpp/Practices/type-punning.html
I personally have had to track down issues with union approaches where mixing of .x, .y, .z and [i] access via a union caused undefined behavior.
Yeah, so I come back to proposing we omit the bracket operators completely. That would have been impossible before because OpenEXR relied on the indexing, but OpenEXRCore doesn't use the Imath definitions at all.
I've done some prototyping to explore implementing dynamic index array subscript support and created a godbolt project ontop of Imath that has a wrapper class Vec3Subscript
https://godbolt.org/z/qEPo4a1c8
When vectorizing kernels, the Subscript::Selection generates vastly better code as Scalar Replacement Of Aggregates (SROA) succeeds allow the local variable to be kept in registers (which matters for GPU's).
There also is some examples in there on statically unrolling loops to implement generic algorithms that directly access .x, .y. ,z data members vs compile time known std::integral_constant<int, 0>, std::integral_constant<int, 1>, std::integral_constant<int, 2>
The Vec2, Vec3, and Vec4
operator[](int)
relies on undefined behavior, since indexing outside of the&x
isn't legal.See https://github.com/AcademySoftwareFoundation/OpenShadingLanguage/pull/1865 for a real-world failure case and investigation of the cause, involving the Intel oneAPI compiler.
The C++ specification doesn't guarantee that the
x
andy
fields are adjacent without padding.One alternative might be to add
#pragma packed
or__attribute__(packed)
to the class declaration to ensure there is no padding, but padding may not be the issue, since the problem may involve compiler optimizations with temporary objects.Replacing the
&x
withreinterpret_cast<T*>
may be slightly preferable but is still non-standard and may still not resolve the issue.Is the only truly valid way of implementing the indexing operation to use a switch statement? That would incur a performance penalty. But this is not a method that should be called in performance-critical situations anyway, much better to use the explicit member references. If
operator[]
is doomed to be a less-than-optimal method, would it be acceptable to make it even slightly slower but guaranteed correct?Suggestions?