AcademySoftwareFoundation / Imath

Imath is a C++ and python library of 2D and 3D vector, matrix, and math operations for computer graphics
https://imath.readthedocs.io
BSD 3-Clause "New" or "Revised" License
379 stars 115 forks source link

Undefined behavior in Vec::operator[] #446

Open cary-ilm opened 1 week ago

cary-ilm commented 1 week ago

The Vec2, Vec3, and Vec4 operator[](int) relies on undefined behavior, since indexing outside of the &x isn't legal.

template <class T>
IMATH_CONSTEXPR14 IMATH_HOSTDEVICE inline T&
Vec2<T>::operator[] (int i) IMATH_NOEXCEPT
{
    return (&x)[i]; 
}

See https://github.com/AcademySoftwareFoundation/OpenShadingLanguage/pull/1865 for a real-world failure case and investigation of the cause, involving the Intel oneAPI compiler.

The C++ specification doesn't guarantee that the x and y fields are adjacent without padding.

One alternative might be to add #pragma packed or __attribute__(packed) to the class declaration to ensure there is no padding, but padding may not be the issue, since the problem may involve compiler optimizations with temporary objects.

Replacing the &x with reinterpret_cast<T*> may be slightly preferable but is still non-standard and may still not resolve the issue.

Is the only truly valid way of implementing the indexing operation to use a switch statement? That would incur a performance penalty. But this is not a method that should be called in performance-critical situations anyway, much better to use the explicit member references. If operator[] is doomed to be a less-than-optimal method, would it be acceptable to make it even slightly slower but guaranteed correct?

Suggestions?

meshula commented 1 week ago

Pragmas and attributes don't protect against issues like optimizations or undefined behavior in certain compilers. Pragmas or attributes may not help with alignment-related performance on some architectures either.

A union would ensure contiguity and fix UB related to packing with no performance penalty, but doesn't address OOB.

union {
    struct { T x, y; };
    T data[2];
};
lgritz commented 1 week ago

@meshula That also breaks modern type punning rules, but so far it has worked on every compiler.

lgritz commented 1 week ago

The union approach is the one I tend to use (caveats notwithstanding).

@AlexMWells has a favourite idiom for dealing with subscripted access that is truly C++ compliant and generates great code, but is complicated and a bit daunting to a naive reader to read and make sense of.

meshula commented 6 days ago

The reason I suggested the union is that it doesn't introduce branching for basic access.

Is subscripting with no array access an option for Imath 4? Since EXRCore is C, it doesn't use Imath at all, so it's within the realm of possibility to simply leave a problematic pattern behind.

AlexMWells commented 6 days ago

About the union approach. I don't think the solution to C++ undefined behavior is to rely on another C++ undefined behavior even if it "appears" to work for most compilers. https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#c183-dont-use-a-union-for-type-punning

A nice summary of situation here: https://tttapa.github.io/Pages/Programming/Cpp/Practices/type-punning.html

I personally have had to track down issues with union approaches where mixing of .x, .y, .z and [i] access via a union caused undefined behavior.

meshula commented 5 days ago

Yeah, so I come back to proposing we omit the bracket operators completely. That would have been impossible before because OpenEXR relied on the indexing, but OpenEXRCore doesn't use the Imath definitions at all.

AlexMWells commented 5 days ago

I've done some prototyping to explore implementing dynamic index array subscript support and created a godbolt project ontop of Imath that has a wrapper class Vec3Subscript where enum class Subscript { UndefinedBehavior, None, Selection, Dynamic, TempArray, CopyToArray }; Vec3_Subscript intent is to explore different array subscript options. It is not being proposed this templated subscript behavior be added to underlying Vec3. Instead we can use this to easily explore different implementations options

https://godbolt.org/z/qEPo4a1c8

When vectorizing kernels, the Subscript::Selection generates vastly better code as Scalar Replacement Of Aggregates (SROA) succeeds allow the local variable to be kept in registers (which matters for GPU's).

There also is some examples in there on statically unrolling loops to implement generic algorithms that directly access .x, .y. ,z data members vs compile time known std::integral_constant<int, 0>, std::integral_constant<int, 1>, std::integral_constant<int, 2>