Improve structure element access idiom for vectorizing compilers

lgritz commented 4 years ago

I could do this myself if you want, but I don't want Owen and me to trip over each other, so let me just file this as an issue and he can decide when and how to do this.

Currently, our classes look like this:

template <class T> class Vec3
{
    T x, y, z;
    const T& operator [] (int i) const { 
        return (&x)[i];
    }
}

This may seem ok for scalar code, but because the cast to a pointer in order to access the structures is technically UB, in all the compilers we have tested, when you do component access with operator[] within a loop that you hope to autovectorize, the compilers lose track of things in the implied conversion to raw pointer and that can cause the loop to fail to vectorize. This is because you are taking the address &x and then intentionally dereferencing past the end of x's memory.

I propose the following change of idiom:

template <class T> class Vec3
{
    union {
        struct { T x, y, z; };
        T arr_[3];
    };

    const T& operator [] (int i) const { 
        return arr_[i];
    }
}

Note that direct access of .x, .y, .z will continue to work as always. Operator[] also still works. But there is no longer a cast directly from &x to a pointer that is used to access beyond x's extent.

Also, we should document that, just in case, operator[] should only be used if the array index is not known (like if you are looping over elements), but that for single element access where you know which one you want, .x, .y, and .z are preferred since we can make a stronger promise about the ability of the compiler to reason about autovectorization of expressions involving those.

meshula commented 4 years ago

I like it!!

cary-ilm commented 4 years ago

I'm not sure it's going to work, specifically in the case where T has a non-trivial destructor. This doesn't work:

class Foo : public std::vector<float>
{
}

Vec2<Foo> f;

The union can't have members with non-trivial constructor/destructors. At least as proposed, it's going to break backwards compatibility. There may be ways around this, not sure yet.

lgritz commented 4 years ago

Ick.

If this were being designed from scratch today, I would be happy to achieve higher performance for the case everybody needs and cares about -- simple vectors of floats -- even if it meant that this template only worked for vectors of trivially ctrable/dtrable classes.

But it bums me out to potentially break back compatibility.

I don't know how to balance those desires.

Another route, I suppose, is to leave it but have strongly worded documentation that if you care about auto-vectorizing loops involving Vec's, then it's very important to never use operator[] and always write code that directly names .x, .y, or .z. That's still not perfect, but if the advice is followed, it covers many of the use cases. Though I also will not be surprised if future compilers get more and more strict with pedantic warnings about this undefined behavior.

lgritz commented 4 years ago

To clarify, in the preceding message, when I said "another route, I suppose, is to leave it...", I meant "leave it as it is, abandon this suggestion".

cary-ilm commented 4 years ago

We can probably do it with a second template argument with a type that provides the storage, with a default value involving the union, although that might get messy. But that sort of construct is at least familiar from the std classes. We shouldn't rule it out yet.

On Tue, Aug 11, 2020 at 5:37 PM Larry Gritz notifications@github.com wrote:

To clarify, in the preceding message, when I said "another route, I suppose, is to leave it...", I meant "leave it as it is, abandon this suggestion".

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/AcademySoftwareFoundation/Imath/issues/26#issuecomment-672397308, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFC3DGJPLXUSCTTBL7YVN5LSAHP3BANCNFSM4O4W34NQ .

-- Cary Phillips | R&D Supervisor | ILM | San Francisco

meshula commented 4 years ago

I like Cary's suggestion in principle ~ could go farther and say that this new union'd type is the provided storage type ~ and then treat the existing classes as utility classes. Modern performant variants on critical operations could be provided on the new types, and complex operations like SVD delegated to existing Imath implementation, and thus we could modernize operation and preserve compatibility. When folks like Larry & I implement new code, we could therefore mainly write against the new type. It would be messy in the sense that we'd need to touch ALL the code to reference the storage object, it would be easy mechanical work, but extremely time consuming work.

AcademySoftwareFoundation / Imath

Improve structure element access idiom for vectorizing compilers #26