Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Failed access analysis and missing vectorization of loop containing std::min/std::max #39038

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR40066
Status NEW
Importance P enhancement
Reported by Moritz Kreutzer (moritz.kreutzer@siemens.com)
Reported on 2018-12-18 02:14:18 -0800
Last modified on 2018-12-20 04:38:59 -0800
Version trunk
Hardware PC Linux
CC hfinkel@anl.gov, hideki.saito@intel.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
Hi,

Clang trunk fails to vectorize the following loop with "cannot identify array
bounds":

    for (int i = 0; i < end; ++i) // does not vectorize
    {
        dst[i] = std::min(dst[i], dst[i]/src[i]);
    }

If we don't use std::min, but a simple ternary operator, Clang is able to
vectorize the resulting loop:

    for (int i = 0; i < end; ++i) // vectorizes
    {
        dst[i] = dst[i] < dst[i]/src[i] ? dst[i] : dst[i]/src[i];
    }

The problem does not seem to be std::min in itself, because Clang is able to
vectorize the following loop (note the different logic and the absence of an
implicit temporary for dst[i]/src[i]):

    for (int i = 0; i < end; ++i) // vectorizes
    {
        dst[i] = std::min(dst[i], src[i]);
    }

If we replace std::min with a custom implementation, returning the result by
value instead of "const &" (as it is done in libstdc++), we can also get the
first version to vectorize.

Obligatory godbolt: https://godbolt.org/z/6xjhz-

All in all, it seems like mixing references and values is the problem here. I
have tried patching D23686, D52116, and D52117, to no avail. From my feeling,
the first loop is something which should be vectorizable, even with the default
implementation of std::min (return a const reference ) in libstdc++ (in the
end, the call should be inlined and the compiler should be able to see through
everything). But maybe I'm missing something obvious, so any help is greatly
appreciated.

Thanks in advance,
Moritz
Quuxplusone commented 5 years ago
I investigated a bit further regarding the vectorization of

    dst[i] = std::min(dst[i], dst[i]/src[i])

Remember that std::min() is implemented along the lines of:

    double const & std::min(double const & a, double const & b)
    {
        if (a<b) return a;
        return b;
    }

The debug output indicates that the access analysis fails for a bitcast:

    LAA: Can't find bounds for ptr:  %2 = bitcast double* %a.b.i to i64*, !dbg !23

Where %a.b.i seems to hold the result of the call to std::min():

    %div = fdiv double %0, %1
    store double %div, double* %ref.tmp, align 8, !tbaa !2, !llvm.access.group !6
    %cmp.i = fcmp olt double %0, %div
    %a.b.i = select i1 %cmp.i, double* %arrayidx, double* %ref.tmp
    %2 = bitcast double* %a.b.i to i64*

The exact place of failure on LoopAccessAnalysis is in hasComputableBounds(),
where the dynamic_cast of the pointer to SCEVAddRecExpr fails and Assume ==
false:

    const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(PtrScev);
    if (!AR && Assume) AR = PSE.getAsAddRec(Ptr);
    if (!AR)  return false;

While I can imagine that the present semantics (mixture of temporary values and
referencse) are not trivial for access analysis, I doubt that it's impossible
to safely vectorize this loop. As I said before, merely changing the return
type of std::min() to "double" enables vectorization for this loop, probably
because we are creating a copy at one point.
I'm not exactly sure what's happening here. I just wanted to share my thoughts
and hope that it helps or someone can point me in the right direction. I can
provide more information if required.

Moritz