Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Optimizer fails to unroll non-natural loop #33265

Open Quuxplusone opened 7 years ago

Quuxplusone commented 7 years ago
Bugzilla Link PR34293
Status NEW
Importance P enhancement
Reported by Peet (peetlugen19@yandex.com)
Reported on 2017-08-22 18:04:21 -0700
Last modified on 2017-12-21 15:54:18 -0800
Version trunk
Hardware PC Linux
CC brycelelbach@gmail.com, chandlerc@gmail.com, gornishanov@gmail.com, hfinkel@anl.gov, junryoungju@gmail.com, llvm-bugs@lists.llvm.org, michael.v.zolotukhin@gmail.com, mkuper@google.com
Fixed by commit(s)
Attachments testcase.cpp (1454 bytes, text/x-c++src)
Blocks
Blocked by
See also
Created attachment 19033
testcase

Test case: https://godbolt.org/g/esjnb2

I've also attached the test-case as an attachment.

LLVM-IR:

define void @printEquivalent()() local_unnamed_addr #0 !dbg !684 {
  %1 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32 1), !dbg !685
  %2 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32 3), !dbg !686
  ret void, !dbg !687
}

declare i32 @printf(i8* nocapture readonly, ...) local_unnamed_addr #1

define i32 @main() local_unnamed_addr #2 personality i8* bitcast (i32 (...)*
@__gxx_personality_v0 to i8*) !dbg !688 {
  br label %1, !dbg !693

; <label>:1:                                      ; preds = %0, %5
  %2 = phi i2 [ 0, %0 ], [ %7, %5 ]
  switch i2 %2, label %4 [
    i2 0, label %5
    i2 1, label %3
    i2 -2, label %9
  ]

; <label>:3:                                      ; preds = %1
  br label %5, !dbg !826

; <label>:4:                                      ; preds = %1
  unreachable, !dbg !824

; <label>:5:                                      ; preds = %1, %3
  %6 = phi i32 [ 3, %3 ], [ 1, %1 ]
  %7 = phi i2 [ -2, %3 ], [ 1, %1 ]
  %8 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32 %6), !dbg !828
  br label %1, !dbg !693, !llvm.loop !830

; <label>:9:                                      ; preds = %1
  ret i32 0, !dbg !832
}

Issue:

The optimizer could have optimized the couroutine into something more like the
two sequential calls to printf. Instead, it isn't able to unroll the
"couroutine-loop".
Quuxplusone commented 7 years ago

Attached testcase.cpp (1454 bytes, text/x-c++src): testcase

Quuxplusone commented 7 years ago

Ping.

Quuxplusone commented 7 years ago

Ping.

Quuxplusone commented 7 years ago

Ping.

Quuxplusone commented 7 years ago

Ping.

Quuxplusone commented 7 years ago
Thank you for pinging.
Unless we find someone willing (and having knowledge) to address the issue, I
am not sure this will get fixed soon.
Quuxplusone commented 7 years ago
Thanks for acknowledging this issue. I've a little reproducer which doesn't
depend on the couroutines feature:

https://godbolt.org/g/nTkkaj

Code:

extern void printf(const char*);

void test() {
    auto step = 0;
    while(true) {
        switch (step) {
            case 0:
                step = 1;
                break;
            case 1:
                printf("hello");
                step = 2;
                break;
            case 2:
                printf("hello");
                step = 3;
                break;
            case 3:
                return;
            default:
                __builtin_unreachable();
        }
    }
}

Surprisingly GCC is able to simplify this to two sequential calls to printf.
However, I am lacking the skills of the internals of LLVM, so I would be
grateful if anyone else could take a look at this :)
Quuxplusone commented 7 years ago
(In reply to Peet from comment #6)
> Thanks for acknowledging this issue. I've a little reproducer which doesn't
> depend on the couroutines feature:
>
> https://godbolt.org/g/nTkkaj
>
> Code:
>
> extern void printf(const char*);
>
> void test() {
>     auto step = 0;
>     while(true) {
>         switch (step) {
>             case 0:
>                 step = 1;
>                 break;
>             case 1:
>                 printf("hello");
>                 step = 2;
>                 break;
>             case 2:
>                 printf("hello");
>                 step = 3;
>                 break;
>             case 3:
>                 return;
>             default:
>                 __builtin_unreachable();
>         }
>     }
> }
>
> Surprisingly GCC is able to simplify this to two sequential calls to printf.
> However, I am lacking the skills of the internals of LLVM, so I would be
> grateful if anyone else could take a look at this :)

FWIW, if you try to force LLVM to unroll the loop by adding:

  #pragma clang loop unroll(enable)

above it, it still won't. The reason given is, "Can't unroll; loop not
terminated by a conditional branch."

I agree that we should do what GCC does and fully unroll this loop. I'm
guessing that we need to symbolically evaluate some of it for a few iterations
in order to see that unrolling is helpful.
Quuxplusone commented 7 years ago

@Hal Finkel May I ask how would the symbolically executing part work? Does llvm have such "infrastructure"/code?

Quuxplusone commented 7 years ago
(In reply to Peet from comment #8)
> @Hal Finkel May I ask how would the symbolically executing part work? Does
> llvm have such "infrastructure"/code?

Yes, we already have code that does this in analyzeLoopUnrollCost in
lib/Transforms/Scalar/LoopUnrollPass.cpp. I suspect that it just does not
handle this case.
Quuxplusone commented 7 years ago
> Yes, we already have code that does this in analyzeLoopUnrollCost in
lib/Transforms/Scalar/LoopUnrollPass.cpp. I suspect that it just does not
handle this case.
Most probably, it bails out because the trip count of the loop is unknown, but
it might be interesting to use it to evaluate tripcount for loops like this.
Quuxplusone commented 7 years ago

Ping. Has there been any progress about fixing this?

Quuxplusone commented 7 years ago

This might be important for the general performance of coroutines.

Quuxplusone commented 6 years ago

Weekly ping :)

Quuxplusone commented 6 years ago

Weekly ping :)

Quuxplusone commented 6 years ago

Weekly ping :)

Quuxplusone commented 6 years ago

I'm trying to fix it.

Quuxplusone commented 6 years ago
I don't understand why this loop has multiple exits.

Printing analysis 'Scalar Evolution Analysis' for function '?test@@YAXXZ':
Classifying expressions for: @"\01?test@@YAXXZ"
  %step.0 = phi i2 [ 0, %entry ], [ %step.0.be, %while.cond.backedge ]
  -->  %step.0 U: full-set S: full-set          Exits: <<Unknown>>              LoopDispositions: { %while.cond: Variant }
  %step.0.be = phi i2 [ -1, %sw.bb2 ], [ -2, %sw.bb1 ], [ 1, %while.cond ]
  -->  %step.0.be U: full-set S: full-set               Exits: <<Unknown>>              LoopDispositions: { %while.cond: Variant }
Determining loop execution counts for: @"\01?test@@YAXXZ"
Loop %while.cond: <multiple exits> Unpredictable backedge-taken count.
Loop %while.cond: Unpredictable max backedge-taken count.
Loop %while.cond: Unpredictable predicated backedge-taken count.
Printing analysis 'Unroll loops':
Pass::print not implemented for pass: 'Unroll loops'!

I think something wrong with analyzing exits.
while(true) isn't a exit actually.

let me debug for a while.
Quuxplusone commented 6 years ago

I think there is no general case for create AddExpr for "switch" instruction.

Quuxplusone commented 6 years ago
same problem with ICmp instructions.
LoopUnroll should support for conditional instructions to assume that loop can
exit.
Quuxplusone commented 6 years ago

@Jun Ryung Ju First of all, thank you for taking a look at this :) Secondly, have you made any progress adding unrolling for non-natural loops?

Quuxplusone commented 6 years ago

Ping.

Quuxplusone commented 6 years ago
Not yet, in progress.
I think this isn't only for a SCEV, even its predicatable, it unrolls without
any conditional varaibles. that means it works wrong.

this will be a big changes in LoopUnroll and SCEV
Quuxplusone commented 6 years ago
I can't work very fast D:
my computer has broken, I only have my Surface Pro.
=I need to compile codes without any compile acceleration, so it will take some
times to fix it.
Quuxplusone commented 6 years ago
I will fix Unroll Loop part after fixing SCEV to support any conditional-
expressions.
that currently in progress.

http://lists.llvm.org/pipermail/llvm-dev/2017-November/119179.html
Quuxplusone commented 6 years ago

@Jun Ryung Ju I am wondering if you've made any progress so far?

Quuxplusone commented 6 years ago

Gentle ping :)

Quuxplusone commented 6 years ago

I have lots of works to do right now. maybe I can't write codes in time that you want. I am still progress in a writing partical-range based conditional-expression analyzer.

Quuxplusone commented 6 years ago
maybe we can just simply fixing this switch case into a increase expression.
and also, that you write code __builtin_unreachable() will cause another
problem that it will generate multiple exit counts which that will makes SCEVs
not to analyze trip-count.

1.just by fixing set-expression as increase-expression.
2.remove __builtin_unreachable();
3.create an AddRecExpr with switch instruction with a increase expression.

hope this will helped.