dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.97k stars 4.65k forks source link

Loop cloning can't eliminate bounds check when assigning to multi-dim jagged array #54074

Open BruceForstall opened 3 years ago

BruceForstall commented 3 years ago

The core of the src\tests\JIT\Performance\CodeQuality\Benchstones\BenchI\Array2\Array2.cs benchmark is this loop:

for (; loop != 0; loop--) {
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 10; j++) {
            for (int k = 0; k < 10; k++) {
                d[i][j][k] = s[i][j][k];
            }
        }
    }
}

Some observations:

  1. The inner three loops are all loop invariant w.r.t. the outer loop. Can't we just eliminate the outer loop?
  2. The outer loop is not cloned because it is not in the canonical form accepted by loop cloning -- "for (var=init; var < limit; var++". The i and j loops are cloned. The inner k loop is not cloned. Why?
  3. Without cloning, there are 5 bounds checks in the inner loop; 1 bounds check get hoisted. With cloning, there are 2 bounds checks in the inner loop; the rest become cloned loop pre-conditions.
    • If the inner two loops are also cloned, can we eliminate all the bounds checks?
    • The 1st and 2nd cloned loops appear to have duplicate cloning conditions (i.e., it appears they check the same array bounds multiple times
    • In the non-cloned case, why can't we hoist more checks?

category:cq theme:loop-opt skill-level:expert cost:medium impact:medium

BruceForstall commented 3 years ago

The inner k loop is not cloned due to a heuristic to avoid creating too many "checking" blocks in Compiler::optComputeDerefConditions.

After bumping that heuristic to allow the inner loop to be cloned, we still don't remove one of the bounds checks.

BruceForstall commented 3 years ago

Why we can't eliminate the final bounds check in the inner loop: when the importer imports the d[i][j][k] = s[i][j][k] line, it needs to preserve proper exception ordering. That means doing the d[i][j] array bounds check first, then s[i][j][k], then finally the d[i][j][k] bounds check before the assign. To do this, it introduces a temp to store d[i][j], and reuses that temp at the assign. This temp is marked both "Strict ordering of exceptions for Array store" and a "single def temp". When loop cloning goes to reconstruct the array index expressions, it sees the temp, which doesn't appear to be loop-invariant, because it is assigned in the loop. Also, its assignment is in another tree. Thus, loop cloning won't touch the "store" array bounds check, even though it really could be added to the cloning condition if we were somehow smarter, such as knowing it were single-def, and finding that def, looking for the array index pattern there.

BruceForstall commented 3 years ago

All the issues have been investigated. While more improvements could be implemented, there are none planned, so I'm closing this.

BruceForstall commented 2 years ago

The issue with an extra bounds check for nested array access that loop cloning can't get rid of should be investigated. It happens also in other test cases with multi-dimensional (jagged) arrays and nested loops over the array indices.