TimonKnigge / TCR

Team Code Reference for programming contests
24 stars 5 forks source link

Simplex is slightly broken #3

Closed simonlindholm closed 7 years ago

simonlindholm commented 8 years ago

You might want to apply the fix in https://github.com/jaehyunp/stanfordacm/pull/1/ (and https://github.com/jaehyunp/stanfordacm/pull/3, for that sake).

There was also some fix https://github.com/jaehyunp/stanfordacm/commit/71f999b7753f45ae5d00e89ce77e202f297f0d64 that later got reverted, I don't know what's up with that.

(I wouldn't be surprised if the simplex still got stuck in loops because of broken loop detection and precision stuff, but so far I haven't been hit by it when using Stanford's simplex.)

RagnarGrootKoerkamp commented 7 years ago

Just ran into this issue myself when (finally, after a year..) solving roadtimes from WF 2016. Thanks for the fix!

The division hoisting might not be necessary anymore in the near future, clang is going to do this for us: https://reviews.llvm.org/rL299911 (may need the clang equivalent of -ffast-math or so though).

IMO the division optimization is slightly neater by just setting D[r][s] = 1 / D[r][s] before the loops and then multiplying it.

simonlindholm commented 7 years ago

The division hoisting might not be necessary anymore in the near future, clang is going to do this for us: https://reviews.llvm.org/rL299911 (may need the clang equivalent of -ffast-math or so though).

Nice! ICPC doesn't run clang, though, so that might not be directly applicable, and it's not clear to me that D[r][s] will actually be treated as hoistable.

IMO the division optimization is slightly neater by just setting D[r][s] = 1 / D[r][s] before the loops and then multiplying it.

Neater indeed. I think I avoided that again due to fears of hoistability, e.g. ruining GCC's ability to auto-vectorize the innermost loop. Our local version of the pivot method, which we somehow didn't get around to upstream, looks as follows:

void pivot(int r, int s) {
    T *a = D[r].data(), inv = 1 / a[s];
    rep(i,0,m+2) if (i != r && abs(D[i][s]) > eps) {
        T *b = D[i].data(), inv2 = b[s] * inv;
        rep(j,0,n+2) b[j] -= a[j] * inv2;
        b[s] = a[s] * inv2;
    }
    rep(j,0,n+2) if (j != s) D[r][j] *= inv;
    rep(i,0,m+2) if (i != r) D[i][s] *= -inv;
    D[r][s] = inv;
    swap(B[r], N[s]);
}

and with a #pragma GCC optimize ("Ofast") directive at the top of the file wins it another factor 1.7 or so in the first use of simplex I could grep up. (and the && abs(D[i][s]) > eps wins a lot more in the same case.)