Closed simonlindholm closed 7 years ago
Just ran into this issue myself when (finally, after a year..) solving roadtimes from WF 2016. Thanks for the fix!
The division hoisting might not be necessary anymore in the near future, clang is going to do this for us: https://reviews.llvm.org/rL299911 (may need the clang equivalent of -ffast-math
or so though).
IMO the division optimization is slightly neater by just setting D[r][s] = 1 / D[r][s]
before the loops and then multiplying it.
The division hoisting might not be necessary anymore in the near future, clang is going to do this for us: https://reviews.llvm.org/rL299911 (may need the clang equivalent of -ffast-math or so though).
Nice! ICPC doesn't run clang, though, so that might not be directly applicable, and it's not clear to me that D[r][s]
will actually be treated as hoistable.
IMO the division optimization is slightly neater by just setting D[r][s] = 1 / D[r][s] before the loops and then multiplying it.
Neater indeed. I think I avoided that again due to fears of hoistability, e.g. ruining GCC's ability to auto-vectorize the innermost loop. Our local version of the pivot
method, which we somehow didn't get around to upstream, looks as follows:
void pivot(int r, int s) {
T *a = D[r].data(), inv = 1 / a[s];
rep(i,0,m+2) if (i != r && abs(D[i][s]) > eps) {
T *b = D[i].data(), inv2 = b[s] * inv;
rep(j,0,n+2) b[j] -= a[j] * inv2;
b[s] = a[s] * inv2;
}
rep(j,0,n+2) if (j != s) D[r][j] *= inv;
rep(i,0,m+2) if (i != r) D[i][s] *= -inv;
D[r][s] = inv;
swap(B[r], N[s]);
}
and with a #pragma GCC optimize ("Ofast")
directive at the top of the file wins it another factor 1.7 or so in the first use of simplex I could grep up. (and the && abs(D[i][s]) > eps
wins a lot more in the same case.)
You might want to apply the fix in https://github.com/jaehyunp/stanfordacm/pull/1/ (and https://github.com/jaehyunp/stanfordacm/pull/3, for that sake).
There was also some fix https://github.com/jaehyunp/stanfordacm/commit/71f999b7753f45ae5d00e89ce77e202f297f0d64 that later got reverted, I don't know what's up with that.
(I wouldn't be surprised if the simplex still got stuck in loops because of broken loop detection and precision stuff, but so far I haven't been hit by it when using Stanford's simplex.)