Open cgeoga opened 1 month ago
Here is a further reduced example which has a very similar issue:
#include <stdio.h>
#include <math.h>

double __enzyme_fwddiff(void*, ...);

static void absterm_(const double *src, double *dest) { *dest = fabs(*src) * *src; }

void derivative_absterm_(const double *src, const double *d_src,
                         const double *dest, double *d_dest) {
  printf("Inside absterm derivative.\n");
  *d_dest = 100;
}
void* __enzyme_register_derivative_absterm[] = {
  (void*)absterm_,
  (void*)derivative_absterm_,
};

double absterm(double x) {
  double y;
  absterm_(&x, &y);
  return y;
}

double math_function(double x) {
  return absterm(x);
}

int main() {
  double number = 2.0;
  double test = __enzyme_fwddiff((void*)math_function, number, 1.0);
  printf("test output: %f\n", test);
}
When compiled with clang test_customfwd.c -O0 -o customfwd -fplugin=/usr/lib/ClangEnzyme-14.so, this outputs:
Inside absterm derivative.
test output: 100.000000
However, when compiled with -O3, it outputs:
test output: 4.000000
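(For reference, $4.0$ is the true derivative you get when the custom rule is ignored and the primal is differentiated directly: $\frac{d}{dx}\left(|x|\,x\right) = 2|x|$ for $x \neq 0$, which at $x = 2$ gives $4$. So under -O3 Enzyme is silently falling back to differentiating absterm_ itself.)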
But compiling with -fno-inline: clang test_customfwd.c -O3 -o customfwd -fno-inline -fplugin=/usr/lib/ClangEnzyme-14.so
completely solves the problem in this example. Sadly, that flag does not fix cgeoga's original mwe.c, but I hope it is a step in the right direction.
So, we have an update here and a working fix. I'll leave the question of whether or not to close this issue up to you.
As a tiny amount of background, the point of this issue is that we have code that evaluates a series $g(x) = \sum_j f_j(x)$ to convergence. But (assuming sufficiently good behavior that term-by-term differentiation is allowed) the derivative series $\frac{d}{d x} g(x) = \sum_j \frac{d}{d x} f_j(x)$ converges a bit more slowly. So in code that effectively accumulates terms until abs(new_term) < convergence_epsilon, we needed something that allowed us to sort of trick the compiler into continuing to accumulate terms until abs(d_new_term_dx) < convergence_epsilon as well.
The solution ended up looking like this:
// This part is actually the code we used:
// These attributes are very important---without them, compiling with any -O
// flag besides -O0 silently breaks this rule (or verbosely breaks it, if you
// compile with -Rpass=enzyme).
void __attribute__((noinline,optnone)) isconverged(double* t, double* result) {
  *result = fabs(*t) - EPS;
}

void disconverged(double* t, double* d_t,
                  double* result, double* d_result) {
  *result = fmax(fabs(*t) - EPS, fabs(*d_t) - EPS);
}

void* __enzyme_register_derivative_converged[] = {
  (void*)isconverged,
  (void*)disconverged,
};
// [... stuff ...]
// This part is _pseudocode_:
double my_series_fun(double x, [...]) {
  // don't want to copy redundant things here...
  double stuff = [...];
  double out = 0.0, cvg = 1.0;
  for (int k = 1; k < 50; k++) {
    double newterm = [...];
    out += newterm;
    isconverged(&newterm, &cvg);
    if (cvg < 0.0) break;
  }
  return out;
}
The attributes optnone and noinline were crucial for this to continue working when compiling with -O* flags.
My usual preface: apologies if I have missed relevant docs or existing/past issues on this.
I have some C code that uses a custom forward-mode rule. Compiled with -O0, that rule gets triggered. But if I compile with any higher optimization level, the custom rule does not get triggered (which I investigate using a simple print statement). Here is an MWE, which I compile on linux with clang 18. When the compiled code actually hits the rule, ./mwe will print a handful of "Hi!" lines before giving the solution.

I pass the -Rpass=.* flag to see all the compiler optimizations that get done, and depending on a few small tweaks I sometimes get remarks for ipabsterm. But I'm really having trouble figuring out which compiler optimization is breaking things. As you can see from the attribute I've put on that function, I was suspicious that the function was getting inlined and the loop vectorized, which maybe was breaking things. But nothing I have tried has fixed it.

Any thoughts or suggestions you have would be greatly appreciated! Thanks so much in advance.