getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License
1.03k stars 65 forks source link

NaN derivative of sqrt(0) in KeOps v2.0 #221

Open jeanfeydy opened 2 years ago

jeanfeydy commented 2 years ago

Hi @joanglaunes, @bcharlier,

I am currently debugging my optimal transport code with KeOps v2, and have just noticed a breaking change with the new math engine: the value of the derivative of sqrt at 0 is now equal to NaN, whereas it was explicitly set to 0 in KeOps v1.5.

When looking at the diff between both versions, this is due to a simplification of the Rsqrt (inverse square root) operator.

KeOps v1.5 was taking care of the 0 value - link:

  template < typename TYPE >
  static DEVICE INLINE void Operation(TYPE *out, TYPE *outF) {
    #pragma unroll
    for (int k = 0; k < F::DIM; k++)
      if (outF[k] == 0.0f)
        out[k] = 0.0f;
      else
        out[k] = keops_rsqrt(outF[k]);
  }

#if USE_HALF && GPU_ON
  static DEVICE INLINE void Operation(half2 *out, half2 *outF) {
    #pragma unroll
    for (int k = 0; k < F::DIM; k++) {
      half2 cond = __heq2(outF[k],__float2half2_rn(0.0f));
      out[k] = h2rsqrt(outF[k]+cond) * (__float2half2_rn(1.0f)-cond);
    }
  }
#endif

KeOps v2.0b doesn't - link:

keops_rsqrt = math_function(
    cpu_code=lambda x: f"(1.0f/sqrt({x}))", gpu_code="rsqrt", gpu_half2_code="h2rsqrt"
)

Unfortunately, this change from KeOps v1 to KeOps v2 is breaking all the codes that rely on the derivative of a Euclidean norm. This includes kernel or LDDMM with an exponential kernel, as well as optimal transport codes with a Wasserstein-1 metric.

As a consequence, I think that we should come back to the previous behavior.

For the sake of compatibility, we may either:

In your opinion, what should we do?

joanglaunes commented 2 years ago

Hello @jeanfeydy , I have done the fix now in master, can you try it ?

jeanfeydy commented 2 years ago

Hi @joanglaunes,

Thank you so much, you're a champion! I have just corrected the condition from x<0 to x<=0 for the CPU and "standard" GPU code. I'm not too sure about the float16 code: what do you think?

joanglaunes commented 2 years ago

Ok actually I was not careful enough, it should be x==0 instead of x<=0 or x<0, don't you think ? About the float16 code, I have done the change but did not test it. Actually could you put a test script in the pykeops/sandbox folder, or better in the pykeops/test/more_tests_gpu folder ? (these are tested by the CI)