RemyDegenne / testing-lower-bounds

Information theory and hypothesis testing, in Lean
https://remydegenne.github.io/testing-lower-bounds/blueprint
Apache License 2.0

Refactors of f-divergences #154

Open RemyDegenne opened 2 weeks ago

RemyDegenne commented 2 weeks ago

Current definition

def fDiv (f : ℝ → ℝ) (μ ν : Measure α) : EReal :=
  if ¬ Integrable (fun x ↦ f ((∂μ/∂ν) x).toReal) ν then ⊤
  else ∫ x, f ((∂μ/∂ν) x).toReal ∂ν + derivAtTop f * μ.singularPart ν .univ
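In informal notation, when the integrand is integrable this computes

$$D_f(\mu, \nu) = \int f\Big(\frac{d\mu}{d\nu}\Big)\, d\nu + f'(\infty)\cdot(\mu_{\perp\nu})(\mathcal{X}),$$

where $f'(\infty)$ stands for derivAtTop f and $\mu_{\perp\nu}$ for μ.singularPart ν; otherwise the value is ⊤.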

Issues

Constraints

f has to have domain ℝ, and not the more natural ℝ≥0∞, because otherwise we can't talk about its derivatives and other basic things break.
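A minimal illustration of this constraint, using only Mathlib's deriv (the checks below are not from the repository):

import Mathlib

open scoped ENNReal

-- `deriv` exists for real-valued functions of a real variable:
#check (deriv : (ℝ → ℝ) → ℝ → ℝ)

-- But there is no `deriv` for `ℝ≥0∞`-valued functions, since `ℝ≥0∞` is not a
-- normed space over `ℝ`; the following would fail to elaborate:
-- #check (deriv : (ℝ → ℝ≥0∞) → ℝ → ℝ≥0∞)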

Refactor 1

def rightLimZero (f : ℝ → ℝ) : EReal := Function.rightLim (fun x ↦ (f x : EReal)) (0 : ℝ)

def fDiv' (f : ℝ → ℝ) (μ ν : Measure α) : EReal :=
  if ¬ IntegrableOn (fun x ↦ f (μ.rnDeriv ν x).toReal) (ν.singularPartSet μ)ᶜ ν then ⊤
  else ∫ x in (ν.singularPartSet μ)ᶜ, f (μ.rnDeriv ν x).toReal ∂ν
    + derivAtTop f * μ.singularPart ν .univ + rightLimZero f * ν.singularPart μ .univ

This does not fundamentally change the definition. It simply splits off a third term, corresponding to the set where the Radon-Nikodym derivative is zero, and uses the right limit of f at 0 there.
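Reading the three terms informally (treating ν.singularPartSet μ, as described above, as the set where the derivative vanishes):

$$D'_f(\mu, \nu) = \int_{\{\frac{d\mu}{d\nu} > 0\}} f\Big(\frac{d\mu}{d\nu}\Big)\, d\nu + f'(\infty)\cdot(\mu_{\perp\nu})(\mathcal{X}) + f(0^+)\cdot(\nu_{\perp\mu})(\mathcal{X}),$$

with $f(0^+)$ denoting rightLimZero f.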

This is a rather light refactor from a conceptual point of view. It's more a fix than a refactor. However, it will make all computations about fDiv heavier.

Refactor 2

def fDiv (f : ℝ → ℝ≥0∞) (μ ν : Measure α) : ℝ≥0∞ :=
    ∫⁻ x, f ((∂μ/∂ν) x).toReal ∂ν + derivAtTop f * μ.singularPart ν .univ

(in which derivAtTop f would be changed to take values in ℝ≥0∞)

This much more radical refactor changes the codomain of f to ℝ≥0∞, replaces the Bochner integral and its integrability case split by a Lebesgue integral ∫⁻, and makes fDiv itself ℝ≥0∞-valued.

That definition was not used from the start because it seemed to me that allowing f to take negative values was important. After all, every book writes that the KL divergence is the f-divergence of x * log x, which takes negative values. However, on probability measures it is always possible to add an affine function without changing the divergence while making the function nonnegative: x * log x + 1 - x gives the same f-divergence on probability measures.
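The affine-shift argument, spelled out for probability measures with μ ≪ ν (a quick check, assuming the integrals below exist): for any constant a,

$$\int \Big(f\Big(\frac{d\mu}{d\nu}\Big) + a\Big(\frac{d\mu}{d\nu} - 1\Big)\Big)\, d\nu = \int f\Big(\frac{d\mu}{d\nu}\Big)\, d\nu + a\,(\mu(\mathcal{X}) - \nu(\mathcal{X})) = \int f\Big(\frac{d\mu}{d\nu}\Big)\, d\nu,$$

since μ(𝒳) = ν(𝒳) = 1. Taking a = -1 with f(x) = x * log x gives the nonnegative function x * log x + 1 - x.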

We would have to choose one of these two options:

  1. Change the definition of KL to be that f-divergence. The KL definition would then be something like the following (see the computation after this list for where the correction terms come from):
    def kl (μ ν : Measure α) : ℝ≥0∞ :=
    if μ ≪ ν ∧ Integrable (llr μ ν) μ then ENNReal.ofReal (∫ x, llr μ ν x ∂μ) + ν .univ - μ .univ  else ∞
  2. Write that KL is equal to an f-divergence only on probability measures.
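Where the extra ν .univ - μ .univ in option 1 comes from (a sketch, for finite measures with μ ≪ ν and the required integrability): with f(x) = x * log x + 1 - x,

$$\int f\Big(\frac{d\mu}{d\nu}\Big)\, d\nu = \int \frac{d\mu}{d\nu}\,\log\frac{d\mu}{d\nu}\, d\nu + \nu(\mathcal{X}) - \mu(\mathcal{X}) = \int \mathrm{llr}(\mu,\nu)\, d\mu + \nu(\mathcal{X}) - \mu(\mathcal{X}),$$

which is the expression appearing in the candidate kl definition above.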

We could change the type of all divergences to use ℝ≥0∞ if we go for option 1 for all of them.

An advantage of that second refactor is that it would presumably connect better to integrals of statistical informations (which are necessarily nonnegative).

RemyDegenne commented 2 weeks ago

There is an issue with f : ℝ → ℝ≥0∞, which is that we can't talk about its right derivative, since ℝ≥0∞ is not a real normed space. We would have to go to ℝ and then back to ℝ≥0∞, and everything related to slopes and convexity breaks.
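A minimal sketch of that detour (hypothetical name; only derivWithin and ENNReal.ofReal below are Mathlib functions):

import Mathlib

open scoped ENNReal

/-- Hypothetical right derivative for an `ℝ≥0∞`-valued function: go to `ℝ` via
`toReal`, differentiate from the right, and coerce back with `ENNReal.ofReal`.
Note the information loss: negative slopes are clamped to `0`, and points where
`f` is infinite are sent to `0` by `toReal`, which is part of why slope and
convexity arguments break. -/
noncomputable def rightDerivENNReal (f : ℝ → ℝ≥0∞) (x : ℝ) : ℝ≥0∞ :=
  ENNReal.ofReal (derivWithin (fun y ↦ (f y).toReal) (Set.Ioi x) x)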