Idea of how to deal with transcendental functions over rationals

Open up any ML paper and you will see functions like exp and log used in proofs. In addition to that, I've yet to see a paper that would hesitate to use reals. And the probability distributions - let me take the Gaussian as example uses π, exp, sqrt and has a range that goes from -infinity to infinity. In fact actually integrating it requires doing it over that range. To me this is highly dubious from a computational perspective and I've found the proofs of that too creative for my taste.

By now I have some experience in formal theorem proving. I've gone through most of Software Foundations in Coq and the first volume for Agda, and have played around with both Lean and Agda. But it hasn't really scratched my itch - because even with all my knowledge it feels like the actual mathematics that I need to do machine learning are nowhere to be found.

I've had a certain bit of insight about this. A few months ago, I've speculated that computational reversibility is one of the fundamental principles of learning and while looking for counterpoints I've realized the very first example - the one with the square root, was very unfortunately (or fortunately depending on you looked at it,) picked. Having gone through fairly mainstream schooling, I had not at all realized at the time I wrote that draft that square root is in fact not defined for many rationals.

This is significant because it means that it is only possible to approximate the square root, and that in turns is significant because optimization dynamics appear in mathematics much earlier than is appreciated. It is not the optimization of neural nets, or some other ML algorithm where difficulties first appear, but rather with these simple transcendental functions.

Even if I were to use reals and non-constructive mathematics, it is seems obvious to me that this is only a hack. Even if you use reals, you are just going to run into the same issues again on a much larger scale when you try to reason about ML.

So I've been thinking about a new approach that would be closer to how more realistic reasoning about functions that can only improve, but possibly never converge would look like.

open import Data.Product
open import Data.Nat using (ℕ;suc;zero)
open import Data.Rational
open import Level using (Level)
open import Relation.Nullary
open import Data.Bool using (Bool;false;true)

∣_∣ : ℚ → ℚ
∣ x ∣ with x <? 0ℚ
∣ x ∣ | no ¬p = x
∣ x ∣ | yes p = - x

-- Some function f converges to g, as its precision p increases.
_~>_ : ∀ (f : ℕ → ℚ) → (g : ℕ → ℚ) → Set
f ~> g = ∀ (p : ℕ) → ∃[ p' ] (∣ f p - g p' ∣ ≥ ∣ f (suc p) - g p' ∣)

While it might not be possible to implement the idealized sqrt directly as in math textbooks, you could give it an extra argument p that acts as fuel for how detailed the approximation should be. The idea of an encoding quality is ubiquitous in computing, and it might be a good idea to start with it on the smallest of scales.

So while with rationals sqrt(x) * sqrt(x) might never be exactly equal to x, it might possible to prove that sqrt(x) * sqrt(x) converges to x using the above definition of ~>. I haven't tried it yet though, but I feel it should be doable. And if that turns out to be the case, it might open the way to reasoning about transcendentals in a constructive fashion. To me most of the interesting problems in computer science are optimization problems, and this way of thinking where precision is made explicit really chimes with me.

I am going to try to make something of this, but before I try that let me ask - has what I've illustrated here been done before?

I mean, what I am proposing here is hardly complicated - this sort of relation is the first thing somebody would think of so I am guessing it occurred to somebody else before. But my math sophistication is not that high so I've had to have overlooked it somewhere. Or maybe there is no merit to the idea. Any reference to this would be appreciated.

There's lots of literature on constructive definitions of the reals (see https://en.wikipedia.org/wiki/Construction_of_the_real_numbers). Although I'm not an expert, it looks like you're working your way towards the Bishop reals/Cauchy sequences. You might find this repository interesting that formalises the Bishop reals in Agda. There's also a pdf in there explaining the approach. I've also seen at least two other attempts out there based on the Dedekind reals, but I'm afraid I can't seem to dig them up again.

As for the standard library, the difficulty is not so much in formalising a particular representation but instead in finding one that is easy to work with. The library is still trying to flesh out it's rational infrastructure in this regard. (There has been some significant progress since the last commits to the repository above). Hopefully we'll be able to move onto reals at some point soon :smile:

Although I'm not an expert, it looks like you're working your way towards the Bishop reals/Cauchy sequences.

Aren't those things basically useless from a computational perspective, so much that various textbooks actively try to avoid them in favor of the axiomatic approach? Despite reals being such an important subject, Wildberger's explanations were the only ones I've actually understood and he was unflattering towards them.

What I am trying to do here isn't trying to implement reals in Agda, but prove convergence for various kinds of troublesome functions. This I suspect might be doable for some things - I think it might be possible to prove that sqrt(2) * sqrt(2) ~> 2, but it won't be possible to ever calculate that some function converges to π for example. Furthermore, I do not think this relation is symmetric. I am fairly sure that it won't be possible to ever prove that 2 ~> sqrt(2) * sqrt(2) even if the opposite turns out to be.

...Well, I am just fishing here. If nobody recognizes this, I guess I will just gather my courage and see where this leads me.

I wrote that draft that square root is in fact not defined for many rationals.

A square root of a rational is an algebraic number. The arithmetic of rational numbers Q extended with sqRoot(q) is precisely expressed by operations with linear polynomials in x with coefficients from Q -- with taking residues by (x^2 - q). Here a variable x expresses sqRoot(q). This means: do arithmetic with these polynomials, only each time replace x^2 with q. And all this is programmed in ML, in Agda, and in any reasonable language. Operating with algebraic numbers do not need numbers other than rational, or algebraic numbers of a lower level.

Operating with transcendental numbers is different. Consider the "algorithm"
if a - b == 0 then do F else do G, (1) where a and b are some involved expressions that combine log(π), e and such values. How does it solve which branch to choose? For algebraic numbers such questions are solved by a common algorithm. For real numbers, one has to run some prover that either would find a proof or (most often) would loop infinitely for an unlucky case. So, most of the above "algorithms" include many steps like (1) of searching a proof, when this search is not restricted by any bound. And these considerations do not depend on which programming tool is used, ML, Haskell, Lisp, etc. So that this activity is just designing a prover for transcendental functions. This prover will solve more or less examples, but there will remain many unlucky examples.

Here what I've managed to do so far.

open import Data.Product
open import Data.Empty
open import Data.Nat using (ℕ;suc;zero) renaming (_+_ to _+ⁿ_;_*_ to _*ⁿ_)
open import Data.Integer using (+_;sign;_◃_) renaming (_≤_ to _≤ⁱ_;_*_ to _*ⁱ_;∣_∣ to ∣_∣ⁱ)
open import Data.Rational
open import Data.Rational.Properties using (≤-antisym;≤-irrelevant;≤-trans;≤-refl)
open import Level using (Level)
open import Relation.Nullary
open import Data.Bool using (Bool;false;true)
open import Relation.Binary.PropositionalEquality
open import Data.Sign using () renaming (_*_ to _*ˢ_)
open ℚ

∣_∣ : ℚ → ℚ
∣ x ∣ with x <? 0ℚ
∣ x ∣ | no ¬p = x
∣ x ∣ | yes p = - x

Range = ∃[ lower ] ∃[ higher ] (lower ≤ higher)
_∈_ : ℚ → Range → Set
x ∈ (lower , higher , _) = lower ≤ x × x ≤ higher
_⊆_ : Range → Range → Set
(lower' , higher' , _) ⊆ (lower , higher , _) = lower ≤ lower' × higher' ≤ higher
_⊇_ : Range → Range → Set
a ⊇ b = b ⊆ a

⊇-antisym : ∀ {a b} → a ⊇ b → b ⊇ a → a ≡ b
⊇-antisym {a⁻ , a⁺ , a⁻≤a⁺} {b⁻ , b⁺ , b⁻≤b⁺} (a⁻≤b⁻ , b⁺≤a⁺) (b⁻≤a⁻ , a⁺≤b⁺) with ≤-antisym a⁻≤b⁻ b⁻≤a⁻ | ≤-antisym b⁺≤a⁺ a⁺≤b⁺
⊇-antisym {a⁻ , a⁺ , a⁻≤a⁺} {.a⁻ , .a⁺ , b⁻≤b⁺} _ _ | refl | refl with ≤-irrelevant a⁻≤a⁺ b⁻≤b⁺
⊇-antisym {a⁻ , a⁺ , a⁻≤a⁺} {.a⁻ , .a⁺ , .a⁻≤a⁺} _ _ | refl | refl | refl = refl

-- Some function f converges to g, as its precision p increases.
_~>_ : ∀ (f : ℕ → Range) → (g : ℕ → Range) → Set
f ~> g = ∀ (p : ℕ) → ∃[ p' ] (f p ⊇ g p')

~>-refl : ∀ {x} → x ~> x
~>-refl {x} p = p , ≤-refl , ≤-refl

~>-trans : ∀ {a b c} → a ~> b → b ~> c → a ~> c
~>-trans {a} {b} {c} f g p with f p
~>-trans {a} {b} {c} f g p | p' , f⁻ , f⁺ with g p'
~>-trans {a} {b} {c} f g p | p' , f⁻ , f⁺ | p'' , g⁻ , g⁺ = p'' , ≤-trans f⁻ g⁻ , ≤-trans g⁺ f⁺

const : ℚ → ℕ → Range
const x p = x , x , ≤-refl

postulate sqr-≤0 : ∀ t → 0ℚ ≤ t * t
postulate sqr-≤t : ∀ {t} → 1ℚ ≤ t → t ≤ t * t

postulate 1≤l≤m→ll≤mm : ∀ {l m} → 1ℚ ≤ l → l ≤ m → l * l ≤ m * m
postulate l≤m≤h : ∀ {l h} → l ≤ h → l ≤ (l + h) * ½ × (l + h) * ½ ≤ h
postulate ¬a≤b→b≤a : ∀ {a b} → ¬ (a ≤ b) → b ≤ a

sqrt' : (t : ℚ) → 1ℚ ≤ t → ℕ → ∃[ l ] ∃[ h ] (1ℚ ≤ l × l ≤ h × l * l ≤ t × t ≤ h * h)
sqrt' t 1≤t zero = 1ℚ , t , ≤-refl , 1≤t , 1≤t , sqr-≤t 1≤t
sqrt' t 1≤t (suc p) with sqrt' t 1≤t p
sqrt' t _ (suc _) | l , h , 1≤l , l≤h , ll≤t , t≤hh with (l + h) * (+ 1 / 2) | l≤m≤h l≤h
... | m | l≤m , m≤h with m * m ≤? t
sqrt' t _ (suc _) | l , h , 1≤l , l≤h , ll≤t , t≤hh | m | l≤m , m≤h | .true because ofʸ mm≤t = m , h , ≤-trans 1≤l l≤m , m≤h , mm≤t , t≤hh
sqrt' t _ (suc _) | l , h , 1≤l , l≤h , ll≤t , t≤hh | m | l≤m , m≤h | .false because ofⁿ ¬mm≤t = l , m , 1≤l , l≤m , ll≤t , ¬a≤b→b≤a ¬mm≤t

sqrt : (t : ℚ) → 1ℚ ≤ t → ℕ → Range
sqrt t 1≤t p with sqrt' t 1≤t p
sqrt t 1≤t p | l , h , 1≤l , l≤h , _ = l , h , l≤h

sqr-≤1 : ∀ {t} → 1ℚ ≤ t → 1ℚ ≤ t * t
sqr-≤1 1≤t = ≤-trans 1≤t (sqr-≤t 1≤t)

postulate sqrt-squeeze : ∀ {l t h} → 1ℚ ≤ l → l ≤ h → l * l ≤ t * t → t * t ≤ h * h → l ≤ t × t ≤ h

sqrt-sqr~>id : (t : ℚ) → (1≤t : 1ℚ ≤ t) → sqrt (t * t) (sqr-≤1 1≤t) ~> const t
sqrt-sqr~>id t 1≤t p with sqrt' (t * t) (sqr-≤1 1≤t) p
sqrt-sqr~>id t 1≤t p | l , h , 1≤l , l≤h , ll≤tt , tt≤hh with sqrt-squeeze 1≤l l≤h ll≤tt tt≤hh
sqrt-sqr~>id t 1≤t p | l , h , 1≤l , l≤h , ll≤tt , tt≤hh | l≤t , t≤h = 0 , l≤t , t≤h 

_*ʳ_ : Range → Range → Range
(al , ah , al≤h) *ʳ (bl , bh , bl≤h) with al * bl ≤? ah * bh
(al , ah , al≤h) *ʳ (bl , bh , bl≤h) | .true because ofʸ p = al * bl , ah * bh , p
(al , ah , al≤h) *ʳ (bl , bh , bl≤h) | .false because ofⁿ ¬p = ah * bh , al * bl , ¬a≤b→b≤a ¬p

sqr-sqrt~>id : (t : ℚ) → (1≤t : 1ℚ ≤ t) → (λ p → sqrt t 1≤t p *ʳ sqrt t 1≤t p) ~> const t
sqr-sqrt~>id t 1≤t p with sqrt' t 1≤t p
... | l , h , 1≤l , l≤h , ll≤t , t≤hh with l * l ≤? h * h
sqr-sqrt~>id t 1≤t p | l , h , 1≤l , l≤h , ll≤t , t≤hh | .true because ofʸ ll≤hh = 0 , ll≤t , t≤hh
sqr-sqrt~>id t 1≤t p | l , h , 1≤l , l≤h , ll≤t , t≤hh | .false because ofⁿ ¬ll≤hh = ⊥-elim (¬ll≤hh (≤-trans ll≤t t≤hh))

I got the definition wrong in my original post, the one I have now is much better. The idea is for the transcendental function to output a range rather than a fixed value. With the original definition you could prove that 2 ~> 3 and similar nonsense.

What I am trying to prove right now is...

sqrt(x * x) ~> sqrt(x) * sqrt(x) sqrt(x) * sqrt(x) ~> sqrt(x * x)

But before I can even get to those, I am need to figure out how to clear a few seemingly easier prerequisites.

open import Data.Nat using (ℕ;suc;zero)
open import Data.Rational
open import Data.Product
open import Relation.Nullary
open import Data.Bool using (Bool;false;true)

halve : ℕ → ℚ
halve zero = 1ℚ
halve (suc p) = ½ * halve p

∃-halve : ∀ {a b} → 0ℚ < a → a < b → ∃[ p ] (halve p * b < a)
∃-halve {a} {b} 0<a a<b = h 1 where
  h : ℕ → ∃[ p ] (halve p * b < a)
  h p with halve p * b <? a
  h p | .true because ofʸ b'<a = p , b'<a
  h p | .false because ofⁿ ¬b'<a = h (suc p)

As an intermediate step, I need to prove ∃-halve to the termination checker. I made a SO question asking about this. I have no idea how to do this right now.

It seems that I am running into some variant of the continuum problem here and there are all sorts of subtle issues with the rationals that I have no idea how to even begin dealing with in Agda.

But whether this turns out to be provable or not, without a doubt this is the kind of math that I've been yearning for. This is what I want to learn.

_⊇?_ : Decidable _⊇_
(a⁻ , a⁺ , a⁻≤a⁺) ⊇? (b⁻ , b⁺ , b⁻≤b⁺) with a⁻ ≤? b⁻ | b⁺ ≤? a⁺
((a⁻ , a⁺ , a⁻≤a⁺) ⊇? (b⁻ , b⁺ , b⁻≤b⁺)) | .true because ofʸ a⁻≤b⁻ | .true because ofʸ b⁺≤a⁺ = true because ofʸ (a⁻≤b⁻ , b⁺≤a⁺)
((a⁻ , a⁺ , a⁻≤a⁺) ⊇? (b⁻ , b⁺ , b⁻≤b⁺)) | .true because ofʸ a⁻≤b⁻ | .false because ofⁿ ¬b⁺≤a⁺ = false because ofⁿ λ { (a⁻≤b⁻ , b⁺≤a⁺) → ⊥-elim (¬b⁺≤a⁺ b⁺≤a⁺)}
((a⁻ , a⁺ , a⁻≤a⁺) ⊇? (b⁻ , b⁺ , b⁻≤b⁺)) | .false because ofⁿ ¬a⁻≤b⁻ | b⁺≤a⁺ = false because ofⁿ λ { (a⁻≤b⁻ , _) → ⊥-elim (¬a⁻≤b⁻ a⁻≤b⁻)}

-- I do not see any way to actually prove the termination of this function.
-- Nevertheless, this is different from the postulates in that if it does not diverge it really will
-- construct a proof of convergence.
{-# TERMINATING #-}
sqr-sqrt~>sqrt-sqr : (t : ℚ) → (1≤t : 1ℚ ≤ t) → (λ p → sqrt t 1≤t p *ʳ sqrt t 1≤t p) ~> sqrt (t * t) (sqr-≤1 1≤t)
sqr-sqrt~>sqrt-sqr t 1≤t p = h 0 where
  h : ℕ → ∃[ p' ] ((sqrt t 1≤t p *ʳ sqrt t 1≤t p) ⊇ (sqrt (t * t) (sqr-≤1 1≤t) p'))
  h p' with (sqrt t 1≤t p *ʳ sqrt t 1≤t p) ⊇? (sqrt (t * t) (sqr-≤1 1≤t) p')
  h p' | .true because ofʸ prf = p' , prf
  h p' | .false because ofⁿ ¬prf = h (suc p')

-- Likewise for the following one. This general method will prove convergence for any kind of function. 
-- That having said, I am decently sure that the following should not diverge. Still, this is dubious as a proof method.
{-# TERMINATING #-}
sqrt-sqr~>sqr-sqrt : (t : ℚ) → (1≤t : 1ℚ ≤ t) → sqrt (t * t) (sqr-≤1 1≤t) ~> (λ p → sqrt t 1≤t p *ʳ sqrt t 1≤t p)
sqrt-sqr~>sqr-sqrt t 1≤t p = h 0 where
  h : ℕ → ∃[ p' ] (sqrt (t * t) (sqr-≤1 1≤t) p ⊇ (sqrt t 1≤t p' *ʳ sqrt t 1≤t p'))
  h p' with (sqrt (t * t) (sqr-≤1 1≤t) p) ⊇? (sqrt t 1≤t p' *ʳ sqrt t 1≤t p')
  h p' | .true because ofʸ prf = p' , prf
  h p' | .false because ofⁿ ¬prf = h (suc p')

Assuming sqrt(x * x) ~> sqrt(x) * sqrt(x) and sqrt(x) * sqrt(x) ~> sqrt(x * x) actually holds, the above will in fact construct proof of that, so it is a different from just postulating the proof. Still, I hope it would be possible to do better. If anybody has advice on how to prove the termination of these functions I'd appreciate it.

I'll close this for the time being as I've moved on to other things. I got an interesting reply to the ∃-halve SO question, but at the present moment the rational library is too sparse and I do not feel like allocating time to doing all the auxiliary lemmas on rationals that would be needed to implement PLL's suggestion. It might be an interesting project to contribute in order to extend the rationals part of the library at some point in the future, but I am not promising anything.

I've been hoping that the answer to ∃-halve question might give me some hints how to prove the termination of the more complex sqrt-sqr~>sqr-sqrt and sqr-sqrt~>sqrt-sqr, but unfortunately I am going to be satisfied with what I have here. I am decently satisfied in fact.

I think that just like for sqrt it should be possible to extend the general methodology for any transcendental function and use the transitive relation to do equational reasoning for the kinds of proofs what would have otherwise necessitated the use of reals and equalities on them. I feel that what I've done here is a decent proof of concept of what could be a much larger project.

Also, to me personally ∃-halve was an eye opener. Though PLL's reply was interesting as I had not even though of inverting the equation, and doing that thing with the naturals - mentally I just went to the solution directly. Despite being persistent about finding the way to make it terminating, to my mind, ∃-halve was already good as a proof even as it is. The same goes with sqrt-sqr~>sqr-sqrt and sqr-sqrt~>sqrt-sqr. The proofs made me realize for the first time that there is a very real difference between a proof and a proof that passes the termination checker.

For the first time, I am starting to think that programs are proofs, and the truth of that statement does not necessarily depend on termination checking or the language having dependent types. When I started going through SF in late April I had a pretty confused notion of what a proof even was and I wanted to learn theorem proving for the sake of math, but now I seem to be making a full circle back to programming. Looking into the future, the struggle between mathematics and programming, and the high (dependently typed) and low (non-DT) styles of programming will no doubt be a wellspring of insight.

agda / agda-stdlib

Idea of how to deal with transcendental functions over rationals #990