Open chrisyeh96 opened 8 months ago
This is a great question, and I apologize for the confusion. I agree with you that I should probably have expanded footnote 26 in chapter 2 to include an explicit definition of Hölder continuity rather than the hand-wavy description I gave. I think I was trying to avoid as much theory in chapter 2 as possible.
Regarding checking the Hölder continuity of a covariance function K(x, x'), the short answer is that you can simply check the Hölder continuity of the functions K(x, ·) (and x can be any arbitrary point in the domain). Note that:
However, this answer isn't terribly insightful.
A nicer way to connect the smoothness of the covariance function to the smoothness of sample paths (which is what we're after here after all) is to assume Hölder continuity of f and work backwards from there. I sketch this reasoning below without actually proving anything, but hopefully it's enough to build intuition.
Let f ~ GP(μ, K), and here let us assume that the GP is centered so that μ ≡ 0. (We can deal with nonzero μ later [*].) Let x and x' be points in the domain of the GP, and let ϕ = f(x) and ϕ' = f(x'). Hölder continuity of f at x (we can use your quoted definition here) is a condition on |ϕ - ϕ'|, but this is actually a random variable.
Instead, we can work with a suitable statistical replacement. It turns out the right notion here is the quantity
√E|ϕ - ϕ'|². (1)
(Note that this equals |ϕ - ϕ'| in the deterministic case, as expected.) We want to control the rate this term vanishes as |x - x'| → 0. If you can accept this definition, we can now define a probabilistic version of the Hölder condition:
√E|ϕ - ϕ'|² ≤ C|x - x'|ᵃ.
equivalent to (squaring both sides)
E|ϕ - ϕ'|² ≤ C|x - x'|²ᵃ.
Note that we can rewrite the lhs in terms of the covariance function:
E|ϕ - ϕ'|² = E[ϕ² - 2ϕϕ' + ϕ'²]
= K(x, x) + K(x', x') - 2K(x, x').
(Here we relied on the GP being centered.) So what we really want for f to be α-Hölder continuous is for K(x, ·) to be 2α-Hölder continuous. Note that this still comes down to checking that K(x, x) - K(x, x') is well behaved as |x - x'| → 0, like before.
Note also that everything above is very similar in spirit to the continuity in mean square discussion in the book, just with a bit more care put into the continuity check.
As an aside, the term (1) actually defines a pseudometric over the domain and appears often in the GP literature as the so-called canonical metric:
d(x, x') = √E|ϕ - ϕ'|².
Hope this helps!
[*] To deal with nonzero μ just assume it's Hölder continuous and add it back in.
Did this help @chrisyeh96?
The theorem on page 30 about sample continuity of a Gaussian process uses Holder continuity, but Holder continuity is never defined in the book. It would be very helpful to see a definition, especially since the book states in the subsequent paragraph that
without providing additional justification.
Furthermore, Hölder continuity is usually defined for functions that take a single input. According to Wikipedia, a function $f: \mathcal{X} \to \mathbb{R}$ is $\alpha$-Hölder continuous on the metric space $(\mathcal{X},d)$ if there exists a constant $0 < C < \infty$ such that
$$\forall x,y \in \mathcal{X}:\quad |f(x)-f(y)| \leq C \, d(x,y)^\alpha.$$
However, a covariance/kernel function $k(x,y)$ takes two inputs. How is Hölder continuity defined for covariance functions, especially non-stationary covariance functions?