---
# (APPENDIX) Appendix {-}
# Linear algebra and math
<!--
# TODO Finish introductory part on linear algebra in appendix
# I think we need an introduction on inner products, linearity, vector and matrix norms,
# trigonometry, identities, and so on..
# labels: todo, help wanted, good first issue
-->
<!-- ## Complex numbers -->
<!-- - $z\overline{z} = a^2 + b^2= |z|$ -->
<!-- - $\frac{1}{z} = \frac{\overline{z}}{\overline{z}z} $ -->
<!-- - Conjugation (reflection over "horizontal axis"): $z \mapsto \overline{z} = (a-ib) = e^{ix}$ -->
<!-- \begin{itemize} -->
<!-- \item Linarity: -->
<!-- $$f(a+b) = f(a) + f(b)$$ -->
<!-- $$f(\alpha a) = \alpha f(a) $$ -->
<!-- maybe add this: \url{http://www-math.mit.edu/~djk/calculus_beginners/chapter03/section03.html} -->
<!-- \item Cauchy-Schwarz -->
<!-- $$|(x,y)| \leq \|x\| \|y\| = (x,y)^2 \leq (x,x)(y,y) $$ -->
<!-- \begin{proof} -->
<!-- Substitute $(x,y)=\|x\| \|y\| cos(\theta)$: -->
<!-- $$| \|x\|\|y\| cos(\theta) | \leq \|x\|\|y\|$$ -->
<!-- $$|cos(\theta)| \leq 1$$ -->
<!-- \end{proof} -->
<!-- \end{itemize} -->
<!-- \paragraph{Inner products} -->
<!-- \begin{itemize} -->
<!-- \item $(a,b+c) = (a,b)+(a,c) $ -->
<!-- \item $(a,\alpha b) = \alpha (a,b) $ -->
<!-- \end{itemize} -->
<!-- \paragraph{Norms} -->
<!-- \begin{itemize} -->
<!-- \item $ \|x + y\| \leq \|x\| + \|y\|$ (triangle inequality) -->
<!-- \end{itemize} -->
<!-- \paragraph{Matrix norms} -->
<!-- \begin{itemize} -->
<!-- \item All properties of vector norms .. -->
<!-- \item Submultiplicativity -->
<!-- \item -->
<!-- \end{itemize} -->
```{remark}
Using Cauchy-Schwarz, it is simple to see that for a vector $x \in \mathbb{R}^n$ we have:
$$\|x\|_1 \leq \sqrt{n} \|x\|_2 $$
```
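
As a quick sanity check, the inequality can be verified numerically. Below is a minimal NumPy sketch; the random test vectors and dimensions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (2, 10, 100):
    x = rng.standard_normal(n)                # arbitrary test vector
    l1 = np.linalg.norm(x, 1)
    l2 = np.linalg.norm(x, 2)
    assert l1 <= np.sqrt(n) * l2 + 1e-12      # ||x||_1 <= sqrt(n) ||x||_2
```
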
## Distances

A function $d(\cdot,\cdot)$ is a distance (or metric) if, for all $x, y, z$:

- $d(x,y) \geq 0$
- $d(x,y) = 0$ if and only if $x=y$
- $d(x,y) = d(y,x)$
- $d(x,z) \leq d(x,y) + d(y,z)$ (triangle inequality)

## Properties of the trace operator

- $Tr[A+B] = Tr[A] + Tr[B]$
- $Tr[A\otimes B] = Tr[A]Tr[B]$
- $Tr_1[A\otimes B] = Tr[A]B$, where $Tr_1$ denotes the partial trace over the first subsystem
- $Tr[\ket{a}\bra{b}] = \braket{b|a}$
- $Tr[A^\dagger B] = \langle A, B \rangle$, where the inner product between matrices (the Hilbert-Schmidt, or Frobenius, inner product) is defined entrywise as $\sum_{ij}\overline{a_{ij}}b_{ij}$; for Hermitian $A$ this equals $Tr[AB]$.

Can you show that the last identity is true?
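
These identities are easy to check numerically. Below is a minimal NumPy sketch (the random matrices and the dimensions are arbitrary choices) that verifies the tensor-product and inner-product identities, including the one from the question above.

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2 = 3, 4                                  # arbitrary dimensions
A = rng.standard_normal((d1, d1)) + 1j * rng.standard_normal((d1, d1))
B = rng.standard_normal((d2, d2)) + 1j * rng.standard_normal((d2, d2))

# Tr[A (x) B] = Tr[A] Tr[B]
assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))

# Tr_1[A (x) B] = Tr[A] B   (partial trace over the first subsystem)
T = np.kron(A, B).reshape(d1, d2, d1, d2)
assert np.allclose(np.trace(T, axis1=0, axis2=2), np.trace(A) * B)

# Tr[A^dagger B] = <A, B> = sum_ij conj(a_ij) b_ij   (Hilbert-Schmidt inner product)
assert np.isclose(np.trace(A.conj().T @ B), np.sum(A.conj() * B))

# Tr[|u><v|] = <v|u>
u = rng.standard_normal(d1) + 1j * rng.standard_normal(d1)
v = rng.standard_normal(d1) + 1j * rng.standard_normal(d1)
assert np.isclose(np.trace(np.outer(u, v.conj())), np.vdot(v, u))
```
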
## Properties of tensor product

- $\alpha v \otimes w = v \otimes \alpha w = \alpha(v \otimes w)$
- $(v_1 + v_2) \otimes w = (v_1 \otimes w) + (v_2 \otimes w)$, and similarly in the second argument

## Trigonometry

Always have in mind that
$$e^{x} = \sum_{k=0}^\infty \frac{x^k}{k!}$$
$$e^{\pm ix} = \cos(x) \pm i \sin(x) $$
and Nunzio's version of Euler's identity
$$e^{i\tau} = 1 $$
where $\tau = 2\pi$.
Because of this, we can rewrite $\sin$ and $\cos$ as:
$$\cos(x) = \frac{e^{ix} + e^{-ix}}{2}$$
$$\sin(x) = \frac{e^{ix} - e^{-ix}}{2i}$$

Another useful identity is the product-to-sum formula:
$$\sin(a)\cos(b) = \frac{1}{2}\left(\sin(a+b)+\sin(a-b)\right) = \frac{1}{2}\left(\sin(a+b)-\sin(b-a)\right)$$

How can we express the variance of an observable $M$ on a state $\ket{\psi}$ in terms of expectation values? What quantum algorithm might we run to estimate it?
$$\braket{\psi|M^2|\psi} - (\braket{\psi|M|\psi})^2 $$
Discuss.
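
As a purely classical sanity check of the formula (not a quantum algorithm), the sketch below computes $\braket{\psi|M^2|\psi} - (\braket{\psi|M|\psi})^2$ with NumPy for a random Hermitian $M$ and state $\ket{\psi}$, and compares it with the variance of the measurement outcomes in the eigenbasis of $M$. The dimension and the random choices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8                                          # arbitrary dimension
H = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
M = (H + H.conj().T) / 2                       # random Hermitian observable
psi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
psi /= np.linalg.norm(psi)                     # normalized state

# <psi|M^2|psi> - <psi|M|psi>^2
var_formula = (psi.conj() @ M @ M @ psi - (psi.conj() @ M @ psi) ** 2).real

# Same quantity as the variance of the measurement outcomes in M's eigenbasis
evals, evecs = np.linalg.eigh(M)
probs = np.abs(evecs.conj().T @ psi) ** 2      # Born-rule outcome probabilities
var_outcomes = probs @ evals**2 - (probs @ evals) ** 2
assert np.isclose(var_formula, var_outcomes)
```
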
## Bias-variance tradeoff

Here is a nice reference to understand the bias-variance tradeoff.

## Exponential family

A probability density function or probability mass function $p(v|\nu)$ for $v = (v_1, \cdots, v_m) \in \mathcal{V}^m$, where $\mathcal{V} \subseteq \mathbb{R}$, parameterized by $\nu \in \mathbb{R}^p$, is said to be in the exponential family if it can be written as:
$$p(v|\nu) := h(v)\exp \{ o(\nu)^TT(v) - A(\nu) \}$$
where:
- $\nu \in \mathbb{R}^p$ is called the \emph{canonical or natural} parameter of the family,
- $o(\nu)$ is a function of $\nu$ (which often is just the identity function),
- $T(v)$ is the vector of sufficient statistics: a function that holds all the information the data $v$ holds with respect to the unknown parameters,
- $A(\nu)$ is the cumulant generating function, or log-partition function, which acts as a normalization factor,
- $h(v) > 0$ is the \emph{base measure}, which is a non-informative prior and de facto a scaling constant.
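
For example (a standard worked case, stated here for concreteness), the Bernoulli distribution with mean $q \in (0,1)$ fits this template. For $v \in \{0,1\}$,
$$p(v|q) = q^{v}(1-q)^{1-v} = \exp\left\{ v \ln\frac{q}{1-q} + \ln(1-q) \right\},$$
so that $h(v) = 1$, $T(v) = v$, the natural parameter is $\nu = \ln\frac{q}{1-q}$ with $o(\nu) = \nu$, and $A(\nu) = -\ln(1-q) = \ln(1 + e^{\nu})$.
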
## Boosting probabilities with median lemma

## Distributions

## Concentration inequalities

### Markov inequality

The Markov inequality gives an upper bound on the probability that a non-negative random variable (or a non-negative function of a random variable) is greater than or equal to a positive constant. Especially in analysis, it is referred to as Chebyshev's inequality (sometimes called the first Chebyshev inequality, while the "usual" Chebyshev inequality is called the second Chebyshev inequality) or Bienaymé's inequality [@chebyshevwikipedia].
For any non-negative random variable $X$ and $a > 0$, we have that
$$Pr(X \geq a) \leq \frac{E[X]}{a} $$
$$Pr(X \geq aE[X]) \leq \frac{1}{a} $$
Observe that:
$$E[X] = P(X < a) \cdot E[X|X<a] + P(X \geq a) \cdot E[X|X \geq a]$$
As both of these conditional expectations are non-negative (using the non-negativity hypothesis), we have that
$$E[X] \geq P(X \geq a) \cdot E[X|X \geq a] $$
Now it is easy to observe that $E[X|X \geq a]$ is at least $a$, and by rearranging we obtain:
$$ \frac{E[X]}{a} \geq P(X \geq a) $$
The second statement of the theorem follows by substitution, i.e. setting $b=aE[X]$ and using the previous statement.

The inequality extends to monotone functions: let $f$ be a monotonically increasing, non-negative function on an interval $I$, and let $Y$ be a random variable taking values in $I$. Then, for any $b$ with $f(b) > 0$,
$$ P(Y \geq b) \leq \frac{E[f(Y)]}{f(b)} $$
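
A quick empirical illustration of the bound (a sketch; the exponential distribution and the thresholds are arbitrary choices) compares both sides of Markov's inequality:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(scale=2.0, size=100_000)   # non-negative samples, E[X] = 2
for a in (1.0, 4.0, 10.0):
    lhs = np.mean(X >= a)                      # empirical Pr(X >= a)
    rhs = X.mean() / a                         # Markov bound E[X]/a
    print(f"a={a:5.1f}  Pr(X>=a) ~ {lhs:.4f}  bound = {rhs:.4f}")
    assert lhs <= rhs + 0.01                   # bound holds up to sampling noise
```
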
### Chebyshev inequality

(Sometimes called Chebyshev's second inequality.) For a random variable $X$ with expected value $E[X]$ and standard deviation $\sigma$, there are two common formulations:

- $Pr[|X - E[X]| \geq k\sigma] \leq \frac{1}{k^2}$
- $Pr[|X - E[X]| \geq \epsilon] \leq \frac{\sigma^2}{\epsilon^2}$

The second follows from the first by replacing $k\sigma$ with $\epsilon$, i.e. setting $k = \epsilon/\sigma$.

### Hoeffding inequality

Let $X_1,\ldots,X_k$ be independent random variables bounded by the interval $[a, b]$. Define the empirical mean of these variables by $\overline{X} = \frac{1}{k} (X_1+\cdots+X_k)$. Then
$$
Pr(|\overline{X} - \mathbb{E}[\overline{X}]|\leq \epsilon) \geq 1-2
\exp\left(-\frac{2k\epsilon^2}{(b-a)^2} \right).
$$
Consequently, if $k\geq (b-a)^2\epsilon^{-2}\log(2/\eta)$, then
$\overline{X}$ provides an $\epsilon$-approximation of $\mathbb{E}[\overline{X}]$
with probability at least $1-\eta$.
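
The sample-complexity statement can be illustrated empirically (a sketch; the uniform distribution, $\epsilon$, and $\eta$ below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
a, b = 0.0, 1.0                                # bounded support [a, b]
eps, eta = 0.05, 0.05
k = int(np.ceil((b - a) ** 2 * eps**-2 * np.log(2 / eta)))  # k >= (b-a)^2 eps^-2 log(2/eta)

trials = 2000
means = rng.uniform(a, b, size=(trials, k)).mean(axis=1)    # empirical means, true mean 0.5
failures = np.mean(np.abs(means - 0.5) > eps)               # fraction of eps-failures
print(f"k = {k}, observed failure rate {failures:.4f} <= eta = {eta}")
```
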
## Polynomial approximations of useful functions

Let $\beta\in(0,1]$, $\eta\in(0,\frac{1}{2}]$ and $t\geq 1$. There exists a polynomial $\tilde{S}$ such that
$\forall x\in [\beta,1]$, $|\tilde{S}(x)-\frac{\ln(1/x)}{2\ln(2/\beta)}|\leq\eta$, and $\,\forall x\in[-1,1]\colon -1\leq\tilde{S}(x)=\tilde{S}(-x)\leq 1$. Moreover $\text{deg}(\tilde{S})=O({\frac{1}{\beta}\log (\frac{1}{\eta} )})$.
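
As a purely numerical illustration (this is not the construction behind the lemma, and it ignores the symmetry and boundedness constraints on $[-1,1]$), one can fit a Chebyshev polynomial to $\frac{\ln(1/x)}{2\ln(2/\beta)}$ on $[\beta,1]$ and check the achieved error; the value of $\beta$ and the degrees below are arbitrary choices.

```python
import numpy as np
from numpy.polynomial import Chebyshev

beta = 0.05

def f(x):
    return np.log(1 / x) / (2 * np.log(2 / beta))        # target: ln(1/x) / (2 ln(2/beta))

xs = np.linspace(beta, 1, 2001)
for deg in (5, 10, 20, 40):
    p = Chebyshev.fit(xs, f(xs), deg, domain=[beta, 1])   # least-squares Chebyshev fit
    err = np.max(np.abs(p(xs) - f(xs)))                   # sup error on the grid
    print(f"degree {deg:3d}: max error {err:.2e}")
```
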
## Error propagation and approximation

This part is based on many different sources, like [@hogan2006combine] and [@ku1966notes]. For an estimate $\overline{A}$ of a quantity $A$, the absolute and relative errors are defined as
$$|A - \overline{A} | = \epsilon_{Abs}$$
$$\frac{| A - \overline{A}| }{|A|} = \epsilon_R, \text{ or equivalently}$$
$$ A(1-\epsilon_{R}) \leq \overline{A} \leq A(1+\epsilon_{R})$$
Thus observe that:

- if (and only if) $|A| < 1$, then $\epsilon_{Abs} \leq \epsilon_{R}$;
- if (and only if) $|A| > 1$, then $\epsilon_{Abs} \geq \epsilon_{R}$.

We will study the relation between the two errors, often leveraging the idea of seeing what happens when we set $\epsilon_{Abs} = \epsilon_R |A|$.

Let's imagine an algorithm $\mathcal{A}$ that estimates a quantity $A$ in time $O(\text{poly}(\epsilon^{-1}))$. We would like to move from relative to absolute precision, or vice versa.
### From absolute to relative precision

Suppose that we have an algorithm that in time $O(f(\frac{1}{\epsilon_{Abs}}))$ gives us $|A-\overline{A} | \leq \epsilon_{Abs}$ for $\epsilon_{Abs} \in (0, 1]$, and we want a relative error $\epsilon_R > 0$:

- If $|A| < 1$, then we need to set $\epsilon_{Abs} = \epsilon_R |A|$. For this, we need a lower bound $\lambda^{-1} \leq |A|$. If we have it, we can just set $\epsilon_{Abs} = \epsilon_R \lambda^{-1}$ and run our algorithm in time $O(f(\frac{\lambda}{\epsilon_{R}}))$.
- If $|A| > 1$ and we want a relative error $\epsilon_R$, then by running the algorithm with $\epsilon_{Abs}=\epsilon_R$ we already have a relative error bound, as $\frac{|A- \overline{A}|}{|A|} \leq |A- \overline{A}| \leq \epsilon_{Abs}$.

Note that $\epsilon_{Abs}$ is meant to stay in $(0,1]$, as it wouldn't make sense to have a runtime of $O(\frac{1}{\epsilon_{Abs}})$ for $\epsilon_{Abs} > 1$.
### From relative to absolute precision

If we have an algorithm that in time $O(f(\frac{1}{\epsilon_{R}}))$ gives us $|A-\overline{A} | \leq |A|\epsilon_{R}$ and we want an absolute error $\epsilon_{Abs}$:

- If $|A| \leq 1$, it suffices to run the algorithm with $\epsilon_R = \epsilon_{Abs}$, since $|A-\overline{A}| \leq |A|\epsilon_R \leq \epsilon_{Abs}$.
- If $|A| > 1$, we need an upper bound $\lambda \geq |A|$. If we have it, we can set $\epsilon_R = \epsilon_{Abs}/\lambda$, so that $|A-\overline{A}| \leq |A|\epsilon_R \leq \epsilon_{Abs}$, and run the algorithm in time $O(f(\frac{\lambda}{\epsilon_{Abs}}))$.
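
A small sketch of the bookkeeping above (the function names and the toy absolute-error estimator are hypothetical, just to make the conversion concrete):

```python
import numpy as np

def estimate_abs(A, eps_abs, rng):
    """Toy estimator returning A up to absolute error eps_abs (stand-in for a real algorithm)."""
    return A + rng.uniform(-eps_abs, eps_abs)

def estimate_rel(A, eps_rel, lower_bound, rng):
    """Get relative error eps_rel from an absolute-error estimator, given 0 < lower_bound <= |A|."""
    eps_abs = eps_rel * lower_bound            # eps_Abs = eps_R * lambda^{-1}
    return estimate_abs(A, eps_abs, rng)

rng = np.random.default_rng(5)
A, eps_rel, lower_bound = 0.3, 0.01, 0.1       # |A| < 1, with a known lower bound
A_hat = estimate_rel(A, eps_rel, lower_bound, rng)
assert abs(A - A_hat) / abs(A) <= eps_rel
```
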
<!--
# TODO Add an explanation of how to do the polynomial approximation of 1/x
# It might be cool to have the polynomial approximation used for ridge regression. Can we do it?
# https://github.com/Scinawa/quantumalgorithms.org/blob/253173f8eaa1e47feee74d48f8adb358aca0f6ea/appendix.Rmd#L355
-->
### Propagation of error in functions of one variable

# License and citation {-}

The website quantumalgorithms.org by Alessandro Luongo is licensed under CC BY-NC-SA 4.0.

To cite this work:

# References {-}