---
# (APPENDIX) Appendix {-}
# Linear algebra and math
<!--
# TODO Finish introductory part on linear algebra in appendix
# I think we need an introduction on inner products, linearity, vector and matrix norms,
# trigonometry, identities, and so on..
# labels: todo, help wanted, good first issue
-->
<!-- ## Complex numbers -->
<!-- - $z\overline{z} = a^2 + b^2= |z|$ -->
<!-- - $\frac{1}{z} = \frac{\overline{z}}{\overline{z}z} $ -->
<!-- - Conjugation (reflection over "horizontal axis"): $z \mapsto \overline{z} = (a-ib) = e^{ix}$ -->
<!-- \begin{itemize} -->
<!-- \item Linarity: -->
<!-- $$f(a+b) = f(a) + f(b)$$ -->
<!-- $$f(\alpha a) = \alpha f(a) $$ -->
<!-- maybe add this: \url{http://www-math.mit.edu/~djk/calculus_beginners/chapter03/section03.html} -->
<!-- \item Cauchy-Schwarz -->
<!-- $$|(x,y)| \leq \|x\| \|y\| = (x,y)^2 \leq (x,x)(y,y) $$ -->
<!-- \begin{proof} -->
<!-- Substitute $(x,y)=\|x\| \|y\| cos(\theta)$: -->
<!-- $$| \|x\|\|y\| cos(\theta) | \leq \|x\|\|y\|$$ -->
<!-- $$|cos(\theta)| \leq 1$$ -->
<!-- \end{proof} -->
<!-- \end{itemize} -->
<!-- \paragraph{Inner products} -->
<!-- \begin{itemize} -->
<!-- \item $(a,b+c) = (a,b)+(a,c) $ -->
<!-- \item $(a,\alpha b) = \alpha (a,b) $ -->
<!-- \end{itemize} -->
<!-- \paragraph{Norms} -->
<!-- \begin{itemize} -->
<!-- \item $ \|x + y\| \leq \|x\| + \|y\|$ (triangle inequality) -->
<!-- \end{itemize} -->
<!-- \paragraph{Matrix norms} -->
<!-- \begin{itemize} -->
<!-- \item All properties of vector norms .. -->
<!-- \item Submultiplicativity -->
<!-- \item -->
<!-- \end{itemize} -->
```{remark}
Using Cauchy-Schwarz, it is simple to see that for a vector $x \in \mathbb{R}^n$ we have:
$$\|x\|_1 \leq \sqrt{n} \|x\|_2 $$
```
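
As a quick sanity check, the inequality can be verified numerically. Below is a minimal NumPy sketch; the random test vectors and dimensions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (2, 10, 100):
    x = rng.standard_normal(n)                # arbitrary test vector
    l1 = np.linalg.norm(x, 1)
    l2 = np.linalg.norm(x, 2)
    assert l1 <= np.sqrt(n) * l2 + 1e-12      # ||x||_1 <= sqrt(n) ||x||_2
```
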
## Distances

A function $d(\cdot,\cdot)$ is a distance (or metric) if, for all $x, y, z$:

- $d(x,y) \geq 0$
- $d(x,y) = 0$ if and only if $x=y$
- $d(x,y) = d(y,x)$
- $d(x,z) \leq d(x,y) + d(y,z)$ (triangle inequality)

## Properties of the trace operator

- $Tr[A+B] = Tr[A] + Tr[B]$
- $Tr[A\otimes B] = Tr[A]Tr[B]$
- $Tr_1[A\otimes B] = Tr[A]B$, where $Tr_1$ denotes the partial trace over the first subsystem
- $Tr[\ket{a}\bra{b}] = \braket{b|a}$
- $Tr[A^\dagger B] = \langle A, B \rangle$, where the inner product between matrices (the Hilbert-Schmidt, or Frobenius, inner product) is defined entrywise as $\sum_{ij}\overline{a_{ij}}b_{ij}$; for Hermitian $A$ this equals $Tr[AB]$.

Can you show that the last identity is true?
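
These identities are easy to check numerically. Below is a minimal NumPy sketch (the random matrices and the dimensions are arbitrary choices) that verifies the tensor-product and inner-product identities, including the one from the question above.

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2 = 3, 4                                  # arbitrary dimensions
A = rng.standard_normal((d1, d1)) + 1j * rng.standard_normal((d1, d1))
B = rng.standard_normal((d2, d2)) + 1j * rng.standard_normal((d2, d2))

# Tr[A (x) B] = Tr[A] Tr[B]
assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))

# Tr_1[A (x) B] = Tr[A] B   (partial trace over the first subsystem)
T = np.kron(A, B).reshape(d1, d2, d1, d2)
assert np.allclose(np.trace(T, axis1=0, axis2=2), np.trace(A) * B)

# Tr[A^dagger B] = <A, B> = sum_ij conj(a_ij) b_ij   (Hilbert-Schmidt inner product)
assert np.isclose(np.trace(A.conj().T @ B), np.sum(A.conj() * B))

# Tr[|u><v|] = <v|u>
u = rng.standard_normal(d1) + 1j * rng.standard_normal(d1)
v = rng.standard_normal(d1) + 1j * rng.standard_normal(d1)
assert np.isclose(np.trace(np.outer(u, v.conj())), np.vdot(v, u))
```
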
## Properties of tensor product

- $\alpha v \otimes w = v \otimes \alpha w = \alpha(v \otimes w)$
- $(v_1 + v_2) \otimes w = (v_1 \otimes w) + (v_2 \otimes w)$, and similarly in the second argument

## Trigonometry

Always have in mind that
$$e^{x} = \sum_{k=0}^\infty \frac{x^k}{k!}$$
$$e^{\pm ix} = \cos(x) \pm i \sin(x) $$
and Nunzio's version of Euler's identity
$$e^{i\tau} = 1 $$
where $\tau = 2\pi$.
Because of this, we can rewrite $\sin$ and $\cos$ as:
$$\cos(x) = \frac{e^{ix} + e^{-ix}}{2}$$
$$\sin(x) = \frac{e^{ix} - e^{-ix}}{2i}$$

Another useful identity is the product-to-sum formula:
$$\sin(a)\cos(b) = \frac{1}{2}\left(\sin(a+b)+\sin(a-b)\right) = \frac{1}{2}\left(\sin(a+b)-\sin(b-a)\right)$$

How can we express the variance of an observable $M$ on a state $\ket{\psi}$ in terms of expectation values? What quantum algorithm might we run to estimate it?
$$\braket{\psi|M^2|\psi} - (\braket{\psi|M|\psi})^2 $$
Discuss.
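
As a purely classical sanity check of the formula (not a quantum algorithm), the sketch below computes $\braket{\psi|M^2|\psi} - (\braket{\psi|M|\psi})^2$ with NumPy for a random Hermitian $M$ and state $\ket{\psi}$, and compares it with the variance of the measurement outcomes in the eigenbasis of $M$. The dimension and the random choices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8                                          # arbitrary dimension
H = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
M = (H + H.conj().T) / 2                       # random Hermitian observable
psi = rng.standard_normal(d) + 1j * rng.standard_normal(d)
psi /= np.linalg.norm(psi)                     # normalized state

# <psi|M^2|psi> - <psi|M|psi>^2
var_formula = (psi.conj() @ M @ M @ psi - (psi.conj() @ M @ psi) ** 2).real

# Same quantity as the variance of the measurement outcomes in M's eigenbasis
evals, evecs = np.linalg.eigh(M)
probs = np.abs(evecs.conj().T @ psi) ** 2      # Born-rule outcome probabilities
var_outcomes = probs @ evals**2 - (probs @ evals) ** 2
assert np.isclose(var_formula, var_outcomes)
```
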
## Bias-variance tradeoff

Here is a nice reference to understand the bias-variance tradeoff.

## Exponential family

A probability density function or probability mass function $p(v|\nu)$ for $v = (v_1, \cdots, v_m) \in \mathcal{V}^m$, where $\mathcal{V} \subseteq \mathbb{R}$, parameterized by $\nu \in \mathbb{R}^p$, is said to be in the exponential family if it can be written as:
$$p(v|\nu) := h(v)\exp \{ o(\nu)^TT(v) - A(\nu) \}$$
where:
- $\nu \in \mathbb{R}^p$ is called the \emph{canonical or natural} parameter of the family,
- $o(\nu)$ is a function of $\nu$ (which often is just the identity function),
- $T(v)$ is the vector of sufficient statistics: a function that holds all the information the data $v$ holds with respect to the unknown parameters,
- $A(\nu)$ is the cumulant generating function, or log-partition function, which acts as a normalization factor,
- $h(v) > 0$ is the \emph{base measure}, which is a non-informative prior and de facto a scaling constant.
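
For example (a standard worked case, stated here for concreteness), the Bernoulli distribution with mean $q \in (0,1)$ fits this template. For $v \in \{0,1\}$,
$$p(v|q) = q^{v}(1-q)^{1-v} = \exp\left\{ v \ln\frac{q}{1-q} + \ln(1-q) \right\},$$
so that $h(v) = 1$, $T(v) = v$, the natural parameter is $\nu = \ln\frac{q}{1-q}$ with $o(\nu) = \nu$, and $A(\nu) = -\ln(1-q) = \ln(1 + e^{\nu})$.
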
## Boosting probabilities with median lemma

## Distributions

## Concentration inequalities

### Markov inequality

The Markov inequality gives an upper bound on the probability that a non-negative random variable (or a non-negative function of a random variable) is greater than or equal to a positive constant. Especially in analysis, it is referred to as Chebyshev's inequality (sometimes called the first Chebyshev inequality, while the "usual" Chebyshev inequality is called the second Chebyshev inequality) or Bienaymé's inequality [@chebyshevwikipedia].
For any non-negative random variable $X$ and $a > 0$, we have that
$$Pr(X \geq a) \leq \frac{E[X]}{a} $$
$$Pr(X \geq aE[X]) \leq \frac{1}{a} $$
Observe that:
$$E[X] = P(X < a) \cdot E[X|X<a] + P(X \geq a) \cdot E[X|X \geq a]$$
As both of these conditional expectations are non-negative (using the non-negativity hypothesis), we have that
$$E[X] \geq P(X \geq a) \cdot E[X|X \geq a] $$
Now it is easy to observe that $E[X|X \geq a]$ is at least $a$, and by rearranging we obtain:
$$ \frac{E[X]}{a} \geq P(X \geq a) $$
The second statement of the theorem follows by substitution, i.e. setting $b=aE[X]$ and using the previous statement.

The inequality extends to monotone functions: let $f$ be a monotonically increasing, non-negative function on an interval $I$, and let $Y$ be a random variable taking values in $I$. Then, for any $b$ with $f(b) > 0$,
$$ P(Y \geq b) \leq \frac{E[f(Y)]}{f(b)} $$
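
A quick empirical illustration of the bound (a sketch; the exponential distribution and the thresholds are arbitrary choices) compares both sides of Markov's inequality:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(scale=2.0, size=100_000)   # non-negative samples, E[X] = 2
for a in (1.0, 4.0, 10.0):
    lhs = np.mean(X >= a)                      # empirical Pr(X >= a)
    rhs = X.mean() / a                         # Markov bound E[X]/a
    print(f"a={a:5.1f}  Pr(X>=a) ~ {lhs:.4f}  bound = {rhs:.4f}")
    assert lhs <= rhs + 0.01                   # bound holds up to sampling noise
```
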
### Chebyshev inequality

(Sometimes called Chebyshev's second inequality.) For a random variable $X$ with expected value $E[X]$ and standard deviation $\sigma$, there are two common formulations:

- $Pr[|X - E[X]| \geq k\sigma] \leq \frac{1}{k^2}$
- $Pr[|X - E[X]| \geq \epsilon] \leq \frac{\sigma^2}{\epsilon^2}$

The second follows from the first by replacing $k\sigma$ with $\epsilon$, i.e. setting $k = \epsilon/\sigma$.

### Hoeffding inequality

Let $X_1,\ldots,X_k$ be independent random variables bounded by the interval $[a, b]$. Define the empirical mean of these variables by $\overline{X} = \frac{1}{k} (X_1+\cdots+X_k)$. Then
$$
Pr(|\overline{X} - \mathbb{E}[\overline{X}]|\leq \epsilon) \geq 1-2
\exp\left(-\frac{2k\epsilon^2}{(b-a)^2} \right).
$$
Consequently, if $k\geq (b-a)^2\epsilon^{-2}\log(2/\eta)$, then
$\overline{X}$ provides an $\epsilon$-approximation of $\mathbb{E}[\overline{X}]$
with probability at least $1-\eta$.
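
The sample-complexity statement can be illustrated empirically (a sketch; the uniform distribution, $\epsilon$, and $\eta$ below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
a, b = 0.0, 1.0                                # bounded support [a, b]
eps, eta = 0.05, 0.05
k = int(np.ceil((b - a) ** 2 * eps**-2 * np.log(2 / eta)))  # k >= (b-a)^2 eps^-2 log(2/eta)

trials = 2000
means = rng.uniform(a, b, size=(trials, k)).mean(axis=1)    # empirical means, true mean 0.5
failures = np.mean(np.abs(means - 0.5) > eps)               # fraction of eps-failures
print(f"k = {k}, observed failure rate {failures:.4f} <= eta = {eta}")
```
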
## Polynomial approximations of useful functions

Let $\beta\in(0,1]$, $\eta\in(0,\frac{1}{2}]$ and $t\geq 1$. There exists a polynomial $\tilde{S}$ such that
$\forall x\in [\beta,1]$, $|\tilde{S}(x)-\frac{\ln(1/x)}{2\ln(2/\beta)}|\leq\eta$, and $\,\forall x\in[-1,1]\colon -1\leq\tilde{S}(x)=\tilde{S}(-x)\leq 1$. Moreover $\text{deg}(\tilde{S})=O({\frac{1}{\beta}\log (\frac{1}{\eta} )})$.
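
As a purely numerical illustration (this is not the construction behind the lemma, and it ignores the symmetry and boundedness constraints on $[-1,1]$), one can fit a Chebyshev polynomial to $\frac{\ln(1/x)}{2\ln(2/\beta)}$ on $[\beta,1]$ and check the achieved error; the value of $\beta$ and the degrees below are arbitrary choices.

```python
import numpy as np
from numpy.polynomial import Chebyshev

beta = 0.05

def f(x):
    return np.log(1 / x) / (2 * np.log(2 / beta))        # target: ln(1/x) / (2 ln(2/beta))

xs = np.linspace(beta, 1, 2001)
for deg in (5, 10, 20, 40):
    p = Chebyshev.fit(xs, f(xs), deg, domain=[beta, 1])   # least-squares Chebyshev fit
    err = np.max(np.abs(p(xs) - f(xs)))                   # sup error on the grid
    print(f"degree {deg:3d}: max error {err:.2e}")
```
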
## Error propagation and approximation

This part is based on many different sources, like [@hogan2006combine] and [@ku1966notes]. For an estimate $\overline{A}$ of a quantity $A$, the absolute and relative errors are defined as
$$|A - \overline{A} | = \epsilon_{Abs}$$
$$\frac{| A - \overline{A}| }{|A|} = \epsilon_R, \text{ or equivalently}$$
$$ A(1-\epsilon_{R}) \leq \overline{A} \leq A(1+\epsilon_{R})$$
Thus observe that:

- if (and only if) $|A| < 1$, then $\epsilon_{Abs} \leq \epsilon_{R}$;
- if (and only if) $|A| > 1$, then $\epsilon_{Abs} \geq \epsilon_{R}$.

We will study the relation between the two errors, often leveraging the idea of seeing what happens when we set $\epsilon_{Abs} = \epsilon_R |A|$.

Let's imagine an algorithm $\mathcal{A}$ that estimates a quantity $A$ in time $O(\text{poly}(\epsilon^{-1}))$. We would like to move from relative to absolute precision, or vice versa.
### From absolute to relative precision

Suppose that we have an algorithm that in time $O(f(\frac{1}{\epsilon_{Abs}}))$ gives us $|A-\overline{A} | \leq \epsilon_{Abs}$ for $\epsilon_{Abs} \in (0, 1]$, and we want a relative error $\epsilon_R > 0$:

- If $|A| < 1$, then we need to set $\epsilon_{Abs} = \epsilon_R |A|$. For this, we need a lower bound $\lambda^{-1} \leq |A|$. If we have it, we can just set $\epsilon_{Abs} = \epsilon_R \lambda^{-1}$ and run our algorithm in time $O(f(\frac{\lambda}{\epsilon_{R}}))$.
- If $|A| > 1$ and we want a relative error $\epsilon_R$, then by running the algorithm with $\epsilon_{Abs}=\epsilon_R$ we already have a relative error bound, as $\frac{|A- \overline{A}|}{|A|} \leq |A- \overline{A}| \leq \epsilon_{Abs}$.

Note that $\epsilon_{Abs}$ is meant to stay in $(0,1]$, as it wouldn't make sense to have a runtime of $O(\frac{1}{\epsilon_{Abs}})$ for $\epsilon_{Abs} > 1$.
### From relative to absolute precision

If we have an algorithm that in time $O(f(\frac{1}{\epsilon_{R}}))$ gives us $|A-\overline{A} | \leq |A|\epsilon_{R}$ and we want an absolute error $\epsilon_{Abs}$:

- If $|A| \leq 1$, it suffices to run the algorithm with $\epsilon_R = \epsilon_{Abs}$, since $|A-\overline{A}| \leq |A|\epsilon_R \leq \epsilon_{Abs}$.
- If $|A| > 1$, we need an upper bound $\lambda \geq |A|$. If we have it, we can set $\epsilon_R = \epsilon_{Abs}/\lambda$, so that $|A-\overline{A}| \leq |A|\epsilon_R \leq \epsilon_{Abs}$, and run the algorithm in time $O(f(\frac{\lambda}{\epsilon_{Abs}}))$.
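
A small sketch of the bookkeeping above (the function names and the toy absolute-error estimator are hypothetical, just to make the conversion concrete):

```python
import numpy as np

def estimate_abs(A, eps_abs, rng):
    """Toy estimator returning A up to absolute error eps_abs (stand-in for a real algorithm)."""
    return A + rng.uniform(-eps_abs, eps_abs)

def estimate_rel(A, eps_rel, lower_bound, rng):
    """Get relative error eps_rel from an absolute-error estimator, given 0 < lower_bound <= |A|."""
    eps_abs = eps_rel * lower_bound            # eps_Abs = eps_R * lambda^{-1}
    return estimate_abs(A, eps_abs, rng)

rng = np.random.default_rng(5)
A, eps_rel, lower_bound = 0.3, 0.01, 0.1       # |A| < 1, with a known lower bound
A_hat = estimate_rel(A, eps_rel, lower_bound, rng)
assert abs(A - A_hat) / abs(A) <= eps_rel
```
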
<!--
# TODO Add an explanation of how to do the polynomial approximation of 1/x
# It might be cool to have the polynomial approximation used for ridge regression. Can we do it?
# https://github.com/Scinawa/quantumalgorithms.org/blob/253173f8eaa1e47feee74d48f8adb358aca0f6ea/appendix.Rmd#L355
-->
### Propagation of error in functions of one variable

# License and citation {-}

The website quantumalgorithms.org by Alessandro Luongo is licensed under CC BY-NC-SA 4.0.

To cite this work:

# References {-}