Triamus / play

play repo for experiments (mainly with git)

density estimation #9

Triamus opened 6 years ago

Triamus commented 6 years ago

The formula to compute the cumulative probability (i.e. the quantile level) of a given forecast $x_f$ under a kernel density estimate (KDE) of the historical data is

$$CDF_{KDE}\left(x_f\right) = 1 - \frac{1}{N} \sum_{i=1}^{N} \Phi\left(\frac{x_h^{(i)} - x_f}{\frac{1}{2}\sigma_h}\right)$$

where $x_h^ {\left( i \right)}$ are the N historical observations with standard deviation $\sigma_h$ and $\Phi$ is the cumulative distribution function of the standard normal distribution.
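Since $\Phi(-z) = 1 - \Phi(z)$, the formula can be rewritten as an average of normal CDFs centred on the observations. This makes explicit that it is the CDF of a Gaussian kernel density estimate with bandwidth $h = \tfrac{1}{2}\sigma_h$ (here $\varphi$ denotes the standard normal density):

$$
\begin{aligned}
CDF_{KDE}\left(x_f\right) &= 1 - \frac{1}{N} \sum_{i=1}^{N} \Phi\left(\frac{x_h^{(i)} - x_f}{h}\right)
= \frac{1}{N} \sum_{i=1}^{N} \left[ 1 - \Phi\left(\frac{x_h^{(i)} - x_f}{h}\right) \right] \\
&= \frac{1}{N} \sum_{i=1}^{N} \Phi\left(\frac{x_f - x_h^{(i)}}{h}\right),
\qquad
\frac{d}{dx_f} CDF_{KDE}\left(x_f\right) = \frac{1}{N h} \sum_{i=1}^{N} \varphi\left(\frac{x_f - x_h^{(i)}}{h}\right).
\end{aligned}
$$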

We can implement this approach for China GDP, evaluating the CDF at the forecast value $x_f = 6.1$, as follows:

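```{r}
# gdp: data frame of historical observations, one column per country (loaded earlier)
quantile_cn <- 6.1                  # forecast value x_f
sd_cn <- sd(gdp$cn)                 # sigma_h
s_norm_cn <-
  pnorm(q = (gdp$cn - quantile_cn), mean = 0, sd = sd_cn * 1/2)
# N is taken from the data instead of being hard-coded
sample_kde_cn <- 1 - (1 / length(gdp$cn)) * sum(s_norm_cn)
sample_kde_cn
```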
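Because `gdp` is not defined in this excerpt, the sketch below uses a small made-up vector `x_hist` (hypothetical values, not the China series) to check that the formula above and its complementary form give the same probability, with $N$ taken from the data.

```{r}
# Hypothetical historical observations (NOT gdp$cn) and forecast value
x_hist <- c(6.9, 6.7, 6.8, 6.6, 6.4, 6.0, 6.2, 6.5)
x_f <- 6.1

h <- sd(x_hist) / 2   # bandwidth: half the sample standard deviation
n <- length(x_hist)   # N from the data

# Form used above: 1 - (1/N) * sum(Phi((x_i - x_f) / h))
p1 <- 1 - sum(pnorm((x_hist - x_f) / h)) / n

# Complementary form: (1/N) * sum(Phi((x_f - x_i) / h))
p2 <- mean(pnorm((x_f - x_hist) / h))

all.equal(p1, p2)  # TRUE
```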

# Functional approach

We can wrap the computation in a function that takes the series and the forecast value as arguments:

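```{r}
# KDE-based CDF of series x evaluated at `quantile` (the forecast value x_f),
# using a Gaussian kernel with bandwidth sd(x) / 2
kde <- function(x, quantile) {
  1 - mean(pnorm(x - quantile, sd = sd(x) / 2))
}
```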
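As a sanity check on `kde()`, its value at a point should match the integral, up to that point, of the Gaussian kernel density with the same bandwidth `sd(x) / 2`. The sketch below repeats `kde()` so that it runs on its own, uses a made-up vector `x_hist`, and compares against `stats::integrate()`; the helper `kde_density()` is hypothetical and exists only for this check.

```{r}
# Same computation as kde() above: 1 - mean(Phi((x_i - q) / h)), h = sd(x) / 2
kde <- function(x, quantile) {
  1 - mean(pnorm(x - quantile, sd = sd(x) / 2))
}

# Hypothetical helper: Gaussian KDE density with the same bandwidth,
# vectorised over the evaluation points t (integrate() requires this)
kde_density <- function(t, x) {
  h <- sd(x) / 2
  vapply(t, function(ti) mean(dnorm(ti - x, sd = h)), numeric(1))
}

x_hist <- c(1.9, 0.5, 1.5, 2.2, 1.1, 0.3, 1.3, 2.0)  # hypothetical data
q <- 1.3

by_formula  <- kde(x_hist, q)
by_integral <- integrate(kde_density, lower = -Inf, upper = q,
                         x = x_hist, rel.tol = 1e-8)$value

all.equal(by_formula, by_integral, tolerance = 1e-6)
```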

Applying it to the German series with an example forecast value of 1.3:

```{r}
(sample_kde_de <- kde(x = gdp$de, quantile = 1.3))
```

# Visualization

We can visualize the KDE against the normal distribution for a range of sample quantiles.
```{r}
# Grid of evaluation points for the CDFs
sample_quantiles <-
  seq(from = -10, to = 20, by = 0.01)
```

We write a function that evaluates the KDE-based CDF at every point of the grid; with a fine enough grid (here a step of 0.01), the resulting curve looks smooth. For comparison, we also compute the CDF of a normal distribution with the same mean and standard deviation as each series.

```{r}
# Evaluate kde() at every grid point; returns a numeric vector
# of the same length as `quantiles`
kde_cdf <- function(x, quantiles) {
  vapply(quantiles, function(q) kde(x = x, quantile = q), numeric(1))
}
```
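Because `pnorm()` is vectorised, the per-point loop can also be replaced by a single matrix of differences built with `outer()`. A minimal sketch, with the hypothetical name `kde_cdf_vec()` and made-up data, checked against a per-point version:

```{r}
# Loop-free KDE CDF over a grid: entry [i, j] of outer() is x[i] - quantiles[j]
kde_cdf_vec <- function(x, quantiles) {
  diffs <- outer(x, quantiles, "-")
  probs <- matrix(pnorm(diffs, sd = sd(x) / 2), nrow = nrow(diffs))
  1 - colMeans(probs)
}

# Reference: the single-point formula evaluated at each grid value
kde_cdf_loop <- function(x, quantiles) {
  vapply(quantiles,
         function(q) 1 - mean(pnorm(x - q, sd = sd(x) / 2)),
         numeric(1))
}

x_hist <- c(1.9, 0.5, 1.5, 2.2, 1.1, 0.3, 1.3, 2.0)  # hypothetical data
grid <- seq(from = -2, to = 5, by = 0.5)

all.equal(kde_cdf_vec(x_hist, grid), kde_cdf_loop(x_hist, grid))  # TRUE
```

The matrix has `length(x) * length(quantiles)` entries, so for long series on a fine grid the per-point version uses less memory.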

```{r}
library(tibble)

# KDE-based CDF on the grid for each country
kde_de <- kde_cdf(x = gdp$de, quantiles = sample_quantiles)
kde_uk <- kde_cdf(x = gdp$uk, quantiles = sample_quantiles)
kde_us <- kde_cdf(x = gdp$us, quantiles = sample_quantiles)
kde_jp <- kde_cdf(x = gdp$jp, quantiles = sample_quantiles)
kde_cn <- kde_cdf(x = gdp$cn, quantiles = sample_quantiles)

# Normal CDF with each series' sample mean and standard deviation
sdnorm_de <- pnorm(q = sample_quantiles, mean = mean(gdp$de), sd = sd(gdp$de))
sdnorm_uk <- pnorm(q = sample_quantiles, mean = mean(gdp$uk), sd = sd(gdp$uk))
sdnorm_us <- pnorm(q = sample_quantiles, mean = mean(gdp$us), sd = sd(gdp$us))
sdnorm_jp <- pnorm(q = sample_quantiles, mean = mean(gdp$jp), sd = sd(gdp$jp))
sdnorm_cn <- pnorm(q = sample_quantiles, mean = mean(gdp$cn), sd = sd(gdp$cn))

# Collect everything, including the grid itself, in one tibble for plotting
kde_all <- tibble(sample_quantiles,
                  kde_de, kde_uk, kde_us, kde_jp, kde_cn,
                  sdnorm_de, sdnorm_uk, sdnorm_us, sdnorm_jp, sdnorm_cn)
```


We can plot the KDE-derived cumulative distribution function against the one derived from the normal distribution. A generic plotting function would save some typing when repeating this for each country.

```{r}
library(ggplot2)
library(magrittr)  # provides %>%

kde_all %>%
  ggplot(aes(x = sample_quantiles)) +
  geom_line(aes(y = kde_de, color = "kde_de")) +
  geom_line(aes(y = sdnorm_de, color = "sdnorm_de")) +
  labs(title = "KDE vs Normal for DE", x = "Quantile", y = "P(x) \n")
```
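One way to write such a generic plotting function is sketched below, assuming the wide layout of `kde_all` with `kde_<code>` and `sdnorm_<code>` columns. The function name `plot_kde_vs_normal()` is hypothetical, and the block builds a small made-up data frame so that it runs on its own.

```{r}
library(ggplot2)

# Hypothetical helper: KDE CDF vs normal CDF for one country code,
# reading the columns kde_<code> and sdnorm_<code> from `data`
plot_kde_vs_normal <- function(data, code) {
  kde_col  <- paste0("kde_", code)
  norm_col <- paste0("sdnorm_", code)
  ggplot(data, aes(x = sample_quantiles)) +
    geom_line(aes(y = .data[[kde_col]], color = kde_col)) +
    geom_line(aes(y = .data[[norm_col]], color = norm_col)) +
    labs(title = paste("KDE vs Normal for", toupper(code)),
         x = "Quantile", y = "P(x) \n")
}

# Made-up demo data in the same wide layout (NOT the gdp series)
x_hist <- c(1.9, 0.5, 1.5, 2.2, 1.1, 0.3, 1.3, 2.0)
grid <- seq(from = -2, to = 5, by = 0.01)
demo <- data.frame(
  sample_quantiles = grid,
  kde_xx = vapply(grid,
                  function(q) 1 - mean(pnorm(x_hist - q, sd = sd(x_hist) / 2)),
                  numeric(1)),
  sdnorm_xx = pnorm(grid, mean = mean(x_hist), sd = sd(x_hist))
)

p <- plot_kde_vs_normal(demo, "xx")
```

Printing `p` draws the chart; with the data above, a call such as `plot_kde_vs_normal(kde_all, "uk")` would produce the UK panel without repeating the `geom_line()` calls.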