ami-iit / element_human-action-intention-recognition


Add covariance of the prediction to RNN #11

Closed kouroshD closed 4 years ago

kouroshD commented 4 years ago

In this issue, I would like to explore the state of the art for predicting not only the future motion but also the covariance of the prediction, i.e., how much we can trust the predicted time series. The first idea comes from a talk given by Davide Scaramuzza in Erzelli some months back, in which they estimate the uncertainty of the prediction.

kouroshD commented 4 years ago

In this comment, I provide a brief description of the literature on predicting the uncertainty. The description is based on the following paper:

Consider pairs of input ($x$) and output (target $t$) vectors $(x, t)$; we have: $$ (1) \; t_i(x)= f_i(x) + e_i(x); \quad i = 1, ..., N $$ in which $t_i(x)$ is the measured target, $e_i(x)$ is the noise, and $f(x)$ is the true regression. $e_i(x)$ is independently and identically distributed. The true regression mean $\hat{y}_i$ is estimated by the NN as follows: $$ (2) \; \hat{y}_i = \phi (x_i; T) $$ where $T$ is the training set and $\phi$ is the nonlinear mapping. Using (1) and (2) we have: $$ (3) \; t_i(x)- \hat{y}_i= f_i(x) - \hat{y}_i + e_i(x) $$ The prediction interval (PI) quantifies the uncertainty associated with the difference between the measured and predicted values, i.e., it relates to the probability distribution $P(t_i \mid \hat{y}_i)$.

There are four different methods to approximate the PI:

Delta Method

Let's consider $ y_i =f(x_i, \omega^{\ast}) $ where $ \omega^{\ast} $ is the set of optimal weights.

In its neighborhood we will have

$$ (4) \hat{y}_i = f(x_i, \omega^{\ast})+ g_i^T (\hat{\omega} - \omega^{\ast}) $$

in which $g_i = \frac{d f(x_i, \omega^{\ast})}{d \omega^{\ast}}$. Using equations (3) and (4) we have: $$ t_i(x)- \hat{y}_i= (y_i + e_i(x)) -(f(x_i, \omega^{\ast})+ g_i^T (\hat{\omega} - \omega^{\ast})) = e_i(x) - g_i^T (\hat{\omega} - \omega^{\ast}) $$

There is an implicit assumption here, namely that $y_i=f(x_i, \omega^{\ast})$. To me this assumption is an approximation and may not be valid. So we will have:

$$ var(t_i(x)- \hat{y}_i)= var(e_i(x)) + var( g_i^T ( \hat{\omega} - \omega^{\ast} ) ) $$

Elaborating on this, we obtain the $100(1-\alpha)\%$ PI:

$$ \hat{y}_i { \pm } t^{n-p, 1 - \frac{\alpha}{2}} \sqrt{ 1 + g_i^T (F^T F)^{-1} g_i } $$

where $F$ is the Jacobian matrix of the NN model and $t^{n-p, 1 - \frac{\alpha}{2}}$ is the $1 - \frac{\alpha}{2}$ quantile of the cumulative t-distribution with $n-p$ degrees of freedom.
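As a minimal numpy/scipy sketch of this interval (assuming the per-sample gradient $g_i$ and the training Jacobian $F$ are already available); the `noise_std` factor is the usual noise-standard-deviation scaling of the delta method, and can be set to 1 to recover the formula exactly as written above:

```python
import numpy as np
from scipy import stats

def delta_method_pi(y_hat, g, F, n, p, alpha=0.05, noise_std=1.0):
    """Delta-method prediction interval for one sample (illustrative sketch).

    y_hat     : predicted mean for the sample
    g         : gradient of the network output w.r.t. the weights, shape (p,)
    F         : Jacobian of the outputs w.r.t. the weights over the training set, shape (n, p)
    noise_std : estimated noise standard deviation (1.0 reproduces the formula above)
    """
    t_quantile = stats.t.ppf(1.0 - alpha / 2.0, df=n - p)
    # half-width = t * s * sqrt(1 + g^T (F^T F)^{-1} g)
    half_width = t_quantile * noise_std * np.sqrt(1.0 + g @ np.linalg.solve(F.T @ F, g))
    return y_hat - half_width, y_hat + half_width
```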

Mean-Variance Estimation (MVE) Method

image

In this case, we assume a normally distributed error around $y_i$; we can identify the following cost function: $$ C_{MVE}= \frac{1}{2} \sum_{i=1}^{n} \left [ \ln( \hat{\sigma}_i^2 ) + \frac{ (t_i -\hat{y}_i)^2 }{ \hat{\sigma}_i^2 } \right ] $$

A three-phase training technique has been proposed. First, we identify the weights $\omega_y$ of the network that estimates the outputs, using an error-based cost function. Then we find $\omega_{\sigma}$ by minimizing the cost function introduced above. Finally, we resample the data and adjust both sets of network parameters simultaneously using the same cost function. The drawback is that it assumes the NN finds the true mean of the targets, i.e., $y_i$. In the end, it estimates the covariance of the noise in formula (1).
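A minimal sketch of $C_{MVE}$ as a Keras custom loss, assuming the network outputs the predicted means concatenated with the predicted log-variances (the log keeps the raw output unconstrained while $\hat{\sigma}^2$ stays positive):

```python
import tensorflow as tf

def mve_loss(y_true, y_pred):
    """C_MVE = 0.5 * [ ln(sigma^2) + (t - y_hat)^2 / sigma^2 ] (illustrative sketch).

    Assumes the network has 2*d outputs: the first d are the predicted means,
    the last d are the predicted log-variances.
    """
    d = y_pred.shape[-1] // 2
    mu = y_pred[..., :d]
    log_var = y_pred[..., d:]
    sq_err = tf.square(y_true - mu)
    return 0.5 * tf.reduce_mean(log_var + sq_err / tf.exp(log_var))
```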

Bootstrap Method

This is the most common method. Its schematic is as follows:

image

$B$ training datasets are resampled from the original dataset, and the mean and covariance of the outputs are computed, i.e., $\hat{y}_i$ and $\sigma^2_{\hat{y}_i}$ for the i-th sample.

To construct the PI, the variance of the errors is calculated using formula 1, i.e., $\sigma^2_{\hat{\epsilon}_i }$:

$$ \sigma^2_{\hat{\epsilon}} \simeq E\left[ (t-\hat{y})^2 \right] - \sigma^2_{\hat{y}} $$

Then, we define a cost function similar to that of MVE to estimate the values of $\sigma^2_{\hat{\epsilon}_i }$.
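A minimal numpy sketch of the ensemble part, assuming `models` is a list of $B$ networks trained on the resampled datasets, each exposing a `predict` method (names are illustrative):

```python
import numpy as np

def bootstrap_uncertainty(models, x, t):
    """Ensemble mean, model variance, and noise-variance target (illustrative sketch)."""
    preds = np.stack([m.predict(x) for m in models])  # shape (B, n_samples, n_outputs)
    y_hat = preds.mean(axis=0)                        # ensemble mean \hat{y}
    model_var = preds.var(axis=0, ddof=1)             # model uncertainty sigma^2_{\hat{y}}
    # residual-based target for the noise variance sigma^2_{epsilon}, clipped at zero
    noise_var_target = np.maximum((t - y_hat) ** 2 - model_var, 0.0)
    return y_hat, model_var, noise_var_target
```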

The last method to approximate is the Bayesian method, which is described in the paper.

kouroshD commented 4 years ago

Differently from the methods described in the previous comment, one point to consider is that our problem is a time-series problem. We define it as predicting the uncertainty of the predicted data with respect to their ground-truth values. The prediction is $$ \hat{y}^{t} = \phi (x^0 \mid TrainingSet); \quad 1<t<T $$ where $\hat{y}^{t}$ is the predicted target vector at time $t$.

We identify the uncertainty of the prediction at time $t$ using the MSE as follows: $$ s^t = (y^t- \hat{y}^{t})^T (y^t - \hat{y}^{t}) $$ And the objective will be: $$ \hat{s} = \arg\min \frac{1}{2} \sum_{t=1}^{T} \sum_{i=1}^{M} ( s^{t,i} - \hat{s}^{t,i} )^2 $$ where $i$ is the sample number and $t$ is the future time we want to predict. By summing over all the samples at time $t$, the output approximates the average of the MSE at each time $t$, i.e., an estimate of the covariance at each moment $t$.

However, we do not have $\hat{y}^{t}$ in advance. To resolve this problem, I was thinking of two different approaches:

The first approach, using an RNN to compute $\hat{s}^t$, is interesting, since the covariance at each moment $t$ will be a function of the hidden state at the previous time $a^{t-1}$, similar to the update of a Gauss-Markov stochastic process.

P.S. Similarly to the bootstrap method, we use an exponential activation in the output layer to enforce positive values for $\hat{s}^t$.
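A minimal Keras sketch of this idea; the GRU layer, layer sizes, and function names are illustrative assumptions, not the configuration used in the repository:

```python
import numpy as np
import tensorflow as tf

def uncertainty_targets(y_true, y_pred):
    """Per-time-step target s^t = (y^t - y_hat^t)^T (y^t - y_hat^t)."""
    return np.sum((y_true - y_pred) ** 2, axis=-1, keepdims=True)

def build_uncertainty_rnn(n_features, hidden_units=32):
    """RNN regressing \\hat{s}^t from the input sequence (illustrative sketch)."""
    model = tf.keras.Sequential([
        # the hidden state a^{t-1} carries the past, as in a Gauss-Markov update
        tf.keras.layers.GRU(hidden_units, return_sequences=True,
                            input_shape=(None, n_features)),
        # exponential activation enforces \hat{s}^t > 0, as noted in the P.S.
        tf.keras.layers.Dense(1, activation="exponential"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```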

kouroshD commented 4 years ago

@DanielePucci @raffaello-camoriano Let me know what you think about the proposed approach w.r.t. the short literature review provided.

kouroshD commented 4 years ago

The implementation of what I have stated in the previous comment can be found in this commit https://github.com/dic-iit/element_human-action-intention-recognition/commit/3070f161837635003df2a60438905b0678391bb0 .

kouroshD commented 4 years ago

Using the test done in the paper Nix94, in this experiment I tried to predict the covariance of a dataset. I generated the following dataset using an amplitude-modulation equation: $$ f(x)=m(x) \sin(\omega_{c} x); \quad m(x)= \sin(\omega_{m} x) $$ The output $y$ is: $$ y= f(x) + n(x) $$ where $n(x)$ is zero-mean Gaussian noise with variance $\sigma^2(x)$ given by: $$ \sigma^2(x) = 0.02 + 0.02 \times (1-m(x))^2 $$ In the experiments we consider $\omega_c =5$ and $\omega_m =4$.
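A minimal numpy sketch of this dataset generation; the number of samples, input range, and random seed are assumptions for illustration:

```python
import numpy as np

def generate_dataset(n_samples=10000, omega_c=5.0, omega_m=4.0, seed=0):
    """Amplitude-modulated signal with input-dependent noise (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)      # input range is an assumption
    m = np.sin(omega_m * x)                                  # m(x) = sin(omega_m x)
    f = m * np.sin(omega_c * x)                              # f(x) = m(x) sin(omega_c x)
    sigma2 = 0.02 + 0.02 * (1.0 - m) ** 2                    # sigma^2(x)
    y = f + rng.normal(0.0, np.sqrt(sigma2))                 # y = f(x) + n(x)
    return x, f, y, sigma2
```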

kouroshD commented 4 years ago

In case the time-series prediction were perfect(!), it would predict $f(x)$. So, in this experiment, I do not consider the prediction of $y$; I only use $f(x)$ and $y$ to compute the covariance and try to predict it. In these experiments, I have considered the derivative of $y$ as well, with a Gaussian noise distribution similar to that of $y$. Here is how the dataset looks:

Figure_6

Figure_7

And the following is the input to the network:

Figure_4

Figure_5

And here are the results of the network:

Learning curve:

Figure_1

The real outputs and the estimated ones:

Figure_2

Figure_3

kouroshD commented 4 years ago

I have done a new test, this time using the predicted output and the ground-truth values to compute the uncertainty of the prediction. I used the parameters mentioned in this comment: https://github.com/dic-iit/element_human-action-intention-recognition/issues/11#issuecomment-614447712 . In the following figures, output0 is $y$ and output1 is the derivative of $y$. This time the covariance of $\dot{y}$ is half of the covariance of $y$. The network for predicting the output:

the learning curve: pred-LearningCurve

The outputs of the test set:

pred-output0 pred-output1

The network for predicting the uncertainty of the prediction:

the learning curve: uncer-LearningCurve

The output of the network to predict the uncertainty:

uncer-testSet_pred-output0 uncer-testSet_pred-output1

Here are some additional figures:

Training Dataset (original data):

TrainingSet-outpu0

TrainingSet-outpu1

Prediction of the training DataSet:

TrainingSet-prediction-outpu0 TrainingSet-prediction-outpu1

training dataset for the uncertainty, computed from the previous ones:

uncer-trainingSet-output0

uncer-trainingSet-output1

kouroshD commented 4 years ago

@DanielePucci What do you think? I think we can close this issue.

claudia-lat commented 4 years ago

CC @DanielePucci

DanielePucci commented 4 years ago

Sorry for being late on this. Please @kouroshD call for a meeting (1.5h, next week); in the meanwhile you can close it.