Open earthcli opened 4 years ago
Hi earthcli, this is Michail. I saw the function in "zoo". The linear interpolation uses the "approx" function from the package "stats". But, I did not understand what it does. Do you know how the algortihm work? If you give me a hand then I can give you hand.
@statlink The linear interpolation for NA approx is very simple, see here http://www.cplusplus.com/forum/general/168623/ or here https://www.ajdesigner.com/phpinterpolation/linear_interpolation_equation.php. I have wrote a simple Rcpp function as below, however, it is better to work with both matrix and data.frame.
// [[Rcpp::export]]
double cpplinterp_( NumericVector x, NumericVector y, double xout, const int maxgap=99999, const bool locf=false){
// Linear approximation of a vector-function
// x, y vectors giving the coordinates of the points to be interpolated.
// x is assumed to be strictly monotonic.
// xout points at which to interpolate.
int n_x=x.size();
if( xout < x[0] ){
if(locf){
return y[0];
}else{
return NA_REAL;
}
}
if( xout >= x[n_x-1] ){
if(locf){
return y[n_x-1];
}else if(xout>x[n_x-1]){
return NA_REAL;
}else{
return y[n_x-1];
}
}
int idx = 0 ;// Find the location of xout in x
while ( x[idx] <= xout && idx < n_x ){
idx++ ;
}
// Find the nonNA location of y around xout within maxgap
int leftjust=1;
int rightjust=0;
while (NumericVector::is_na(y[idx-leftjust]) && idx-leftjust > 0 ){
leftjust++;
}
while (NumericVector::is_na(y[idx+rightjust]) && idx+rightjust < n_x ){
rightjust++ ;
}
if(x[idx+rightjust] - x[idx-leftjust] -1<=maxgap){
double theta = ( xout - x[idx-leftjust] ) / ( x[idx+rightjust] - x[idx-leftjust] ) ;
return theta * y[idx+rightjust] + ( 1 - theta ) * y[idx-leftjust] ;
}else{
return NA_REAL;
}
}
// [[Rcpp::export]]
NumericVector cpplinterp( NumericVector x, NumericVector y, NumericVector xout, const int maxgap=99999, const bool locf=false){
NumericVector yout=clone(xout);
for(int i=0;i<yout.size();i++){
yout[i]=cpplinterp_(x,y,xout[i],maxgap,false);
}
if(locf){
for(int i=0;i<yout.size();i++){
if(NumericVector::is_na(yout[i])){
int idx=1;
while (i+idx< yout.size()-1 && NumericVector::is_na(yout[i+idx]) && x[i+idx]-x[i] < maxgap){
idx++ ;
}
yout[i]=yout[i+idx];
}
}
for(int i=yout.size()-1;i>=0;i--){
if(NumericVector::is_na(yout[i])){
int idx=1;
while (i-idx>0 && NumericVector::is_na(yout[i-idx]) && x[i]-x[i-idx] < maxgap){
idx++ ;
}
yout[i]=yout[i-idx];
}
}
}
return yout;
}
// [[Rcpp::export]]
NumericVector cppmatrixapprox(NumericMatrix x, const int maxgap=99999, const bool locf=false, Rcpp::Nullable<Rcpp::NumericVector> idx=R_NilValue){
// x is generally the indice of columns, assumed to be strictly monotonic.
// idx points at which to interpolate.
NumericMatrix yout=clone(x);
NumericVector xout;
if(idx.isNull()){
xout=seq(1,x.ncol());
}else{
xout=idx;
}
for(int nr=0;nr<yout.nrow();nr++){
if(locf){
NumericVector rowx(x.ncol());
rowx=cpplinterp(xout,yout(nr,_),xout,maxgap,locf);
for(int nc=0;nc<yout.ncol();nc++){
if(NumericVector::is_na(yout(nr,nc))){
yout(nr,nc)=rowx[nc];
}
}
}else{
for(int nc=0;nc<yout.ncol();nc++){
if(NumericVector::is_na(yout(nr,nc))){
yout(nr,nc)=cpplinterp_(xout,yout(nr,_),xout[nc],maxgap,locf);
}
}
}
}
return yout;
}
Hi earthcli. You will have to give me some time. I will see what I can do in R, but bear in mind that I am very busy this month the maintainer is not currently available. In any case, we will try.
Hi earthcli, I cannot understand what is going on. Send me an e-mail at mtsagris@yahoo.gr to discuss this further there. Give me an example of your data, of a vector how they look like to make me understand your situation as right now I can only guess how your data look like and my guesses are obviously wrong.
Hi , Thanks for your consideration. My data is as below. Supposing the raw vector x is : 1.3 -0.3 NA 1.3 0.4 -1.5 NA -0.3 Generally, the first NA is approx by the left (-0.3) and right (1.3), the second NA is estimated approximately by -1.5 and -0.3, and I want to do this operation in a matrix or data.frame by row. for the first NA: Yout=Y1+(Y2-Y1)×(Xout-X1)/(X2-X1) 0.5=-0.3+(1.3--0.3)*(3-2)/(4-2)
The xout or idx parameter in the cpplinterp or cppmatrixapprox is the coordinate of x location, some times it may be not consecutive value, depending on the users. Some times there may be two consecutive NA, the algorithm has to find the non-NA value around the the NA that need to be estimated. For a vector, the x coordinate (xout) parameter generally is the index of x: seq_along (x) For the matrix, the x coordinate (xout/idx) parameter generally is the index of columns: seq_len(ncol(matrix)), but it may be defined by the user.
set.seed(0)
x=round(rnorm(8),1)
x[c(3,7)]=NA
zoo::na.approx(x)
cpplinterp(x=seq_along(x),y=x,xout=seq_along(x))
matrix=t(x)
matrix=matrix[rep(1,3),]
t(apply(matrix,1,zoo::na.approx))
cppmatrixapprox(matrix,idx=seq_len(ncol(matrix)))
set.seed(0) x=round(rnorm(8),1) x[c(3,7)]=NA zoo::na.approx(x) [1] 1.3 -0.3 0.5 1.3 0.4 -1.5 -0.9 -0.3 cpplinterp(x=seq_along(x),y=x,xout=seq_along(x)) [1] 1.3 -0.3 0.5 1.3 0.4 -1.5 -0.9 -0.3
matrix=t(x) matrix=matrix[rep(1,3),] matrix [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,] 1.3 -0.3 NA 1.3 0.4 -1.5 NA -0.3 [2,] 1.3 -0.3 NA 1.3 0.4 -1.5 NA -0.3 [3,] 1.3 -0.3 NA 1.3 0.4 -1.5 NA -0.3
t(apply(matrix,1,zoo::na.approx)) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,] 1.3 -0.3 0.5 1.3 0.4 -1.5 -0.9 -0.3 [2,] 1.3 -0.3 0.5 1.3 0.4 -1.5 -0.9 -0.3 [3,] 1.3 -0.3 0.5 1.3 0.4 -1.5 -0.9 -0.3 cppmatrixapprox(matrix,idx=seq_len(ncol(matrix))) [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [1,] 1.3 -0.3 0.5 1.3 0.4 -1.5 -0.9 -0.3 [2,] 1.3 -0.3 0.5 1.3 0.4 -1.5 -0.9 -0.3 [3,] 1.3 -0.3 0.5 1.3 0.4 -1.5 -0.9 -0.3
I am finding a fast enough function such as zoo::na.approx, see here I want to do a simple linear interpolation for NA values of each row of a big matrix (52017455 x 150), generally I can do this by using the na.approx function in zoo package with apply function, however, this is too slow. I wonder whether the Rfast package is helpful for me, or could you add a function such as rowapprox in the new version.