GLEON / rLakeAnalyzer

An R version of Lake Analyzer
43 stars 26 forks source link

vectorizing schmidt.stability() #39

Open shatwell opened 9 years ago

shatwell commented 9 years ago

Hi,

I would like to suggest how to vectorize the code in schmidt.stability(). The function takes a temperature profile as a vector and loops through each temperature and depth. Using matrix multiplication (%*%) does this all in one step. So you could replace this chunk

Zv = layerD * layerA * dz
Zcv = sum(Zv)/sum(layerA)/dz
numInt = length(layerA)
st = layerA * NaN
for (i in 1:numInt) {
  z = layerD[i]
  A = layerA[i]
  st[i] = -(Zcv - z) * layerP[i] * A * dz
}
St = g/Ao * sum(st)

with this chunk:

Zcv <- layerD %*% layerA / sum(layerA)
St <- layerP %*% ((layerD - Zcv) * layerA) * dz * g / Ao

This makes the function much faster, especially for deep lakes. Also, if you wanted to use the vectorized version and modify the interpolation sections, you can optionally take a matrix or data.frame instead of a vector for the input temps wtr without changing the equations above. So in this case wtr could be a whole time series of profiles instead of a single profile (i.e. m different times in the rows, n different depths in the columns, so that one profile is in each row). In the above code, layerD and layerA would still be nx1 vectors, but layerP would become an mxn matrix. Here's some code I have been using as an alternative (with safety checks removed):

schmidt.stability <- function(wtr, depths, bthA, bthD, sal = wtr * 0) {
  vect <- is.vector(wtr)
  if(vect) wtr <- t(as.matrix(wtr))
  g   <- 9.81
  dz  <- 0.1
  Zo  <- min(depths)
  Ao  <- bthA[which.min(depths)]
  Z   <- seq(Zo, max(bthD), dz)
  A   <- approx(x=bthD, y=bthA, xout=Z, rule=2)$y
  T   <- matrix(NA, nrow=nrow(wtr), ncol=length(Z))
  for(ii in 1:nrow(T)) T[ii,] <- approx(x=depths, y=wtr[ii,], xout=Z, rule=2)$y
  D   <- water.density(T, sal)
  Zcv <- Z %*% A / sum(A)
  St  <- D %*% ((Z - Zcv) * A) * dz * g / Ao
  if(vect) St <- as.numeric(St)
  return(St)
}

For a time series with 1100 profiles, the vectorized function above takes 0.7 seconds on my computer, the original function with loops takes 47 seconds.

Tom

lawinslow commented 9 years ago

Thanks Tom.

Quick question. Would the fully vectorized version be able to handle NA values? Currently, we git the user the option to filter for NA values on a per-timestep basis here. That is fairly important for potentially sparse data, and I kinda prefer doing that than interpolating the missing values.

-Luke

shatwell commented 9 years ago

Hi Luke,

yes as such the function at the bottom of the post works with NAs as long as there are at least 2 non NA values in each profile. The reason this works is because of the profile interpolation in approx() with rule=2. If you were to perform matrix multiplication on a matrix of profiles containing NAs (ie without interpolating), this would also still work, but it would return an NA result for each profile in the time series containing an NA. But I think this probably won't occur because the interpolation is built into schmidt.stability() (dz =0.1) and I think it is important to do it like this too.

I just saw that ts.schmidt.stability() does the whole time series too! Note that you could use the same function for both single profiles and whole time series with the matrix multiplication version. It just treats a single profile as a 1-row matrix.

Tom