ArgoCanada / argoFloats

Tools for analyzing collections of oceanographic Argo floats
https://argocanada.github.io/argoFloats/index.html

Accelerate subset process #578

Closed: CongGao-CG closed this issue 2 years ago

CongGao-CG commented 2 years ago

I need to do a lot of subsetting by position and time. I do it in two steps, calling subset() twice. Because I need to make many selections, this is rather slow. I tried two approaches, a for loop and apply(); surprisingly, apply() does not speed up the selection. Could you give me some advice on making the selection faster?

My code for the selection is shown below:

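  library(argoFloats)
  library(lubridate)  # provides ddays()
  index  <- subset(indexAll,  # keep profiles within 30 km of (xv[6], xv[7])
                   circle=list(longitude=xv[6],latitude=xv[7],radius=30))
  # then keep profiles from 2 days before t to 15 days after t
  index  <- subset(index, time=list(from=t-ddays(2), to=t+ddays(15)))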
dankelley commented 2 years ago

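As a general statement about R, apply() is only faster than a for loop when the loop overhead is a significant fraction of the total run time. apply() still calls the function once per element, so when each call does substantial work, as each subset() call does here, the two take about the same time.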

The circle subset requires computing a geodetic distance for each profile, which is computationally expensive (see help(geodDist,package="oce") for references to the methodology). You may find that the rectangle method is faster, because it involves only simple numerical comparisons.
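To illustrate the difference in cost, here is a minimal base-R sketch that uses no argoFloats functions. The positions are made up, and the spherical haversine formula is a simpler stand-in (an assumption for illustration) for the ellipsoidal calculation behind oce's geodDist(). The circle test needs several trigonometric evaluations per profile, while the rectangle test is just four comparisons.

```r
# Hypothetical profile positions (degrees); in practice they would come from the index.
lon <- c(-63.0, -62.5, -60.0, -70.0)
lat <- c(44.0, 44.2, 45.0, 40.0)

# Great-circle distance in km on a sphere (haversine formula). This is a
# simplified stand-in, not the ellipsoidal method used by oce::geodDist().
haversineKm <- function(lon1, lat1, lon2, lat2, R = 6371) {
    rad <- pi / 180
    dlat <- (lat2 - lat1) * rad
    dlon <- (lon2 - lon1) * rad
    a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
    2 * R * asin(pmin(1, sqrt(a)))
}

# Circle: trigonometry for every profile (30 km around -63E, 44N).
inCircle <- haversineKm(lon, lat, -63, 44) <= 30

# Rectangle: only four numeric comparisons per profile.
inRect <- lon >= -63.5 & lon <= -62.5 & lat >= 43.7 & lat <= 44.3

print(which(inCircle))  # [1] 1
print(which(inRect))    # [1] 1 2
```

Note that the rectangle keeps profile 2, which lies about 46 km from the centre, so a rectangle selects a different region than a circle; it is a faster test, not a drop-in replacement.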

You might want to consider the order in which you are subsetting. Depending on the relative costs of the individual operations, a different order can speed things up: a cheap filter applied first leaves fewer items for an expensive one. Again, that's a general statement about computation, not a statement about the argoFloats package or about R.
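A minimal base-R sketch of why the order can matter, using synthetic positions and times (not real Argo data) and a spherical haversine distance as the expensive step (an assumption standing in for the geodetic calculation). Filtering by time first, which is a cheap comparison, means the distance is evaluated only for the profiles that survive. Both orders select exactly the same profiles; only the number of distance evaluations differs.

```r
set.seed(578)  # synthetic data, for illustration only
n <- 100000
lon <- runif(n, -70, -50)
lat <- runif(n, 35, 50)
time <- as.POSIXct("2020-01-01", tz = "UTC") + runif(n, 0, 365 * 86400)

# Spherical haversine distance (km): the expensive step in this sketch.
haversineKm <- function(lon1, lat1, lon2, lat2, R = 6371) {
    rad <- pi / 180
    a <- sin((lat2 - lat1) * rad / 2)^2 +
        cos(lat1 * rad) * cos(lat2 * rad) * sin((lon2 - lon1) * rad / 2)^2
    2 * R * asin(pmin(1, sqrt(a)))
}

# Hypothetical target, playing the role of xv[6], xv[7] and t in the question.
lon0 <- -60
lat0 <- 42
t0 <- as.POSIXct("2020-06-01", tz = "UTC")
from <- t0 - 2 * 86400   # t - 2 days
to <- t0 + 15 * 86400    # t + 15 days

# Order A (as in the question): circle first, so n distance evaluations.
iA <- which(haversineKm(lon, lat, lon0, lat0) <= 30)
iA <- iA[time[iA] >= from & time[iA] <= to]

# Order B: cheap time test first, then distances for the survivors only.
iB <- which(time >= from & time <= to)
nDistB <- length(iB)
iB <- iB[haversineKm(lon[iB], lat[iB], lon0, lat0) <= 30]

cat("distance evaluations: order A =", n, ", order B =", nDistB, "\n")
```

In argoFloats terms this would mean calling subset() with time= before circle=. Whether that pays off in practice depends on how much each subset() call costs, which this sketch does not measure.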

Depending on your actual task, you might want to simply extract position and time from the index, and construct logical lookup arrays based on direct computations.
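A minimal sketch of the lookup-array idea, under stated assumptions: the position and time vectors are synthetic stand-ins for vectors extracted once from the index, the targets are hypothetical, and the haversine distance again stands in for the geodetic one. The time test for all profiles and all targets is built in one step with outer() as an n x m logical array, and the distance is then computed only where the time test passed. Avoiding intermediate index objects may help, but that is not measured here.

```r
# Synthetic stand-ins for vectors extracted once from the index. With
# argoFloats they might come from index[["longitude"]], index[["latitude"]]
# and index[["date"]]; that accessor usage is an assumption, not tested here.
set.seed(1)
n <- 5000
lon <- runif(n, -70, -50)
lat <- runif(n, 35, 50)
time <- as.POSIXct("2020-01-01", tz = "UTC") + runif(n, 0, 365 * 86400)

# Hypothetical targets: one row per selection (centre position and time).
targets <- data.frame(
    lon = c(-60, -55, -65),
    lat = c(42, 45, 40),
    t = as.POSIXct(c("2020-03-01", "2020-06-15", "2020-10-01"), tz = "UTC")
)

# Spherical haversine distance (km), a simplified stand-in for oce::geodDist().
haversineKm <- function(lon1, lat1, lon2, lat2, R = 6371) {
    rad <- pi / 180
    a <- sin((lat2 - lat1) * rad / 2)^2 +
        cos(lat1 * rad) * cos(lat2 * rad) * sin((lon2 - lon1) * rad / 2)^2
    2 * R * asin(pmin(1, sqrt(a)))
}

# Time test for all profiles and all targets at once: an n x m logical array.
tnum <- as.numeric(time)
dt <- outer(tnum, as.numeric(targets$t), "-")  # seconds from each target time
keep <- dt >= -2 * 86400 & dt <= 15 * 86400    # from t - 2 days to t + 15 days

# Distance test, evaluated only where the time test already passed.
for (j in seq_len(nrow(targets))) {
    cand <- which(keep[, j])
    keep[cand, j] <- haversineKm(lon[cand], lat[cand], targets$lon[j], targets$lat[j]) <= 30
}

# Profile indices selected for each target, and how many there are.
selected <- lapply(seq_len(ncol(keep)), function(j) which(keep[, j]))
print(colSums(keep))
```

The n x m array grows with both the number of profiles and the number of targets, so with a full index and many targets you may need to process the targets in batches.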

Also, consider using the argodata package.