MurrayEfford / secr

Spatially Explicit Capture-Recapture
3 stars 1 forks source link

Support exact integral for detectfn (where possible) #5

Open PhilJd opened 5 months ago

PhilJd commented 5 months ago

While working on #4 , I've been looking into other ways of speeding up the computation. One option would be to use the exact CDF instead of approximating the integral via Numer::Integrate where possible.

E.g.,the half normal function (zhhnrC) has an exact solution (found via WolframAlpha, this includes the y calculations).

This gives ~10x speedup, reducing computation time from 40 to ca. 4 minutes in my case. I've confirmed that results are the same. This is of course limited to convex polygons though.

Do you think this is worth adding? I'd be happy to contribute this if yes :)

MurrayEfford commented 5 months ago

Hello Phil Thanks for this work. It will be a couple of days before I get time to look at your suggestions, but they looks promising. Murray


From: Phil @.> Sent: 17 January 2024 08:41 To: MurrayEfford/secr @.> Cc: Subscribed @.***> Subject: [MurrayEfford/secr] Support exact integral for detectfn (where possible) (Issue #5)

While working on #4https://github.com/MurrayEfford/secr/pull/4 , I've been looking into other ways of speeding up the computation. One option would be to use the exact CDF instead of approximating the integral via Numer::Integrate where possible.

E.g.,the half normal function (zhhnrC) has an exact solution (found via WolframAlphahttps://www.wolframalpha.com/input?i=integral+p+*+exp%28-+%28sqrt%28%28x-m%29%5E2+%2B+n%29%29%5E2+%2F+%282+*+q%5E2%29%29+dx+from+a+to+b, this includes the y calculations).

This gives ~10x speedup, reducing computation time from 40 to ca. 4 minutes in my case. I've confirmed that results are the same. This is of course limited to convex polygons though.

Do you think this is worth adding? I'd be happy to contribute this if yes :)

— Reply to this email directly, view it on GitHubhttps://github.com/MurrayEfford/secr/issues/5, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACEB6PQAXAGWNNV23UE6GYTYO3JXJAVCNFSM6AAAAABB5LNW7GVHI2DSMVQWIX3LMV43ASLTON2WKOZSGA4DINZRGQYDSOA. You are receiving this because you are subscribed to this thread.Message ID: @.***>

MurrayEfford commented 5 months ago

Yes, it would be good to switch to using the exact CDF when that is available. I haven't worked on the polygon case for years, and I'm not happy with it, as you can tell from the state of the code. It's so ragged I have even thought of quarantining it in a separate package (removing the dependence of secr on RcppNumerical and indirectly on RcppEigen), but it's easier for now to just continue.

PhilJd commented 5 months ago

Sorry for the late reply, I've been on vacation and tried to get a deeper understanding of the code before replying. Is my understanding correct that 1) detectfns are only evaluated on positive values (euclidean distances?) 2) Typically, this loop integrates the same function (which as far as I understand is parameterized by gsbval) many times with different boundaries?

What do you think of this approach (which would also get rid of RcppEigen and support non-convex polygons): 1) Implement a simple integration using the trapezoidal rule, which subdivides the intervals until the approximation error is smaller than a given threshold. 2) For a parameterized detectfn, compute the approximate integral for the max distance (using the function from 1)) and store the intermediate results. 3) Compute the cumulative sum. 4) Loop over all intervals to evaluate and compute the integral by reading off the values for upper and lower limits from the pre-computed cumulative sums.

I.e., the difference to the current state would be that we pre-compute and re-use the integral. If there are a number of intervals to evaluate for the same parameters, this should quickly amortize.

MurrayEfford commented 5 months ago

Hello Phil I appreciate your expertise on numerical methods and would encourage any contribution you can make. Answering your questions (with apologies if my memory or analysis is faulty)

1. Yes, detectfns are only evaluated on non-negative values. Non-euclidean distances are only allowed for 'point' (not polygon) detectors.

  1. The loop over k may be evaluated only once. nk is the number of discrete polygons, each nominally a 'detector' and different boundaries. There may be use cases I haven't seen in which many disjunct areas are searched, but most often nk = 1. The inner loop over m evaluates the integral once for each possible activity centre (AC) location.

The function stays the same for each AC location m, so perhaps your suggested algorithm can help. I'm not seeing how it would work in two dimensions with shifting AC location. I assume you envisage a 2-D version of the trapezoidal rule? Also could consider Gaussian quadrature. Murray


From: Phil @.> Sent: 29 January 2024 03:48 To: MurrayEfford/secr @.> Cc: Murray Efford @.>; Comment @.> Subject: Re: [MurrayEfford/secr] Support exact integral for detectfn (where possible) (Issue #5)

Sorry for the late reply, I've been on vacation and tried to get a deeper understanding of the code before replying. Is my understanding correct that

  1. detectfns are only evaluated on positive values (euclidean distances?)
  2. Typically, this loophttps://github.com/MurrayEfford/secr/blob/ad369069f6d4c733b120f9d7e061f3cd7d242297/src/makegkpoly.cpp#L71 integrates the same function (which as far as I understand is parameterized by gsbval) many times with different boundaries?

What do you think of this approach (which would also get rid of RcppEigen and support non-convex polygons):

  1. Implement a simple integration using the trapezoidal rule, which subdivides the intervals until the approximation error is smaller than a given threshold.
  2. For a parameterized detectfn, compute the approximate integral for the max distance (using the function from 1)) and store the intermediate results.
  3. Compute the cumulative sum.
  4. Loop over all intervals to evaluate and compute the integral by reading off the values for upper and lower limits from the pre-computed cumulative sums.

I.e., the difference to the current state would be that we pre-compute and re-use the integral. If there are a number of intervals to evaluate for the same parameters, this should quickly amortize.

— Reply to this email directly, view it on GitHubhttps://github.com/MurrayEfford/secr/issues/5#issuecomment-1913621722, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ACEB6PXYL46TLJ3KLMVKLN3YQZQKTAVCNFSM6AAAAABB5LNW7GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJTGYZDCNZSGI. You are receiving this because you commented.Message ID: @.***>