DistanceDevelopment / mrds

R package for mark-recapture-distance-sampling analysis
GNU General Public License v3.0

Computation time for mrds full independence fit of crabeater seals #3

Open erex opened 9 years ago

erex commented 9 years ago

Running mrds analysis of crabeaters (3480 detections) with this set of covariates

ddf.1 <- ddf(method='io.fi', dsmodel=~cds(key='hn'), mrmodel=~glm(link='logit', formula=~observer + distance + gscat + fatigue + obsname + exp + ssmi + side + distance:obsname + distance:observer + vis), data=ddf.dat.1, meta.data=list(width=700))

has now been running for 43 minutes. I believe the previous computation time for a comparable model was ~10 s.

Perhaps the change you mentioned at the weekend ("now MRDS does the actual integration...") is responsible?

dill commented 9 years ago

The following seems to only take ~90s on my computer:

library(mrds)
crabseal <- read.csv("http://distancesampling.org/R/vignettes/crabbieMRDS.csv")
ddf.1 <- ddf(method='io.fi', dsmodel=~cds(key='hn'), mrmodel=~glm(link='logit', formula=~observer + distance + gscat + fatigue + obsname + exp + ssmi + side + distance:obsname + distance:observer + vis), data=crabseal, meta.data=list(width=700))

BUT, I'm not sure whether this matches the data in the Distance project. Can you send me the project file?

erex commented 9 years ago

still running the job started when you were in my office.


erex commented 9 years ago

The .csv file on the distancesampling.org website has 3480 sightings, the same number as in my analysis, so I'm guessing that's a positive indication that the data match.


dill commented 9 years ago

Ah okay, so perhaps the issue actually lies in calling summary on the fitted object. I'm looking into this now: summary takes a long time by default but is near-instantaneous with se=FALSE:

> system.time(summary(ddf.1))
    user   system  elapsed
1117.631   20.050 1160.934
> system.time(summary(ddf.1,se=FALSE))
   user  system elapsed
  0.011   0.001   0.012

dill commented 9 years ago

I rolled back to afa62b9355a7a549f93d6c1e7e2c9cd6ad6bb92e and the summary still takes a long time.

It's therefore possible that:

  1. this model didn't get fitted last time,
  2. it did get fitted but the run time went unnoticed,
  3. there has been some change in R or other underlying code that is causing the slowdown.
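
To help separate the third possibility from the first two, the R and package versions could be recorded alongside each timing. A minimal sketch, using only base R; the `env_info` name is made up for illustration and is not part of mrds:

```r
# Hypothetical helper (not part of mrds): collect version details worth
# recording next to any timing, so that runs can be compared later.
env_info <- list(
  r_version = R.version.string,
  platform  = R.version$platform,
  # NA if mrds is not installed in the current library
  mrds      = if (requireNamespace("mrds", quietly = TRUE)) {
    as.character(utils::packageVersion("mrds"))
  } else {
    NA_character_
  }
)
str(env_info)
```

Saving this next to the `system.time()` output would make it easier to tell later whether a slowdown coincided with an R or package upgrade.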

Any thoughts?

erex commented 9 years ago

Running D6 with MRDS 2.14 produces the "right" answer for the FI crabeater example.

Re-running D7 with MRDS 2.14 now also produces the "right" answer.

I have no explanation for the crazy result from yesterday.

dill commented 9 years ago

Unfortunately the speed-up I envisioned is not so trivial for this model due to the interactions (I think the fiddly programming required to do those computations will not pay off in terms of an actual speed-up). But in the case where distance enters only as a main effect, with no interaction terms:

ddf.1 <- ddf(method='io.fi', dsmodel=~cds(key='hn'), mrmodel=~glm(link='logit', formula=~observer+distance+ gscat + fatigue+ obsname + exp + ssmi + side + vis), data=crabseal, meta.data=list(width=700))

I knocked off a few hundred seconds:

Before:

> system.time(summary(ddf.1))
   user  system elapsed
670.128   5.819 688.171

After:

> system.time(summary(ddf.1))
   user  system elapsed
453.256   9.457 468.547

This is in a separate branch at 49da34a. I'll think about this further.

dill commented 9 years ago

Using profvis as a guide, I was able to hand-optimise some of the computation in lnl.io in b24d40f. Specifically, p.io was previously calling model.matrix every time it was invoked, and it is invoked many times during optimHess.

Previously:

> system.time(ddf.1 <- ddf(method='io.fi', dsmodel=~cds(key='hn'), mrmodel=~glm(link='logit', formula=~observer+distance+ gscat + fatigue+ obsname + exp + ssmi + side + distance:obsname  + distance:observer + vis), data=crabseal, meta.data=list(width=700)))
   user  system elapsed
121.203   3.082 128.472

Now:

> system.time(ddf.1 <- ddf(method='io.fi', dsmodel=~cds(key='hn'), mrmodel=~glm(link='logit', formula=~observer+distance+ gscat + fatigue+ obsname + exp + ssmi + side + distance:obsname  + distance:observer + vis), data=crabseal, meta.data=list(width=700)))
   user  system elapsed
 47.742   0.736  50.015
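
The kind of change described above (building the design matrix once instead of on every likelihood evaluation) can be illustrated with a toy logistic likelihood. This is a sketch under assumptions, not the code in b24d40f: the data are simulated, and `nll_rebuild`, `make_nll_cached` and `nll_cached` are made-up names rather than mrds functions.

```r
set.seed(1)
n <- 2000
dat <- data.frame(
  distance = runif(n, 0, 700),
  observer = factor(sample(1:2, n, replace = TRUE)),
  side     = factor(sample(c("L", "R"), n, replace = TRUE))
)
form <- ~ observer + distance + side + distance:observer

# Simulate detections from made-up coefficients (5 design-matrix columns)
beta_true <- c(0.5, -0.3, -0.002, 0.2, 0.001)
dat$detected <- rbinom(n, 1, plogis(drop(model.matrix(form, dat) %*% beta_true)))

# Slow variant: rebuilds the design matrix on every evaluation
nll_rebuild <- function(beta) {
  X <- model.matrix(form, dat)
  -sum(dbinom(dat$detected, 1, plogis(drop(X %*% beta)), log = TRUE))
}

# Fast variant: design matrix built once and captured in a closure
make_nll_cached <- function(form, dat) {
  X <- model.matrix(form, dat)
  y <- dat$detected
  function(beta) {
    -sum(dbinom(y, 1, plogis(drop(X %*% beta)), log = TRUE))
  }
}
nll_cached <- make_nll_cached(form, dat)

# optimHess evaluates the objective many times, so per-call overhead adds up
system.time(h1 <- optimHess(beta_true, nll_rebuild))
system.time(h2 <- optimHess(beta_true, nll_cached))

# Both variants compute the same Hessian
stopifnot(isTRUE(all.equal(h1, h2)))
```

The saving grows with the number of rows and terms in the formula, since `model.matrix` has to re-expand factors and interactions on each call.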

There is still, however, a bottleneck in integration in the predict methods (when getting derivatives of abundance). This is still causing long computation times for summary.
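
To see why integration inside a derivative calculation is expensive, here is a toy sketch in base R. It is not how mrds computes things internally (I have not checked that): the detection function, the coefficients, the per-detection `shifts` and all function names are made up. The point is only that a finite-difference gradient of an abundance estimate repeats every per-detection integral for each perturbed parameter.

```r
w <- 700               # truncation width, as in the thread
beta <- c(1, -0.004)   # made-up intercept and distance slope
n_obs <- 200           # made-up number of detections
calls <- 0             # counts integrate() calls

# Average detection probability over [0, w] for one detection
avg_p <- function(beta, shift) {
  calls <<- calls + 1
  g <- function(x) plogis(beta[1] + shift + beta[2] * x)
  integrate(g, lower = 0, upper = w)$value / w
}

# Horvitz-Thompson-style abundance: sum of 1/p over detections
set.seed(1)
shifts <- rnorm(n_obs, sd = 0.3)   # stands in for per-detection covariates
Nhat <- function(beta) {
  sum(1 / vapply(shifts, function(s) avg_p(beta, s), numeric(1)))
}

# Central finite-difference gradient of f with respect to beta
fd_grad <- function(f, beta, rel = 1e-5) {
  vapply(seq_along(beta), function(j) {
    h <- rel * max(1, abs(beta[j]))
    up <- beta; up[j] <- up[j] + h
    dn <- beta; dn[j] <- dn[j] - h
    (f(up) - f(dn)) / (2 * h)
  }, numeric(1))
}

grad <- fd_grad(Nhat, beta)
calls   # 2 evaluations x 2 parameters x 200 detections = 800
```

With 3480 detections and a dozen or more parameters, the same pattern would need tens of thousands of numerical integrations per gradient, which is consistent with summary being slow only when standard errors are requested.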