Open erex opened 9 years ago
The following seems to only take ~90s on my computer:
library(mrds)
crabseal <- read.csv("http://distancesampling.org/R/vignettes/crabbieMRDS.csv")
ddf.1 <- ddf(method='io.fi', dsmodel=~cds(key='hn'), mrmodel=~glm(link='logit', formula=~observer + distance + gscat + fatigue + obsname + exp + ssmi + side + distance:obsname + distance:observer + vis), data=crabseal, meta.data=list(width=700))
BUT, I'm not sure whether this matches the data in the Distance project. Can you send me the project file?
On 14/07/2015 14:46, DL Miller wrote:
The following seems to only take ~90s on my computer:
library(mrds)
crabseal <- read.csv("http://distancesampling.org/R/vignettes/crabbieMRDS.csv")
ddf.1 <- ddf(method='io.fi', dsmodel=~cds(key='hn'), mrmodel=~glm(link='logit', formula=~observer + distance + gscat + fatigue + obsname + exp + ssmi + side + distance:obsname + distance:observer + vis), data=crabseal, meta.data=list(width=700))
BUT, I'm not sure whether this matches the data in the Distance project. Can you send me the project file?
— Reply to this email directly or view it on GitHub https://github.com/DistanceDevelopment/mrds/issues/3#issuecomment-121241885.
Still running the job that was started when you were in my office.
Eric Rexstad 20 West Braes Crescent Crail Fife KY10 3SY
The .csv file on the distancesampling.org website has 3480 sightings; I'm guessing that's a positive indication that it matches the data in the Distance project.
Ah okay, so perhaps the issue actually lies in calling summary on the fitted object. Looking into this now: it takes a long time by default, but is instantaneous if se=FALSE:
> system.time(summary(ddf.1))
user system elapsed
1117.631 20.050 1160.934
> system.time(summary(ddf.1,se=FALSE))
user system elapsed
0.011 0.001 0.012
I rolled back to afa62b9355a7a549f93d6c1e7e2c9cd6ad6bb92e and the summary still takes a long time.
It's therefore possible that:
Any thoughts?
Running D6 with MRDS 2.14 produces the "right" answer for the FI crabeater example.
Re-running D7 with MRDS 2.14 now also produces the "right" answer. I have no explanation for the crazy result from yesterday.
Unfortunately the speed-up I envisioned is not so trivial for this model due to the interactions (I think the fiddly programming required to do those computations will not pay off in terms of an actual speed-up). But in the case where distance is included only as a main effect:
ddf.1 <- ddf(method='io.fi', dsmodel=~cds(key='hn'), mrmodel=~glm(link='logit', formula=~observer+distance+ gscat + fatigue+ obsname + exp + ssmi + side + vis), data=crabseal, meta.data=list(width=700))
I knocked off a few hundred seconds:
Before:
> system.time(summary(ddf.1))
user system elapsed
670.128 5.819 688.171
After:
> system.time(summary(ddf.1))
user system elapsed
453.256 9.457 468.547
This is in a separate branch at 49da34a. I'll think about this further.
Using profvis as a guide I was able to hand-optimise some of the computation in lnl.io in b24d40f. Specifically, p.io was previously making a call to model.matrix each time it was called (which was multiple times for optimHess).
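The effect of rebuilding the design matrix on every likelihood evaluation can be illustrated with a toy example. This is only a minimal sketch in base R, not the mrds code: the simulated data, the formula and the function names negll_slow and negll_fast are hypothetical, and a plain logistic log-likelihood stands in for lnl.io.

```r
# Toy illustration only: a plain logistic log-likelihood, NOT mrds's lnl.io/p.io.
set.seed(1)
n <- 2000
dat <- data.frame(
  y        = rbinom(n, 1, 0.5),
  distance = runif(n, 0, 1),   # scaled distance (hypothetical)
  obsname  = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
form <- ~ distance + obsname + distance:obsname

# Version 1: the design matrix is rebuilt on every evaluation.
negll_slow <- function(beta) {
  X <- model.matrix(form, dat)
  eta <- drop(X %*% beta)
  -sum(dat$y * eta - log1p(exp(eta)))
}

# Version 2: the design matrix is built once and reused.
X_cached <- model.matrix(form, dat)
negll_fast <- function(beta) {
  eta <- drop(X_cached %*% beta)
  -sum(dat$y * eta - log1p(exp(eta)))
}

beta0 <- rep(0, ncol(X_cached))

# optimHess evaluates the objective many times, so the per-call cost adds up.
t_slow <- system.time(H_slow <- optimHess(beta0, negll_slow))
t_fast <- system.time(H_fast <- optimHess(beta0, negll_fast))

print(rbind(slow = t_slow, fast = t_fast)[, "elapsed"])
print(all.equal(H_slow, H_fast))
```

Both versions return the same Hessian; only the amount of repeated work differs. How much time is saved will depend on the size of the data and the complexity of the formula.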
Previously:
> system.time(ddf.1 <- ddf(method='io.fi', dsmodel=~cds(key='hn'), mrmodel=~glm(link='logit', formula=~observer+distance+ gscat + fatigue+ obsname + exp + ssmi + side + distance:obsname + distance:observer + vis), data=crabseal, meta.data=list(width=700)))
user system elapsed
121.203 3.082 128.472
Now:
> system.time(ddf.1 <- ddf(method='io.fi', dsmodel=~cds(key='hn'), mrmodel=~glm(link='logit', formula=~observer+distance+ gscat + fatigue+ obsname + exp + ssmi + side + distance:obsname + distance:observer + vis), data=crabseal, meta.data=list(width=700)))
user system elapsed
47.742 0.736 50.015
There is still, however, a bottleneck in integration in the predict methods (when getting derivatives of abundance). This is still causing long computation times for summary.
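To see why numerical derivatives of an abundance estimate can be expensive when each evaluation involves integration, here is a rough sketch in base R. It is not how mrds computes its variances: the logistic detection function, the Horvitz-Thompson-like estimator Nhat, the covariate z, the parameter values and the central-difference scheme are all assumptions made for illustration.

```r
# Toy illustration only: NOT the mrds predict/summary code.
set.seed(42)
w    <- 1                 # truncation width in scaled units (hypothetical)
n    <- 200               # number of detections in the toy data
z    <- rnorm(n)          # one hypothetical covariate per detection
beta <- c(1, -2, 0.5)     # hypothetical intercept, distance and covariate effects

n_integrals <- 0          # counts calls to integrate()

# Average detection probability over [0, w] for a single detection.
avg_p <- function(beta, z_i) {
  n_integrals <<- n_integrals + 1
  f <- function(x) plogis(beta[1] + beta[2] * x + beta[3] * z_i)
  integrate(f, lower = 0, upper = w)$value / w
}

# Horvitz-Thompson-like abundance: sum of 1 / average detection probability.
Nhat <- function(beta) {
  sum(1 / vapply(z, function(z_i) avg_p(beta, z_i), numeric(1)))
}

# Central finite-difference gradient of Nhat with respect to beta.
grad_Nhat <- function(beta, eps = 1e-4) {
  vapply(seq_along(beta), function(j) {
    step <- replace(numeric(length(beta)), j, eps)
    (Nhat(beta + step) - Nhat(beta - step)) / (2 * eps)
  }, numeric(1))
}

g <- grad_Nhat(beta)
print(n_integrals)        # equals 2 * length(beta) * n
```

In this sketch a single gradient needs two integrations per parameter per detection, so the cost grows with both the number of covariate terms and the number of sightings. That is at least consistent with summary being instantaneous when se=FALSE, though the actual bottleneck in mrds may differ in detail.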
An mrds analysis of crabeaters (3480 detections) with this set of covariates has now been running for 43 minutes. I believe the comparable previous computation time was ~10 sec.
Perhaps the comment you made on the weekend about "now MRDS does the actual integration..." might be responsible?