djhocking / Trout_GRF

Application of Gaussian Random Fields in a Dendritic Network

Excessively high simulated abundance #8

Closed: djhocking closed this issue 8 years ago

djhocking commented 8 years ago

Occasionally, when simulating spatial or spatiotemporal data, the predicted abundance runs away and gets excessively high. Setting a small value for SD_input helps but it can still happen (~15,000 when mean N = 100 even with SD = 0.1). We need a good way to prevent the simulated abundance from getting out of control.
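For anyone trying to reproduce this, here's a minimal sketch of the failure mode (parameter values are made up, and it mirrors only the general structure of the simulation, not the actual project code): innovations accumulate along a path of nodes, and the exponential link turns a rare drift of a few latent-scale units into enormous Poisson means.

```r
# Minimal sketch of the blow-up (hypothetical values, not the project code):
# accumulate OU-style innovations along a chain of nodes, then exponentiate.
set.seed(1)
n_nodes <- 500
theta   <- 0.05      # weak mean reversion
SD      <- 0.1
d       <- 1         # distance between successive nodes
condSD  <- sqrt(SD^2 / (2 * theta) * (1 - exp(-2 * theta * d)))
x       <- cumsum(rnorm(n_nodes, 0, condSD))  # parent value + innovation
lambda  <- exp(log(100) + x)                  # log_mean = log(100), mean N = 100
N       <- rpois(n_nodes, lambda)
max(N)  # occasionally runs into the thousands even though SD = 0.1
```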

bletcher commented 8 years ago

Could density dependence be added?

James-Thorson commented 8 years ago

Dan,

Abundance estimates become very large whenever catchability estimates become very small. Catchability estimates will approach zero whenever the average slope of catches plotted against visit number is positive, and I think that's what's happening in a few replicates.

We're simulating a triple-pass depletion estimator, and we're imagining that sampling involves netting off a smallish stream reach, right? In that case, we know a priori that catchability won't be less than, say, 0.1 (because samplers can't miss a huge density of fish at a small spatial scale of a stream when using good sampling techniques). I suggest adding a lower bound to catchability, discarding any replicate where catchability is estimated at this bound, and reporting the number of replicates that triggered this condition in the text.
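One way to implement that bound (a sketch only; the parameter names, transform, and bounds here are hypothetical, not the model's actual parameterization) is to estimate catchability on an unconstrained scale and map it into a bounded interval, then flag replicates whose estimate is pinned at the floor:

```r
# Sketch of bounded catchability (hypothetical names and bounds):
# map an unconstrained parameter to q in [0.1, 1] via a scaled logistic.
q_lower <- 0.1
q_upper <- 1.0
q_from_par <- function(logit_q) q_lower + (q_upper - q_lower) * plogis(logit_q)

# After fitting each replicate, flag estimates pinned at the lower bound so
# those replicates can be discarded and their count reported in the text.
at_lower_bound <- function(q_hat, tol = 1e-3) q_hat < q_lower + tol
```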

Also: are you exploring the scenario where catchability varies among tows? I remember us exploring this in earlier stages of the project. If yes, please tell me and I'll elaborate on how I think we can bound site-level catchability in that case.

jim

djhocking commented 8 years ago

This is happening during data simulation in the OU process, when looping through the network to generate x_b (in the spatial case) and x_bt (in the spatiotemporal case). It seems the OU process allows considerable drift, even if it happens only rarely. The result is that when I add it to log_mean and exponentiate, the value of lambda is occasionally huge, making N_i huge: N_i = rpois( prod(dim(x_bt)), lambda = exp(x_bt + log_mean + eta_i) ). This isn't during estimation, just simulation, so catchability hasn't been applied yet. Again, making SD very small (~0.01) removes the problem even for large network simulations.

I'm referring to SD in condSD_b[i] = sqrt( SD^2/(2*theta) * (1 - exp(-2*theta*family[i,'dist_b'])) )

Sorry if my use of the term SD_input threw you off; that's actually the term used in the estimation/fitting model. I was thinking of it that way because SD is the same parameter when simulating the data.
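For reference, that conditional SD grows with distance toward the stationary SD of SD/sqrt(2*theta), and because each node's value is its parent's value plus this innovation, the variability accumulates along long paths. A quick check with made-up values:

```r
# Quick check (hypothetical values): the conditional SD approaches the
# stationary SD of SD / sqrt(2 * theta) as distance grows.
theta   <- 0.05
SD      <- 0.1
cond_sd <- function(d) sqrt(SD^2 / (2 * theta) * (1 - exp(-2 * theta * d)))
round(cond_sd(c(1, 10, 100)), 3)  # 0.098 0.251 0.316
round(SD / sqrt(2 * theta), 3)    # 0.316, the stationary SD
```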

Would it be possible to add some sort of density dependence or other limiting term to

x_bt[i,] = x_bt[SimulatedNodes[Match],] + rmvnorm(1, mean=rep(0,n_years), sigma=condSD_b[i]^2 * Corr_tt^2 )[1,]

It would be odd to add it later to N_i or lambda, because x_bt[i] depends on x_bt[i-1] following the OU process. Since this isn't a problem with estimation, I could just keep a really small SD for the simulations and not use too large a network, so the chance of getting an outrageously high N_i would be minimal.

Dan

James-Thorson commented 8 years ago

I'd skip the density dependence, because we're not including it in the estimation model, and then we'd have to deal with mismatched estimation and simulation models as an explanation for any poor fit (I think we're using the simulation as a demonstration of consistency and small-sample properties, not as a test of robustness to violated model assumptions). So yes, I recommend keeping SD low.

jim

djhocking commented 8 years ago

Sounds good, thanks. What do you think about dividing all distances by 10 or 100, particularly on the estimation side, so that theta isn't so small? SD and theta haven't been separable in past simulations, but I thought that if they weren't both so small it might help with that.

James-Thorson commented 8 years ago

Theta should be scale invariant, so dividing by a constant shouldn't particularly affect estimation performance or speed (except that starting the minimization at 0 might be closer to or further from the optimal value).
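To make that concrete, here's a quick check (made-up values) that dividing distances by a constant is absorbed by a compensating reparameterization of theta and SD, leaving the conditional SD, and hence the model, unchanged:

```r
# Rescaling check (hypothetical values): distances divided by c are absorbed
# by theta -> c * theta and SD -> sqrt(c) * SD; the conditional SD is identical.
cond_sd <- function(SD, theta, d) sqrt(SD^2 / (2 * theta) * (1 - exp(-2 * theta * d)))
c_scale <- 10
cond_sd(SD = 0.1,                 theta = 0.05,           d = 25)
cond_sd(SD = 0.1 * sqrt(c_scale), theta = 0.05 * c_scale, d = 25 / c_scale)
# identical output: only the parameter scale (and optimizer start) changes
```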

jim

djhocking commented 8 years ago

Thanks