djhocking / Trout_GRF

Application of Gaussian Random Fields in a Dendritic Network

Added Years #3

Closed djhocking closed 8 years ago

djhocking commented 8 years ago

I just updated the data and White_River.R code to include all years of observed data. I didn't expand the dataframes to include all node-year combinations. The current model (version d) produces identical N_i estimates for each year at a given site.

djhocking commented 8 years ago

I guess that makes sense because I didn't include year as a covariate in this case, so the only things that affect lambda, as currently coded, are length, width, and location. The detection process then just affects how many fish are captured, not how many there are in a given year at a particular site (i.e., lambda for a site is not currently coded to vary by year).
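
As a rough illustration of why the estimates repeat, here is a minimal Python sketch (not the project's R/TMB code) of a log-linear abundance model with no year term. The coefficient values, the `expected_abundance` helper, and the treatment of "location" as an additive spatial effect are all assumptions made for the example.

```python
import math

# Hypothetical coefficients for a log-linear abundance model.
# These are illustration values, not estimates from White_River.R.
BETA_INTERCEPT = 1.0
BETA_LENGTH = 0.01  # per metre of sampled stream length
BETA_WIDTH = 0.1    # per metre of stream width


def expected_abundance(length_m, width_m, spatial_effect, year=None):
    """Return lambda for a site.

    `year` is accepted but never used, mirroring a model with no year covariate.
    """
    log_lambda = (BETA_INTERCEPT
                  + BETA_LENGTH * length_m
                  + BETA_WIDTH * width_m
                  + spatial_effect)
    return math.exp(log_lambda)


# One site visited in three years: same covariates, so the same lambda each year.
lambdas = [expected_abundance(50.0, 4.0, 0.2, year=y) for y in (2012, 2013, 2014)]
print(lambdas)
```

Because nothing in `log_lambda` depends on `year`, the three values are identical. Only a year covariate or a temporal random effect would let them differ.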

James-Thorson commented 8 years ago

Dan,

I just added temporal and spatiotemporal components to v1f. I also added input/output tools to turn off individual components, so you can use AIC to select among models. Seems like it's working, but could you please do some simple sanity checks on it (e.g., check for smoothness, and check whether the sample correlation and estimated correlation match)?

cheers, jim
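
For reference, the AIC comparison across models with components switched on or off could be sketched as below. This is plain Python rather than the v1f code, and the model names, log-likelihoods, and parameter counts are made-up numbers that only show the arithmetic. They are not results from this project.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*logL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits: (maximised log-likelihood, number of estimated parameters).
fits = {
    "spatial only": (-520.0, 6),
    "spatial + temporal": (-512.0, 8),
    "spatial + temporal + spatiotemporal": (-505.0, 10),
}

scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)

# Delta-AIC relative to the best model; the best model has delta 0.
delta = {name: score - scores[best] for name, score in scores.items()}

for name in fits:
    print(f"{name}: AIC = {scores[name]:.1f}, delta = {delta[name]:.1f}")
```

Each extra component has to improve the log-likelihood by more than one unit per added parameter to lower the AIC.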


James-Thorson commented 8 years ago

Dan,

Have you had a chance to look at this? Any interest in talking before AFS? Just want to make sure you're not waiting on me for anything.

cheers, jim


djhocking commented 8 years ago

Hi Jim,

I looked through the code a little but haven’t had time to get back to it in earnest. I might need you to walk me through a couple of parts of the code for the spatiotemporal model, so yes, I would be interested in chatting before AFS (too bad I’m not going, maybe next year). I’m not really waiting on you for anything at this point though.

My biggest issue is that I don’t think the sampling used to collect the data matches the inference we’re trying to make. This could lead to some interesting comparisons if we can find the data (or simulate from the data we do have). Most stream electrofishing samples that I’ve seen cover 20-100 m stream lengths. However, with rocks, downed trees, pools, riffles, runs, seeps, and other microhabitat features, the abundance of fish varies hugely over short distances. Unfortunately, for a variety of mostly good reasons, we don’t measure these variables. Therefore, when we use this model we get an estimated theta of ~400 (rho is essentially 0), which means that there is essentially no spatial autocorrelation beyond a couple of meters. This shouldn’t be unexpected if 20-100 m stream lengths are our unit of inference. However, if we’re interested in the abundance (or density) for stream reaches from confluence to confluence (or 0.5-5 km stream lengths), I would expect there to be somewhat strong spatial autocorrelation.
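
To see how quickly correlation vanishes at that theta, here is a small Python sketch. The thread gives neither the correlation function nor the units of theta, so the sketch assumes exponential decay, exp(-theta * d), with d in kilometres. The `ou_correlation` name is hypothetical.

```python
import math


def ou_correlation(theta, distance_km):
    """Correlation between two points, ASSUMING the form exp(-theta * d)."""
    return math.exp(-theta * distance_km)


theta = 400.0  # estimate quoted above; per-kilometre units are an assumption

for d_m in (1, 2, 5, 10, 20, 100):
    rho = ou_correlation(theta, d_m / 1000.0)
    print(f"{d_m:>4} m: correlation = {rho:.4f}")
```

Under these assumptions the correlation is already below 0.02 at 10 m, which is shorter than a single 20-100 m electrofishing sample.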

I’ve been talking with another postdoc in the lab (Evan Childres) about looking for the survey length where average density levels off (how many meters of stream should be sampled to represent the stream as a “whole”). If we can figure this out, we can look for watersheds where more intense sampling has been conducted with longer sections of streams sampled. In the Westbrook, where Ben has ~16 years of data, there are tons of adjacent 20 m segments, but we don’t have much of the network covered, mostly just the mainstem. It would be interesting to compare spatial correlation at micro (~20-50 m) and meso (~400-1000 m) scales to understand how populations are structured in networks. Evan Grant also has a stream network or two where he sampled virtually the entire thing in 20 m segments, so that is a potential data source using salamanders.
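
One possible way to operationalize "the survey length where average density levels off" is sketched below in Python. This is only an illustration, not a method agreed in this thread: the function names, the 10% tolerance, and the counts are all hypothetical.

```python
def running_density(counts, segment_length_m):
    """Cumulative fish per metre after each additional adjacent segment."""
    densities = []
    total = 0
    for i, count in enumerate(counts, start=1):
        total += count
        densities.append(total / (i * segment_length_m))
    return densities


def length_where_density_levels_off(counts, segment_length_m, tolerance=0.1):
    """Shortest surveyed length after which the running density stays within
    `tolerance` (relative) of the full-survey density. Needs non-empty counts."""
    densities = running_density(counts, segment_length_m)
    final = densities[-1]
    for i in range(len(densities)):
        if all(abs(d - final) <= tolerance * final for d in densities[i:]):
            return (i + 1) * segment_length_m


# Hypothetical counts from ten adjacent 20 m segments (made-up numbers).
counts = [0, 12, 1, 7, 3, 5, 4, 6, 4, 5]
print(running_density(counts, 20))
print(length_where_density_levels_off(counts, 20))
```

With real data, the answer will depend on the tolerance and on where in the stream the cumulative sum starts, so it would be worth repeating the calculation from several starting segments.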

We’ve also thought about running the model with no covariates then adding them in so we can do some variance partitioning to see how much of the spatial correlation is described by things like forest cover and underlying geology.

Anyway, far more than you needed to know but it was good for me to get this written and recorded on github to refer back to.

My schedule is fairly flexible the rest of the week if you want to chat.

Dan


Daniel J. Hocking

http://danieljhocking.wordpress.com/


James-Thorson commented 8 years ago

Dan,

happy to chat -- what about today?

For the problem you describe, it could just be that micro-variation (variation at smaller scales than the average distance between samples) swamps macro-variation (variation at scales between samples). If so, then adding an IID lognormal variation to density at each location might fix it. Are you willing to take a crack at this?

cheers, jim
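
A minimal Python sketch of the suggestion, assuming density is modelled on the log scale: the IID lognormal term is an independent normal error added to each site's log-density. The function name, the spatial effects, and the standard deviation are hypothetical illustration values.

```python
import math
import random


def simulate_log_density(spatial_effects, log_mean, sigma_iid, rng):
    """Log-density per site: shared mean + spatial effect + IID normal error."""
    return [log_mean + s + rng.gauss(0.0, sigma_iid) for s in spatial_effects]


rng = random.Random(1)  # seeded so the sketch is repeatable

# Hypothetical smooth spatial effects at five neighbouring sites.
spatial_effects = [0.30, 0.28, 0.25, 0.20, 0.15]

log_density = simulate_log_density(spatial_effects, log_mean=0.0,
                                   sigma_iid=1.0, rng=rng)
density = [math.exp(v) for v in log_density]  # lognormal on the natural scale
print(density)
```

With `sigma_iid` large relative to the differences among the spatial effects, neighbouring sites look unrelated even though the underlying spatial surface is smooth, which is the micro-swamps-macro situation described above.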


djhocking commented 8 years ago

I like the idea of adding lognormal variation to density. I do think it’s micro-variation swamping out macro-variation. I could give it a try. I’m curious whether random variation in density and random spatial network variation would both be identifiable in the same model. What do you think? Should it work in theory? Would it just take a ton of data?

We can chat later today or tomorrow.



James-Thorson commented 8 years ago

Definitely should work in theory with limited data. It's equivalent to saying covariance is equal to spatial cov plus diagonal cov. I've done it before. Happy to look it over tomorrow
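
Written out, the decomposition described here might look like the following. The symbols and the exponential form of the spatial correlation are assumptions for illustration, since the thread only says "spatial cov plus diagonal cov".

```latex
\Sigma \;=\; \sigma_s^2 \, R(\theta) \;+\; \sigma_\varepsilon^2 \, I,
\qquad
R_{ij}(\theta) \;=\; \exp\!\left(-\theta \, d_{ij}\right)
```

Here $d_{ij}$ is the distance between sites $i$ and $j$, $\sigma_s^2$ is the spatial variance, and $\sigma_\varepsilon^2$ is the IID (diagonal) variance. The two variances can be separated as long as some pairs of sites are close enough that $R_{ij}(\theta)$ is clearly above zero.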


djhocking commented 8 years ago

Just adding for my reference (from email)


Dan,

I just fixed a small bug that was triggered when turning off spatial variation.

After a few runs, my interpretation is that spatiotemporal variation is occurring over an interesting spatial scale, and that temporal variation explains a small but significant (by AIC) portion of dynamics. However, spatial variation has a spatial scale that goes to zero, and is therefore confounded with the new IID lognormal overdispersion.

There are two paths forward with this data set:

  1. Drop spatial variation and interpret spatiotemporal and pure-temporal variation
  2. Fix the spatial scale of spatial variation to the scale of spatiotemporal variation (i.e., specify theta = theta_st). Then, if spatial SD still goes to zero, it's not a big deal, and we can run with all three forms of variation, while just interpreting that spatiotemporal variation captures the spatial component sufficiently well.

cheers, jim
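
The confounding can be illustrated with a small Python sketch. It assumes an exponential spatial covariance plus a diagonal IID term. The site distances, the `covariance` helper, and the value used for the tied scale are hypothetical and are not taken from the model code.

```python
import math


def covariance(distances_km, theta, sd_spatial, sd_iid):
    """Exponential spatial covariance plus an IID (diagonal) term."""
    n = len(distances_km)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            cov[i][j] = sd_spatial ** 2 * math.exp(-theta * distances_km[i][j])
            if i == j:
                cov[i][j] += sd_iid ** 2
    return cov


# Three hypothetical sites spaced 0.5 km apart along one stream.
d = [[0.0, 0.5, 1.0],
     [0.5, 0.0, 0.5],
     [1.0, 0.5, 0.0]]

# Spatial scale near zero (very large theta): "all spatial" and "all IID"
# give the same covariance matrix for practical purposes.
all_spatial = covariance(d, theta=400.0, sd_spatial=1.0, sd_iid=0.0)
all_iid = covariance(d, theta=400.0, sd_spatial=0.0, sd_iid=1.0)

# Theta tied to a hypothetical spatiotemporal scale: the spatial term now
# produces off-diagonal covariance that the IID term cannot mimic.
tied = covariance(d, theta=2.0, sd_spatial=1.0, sd_iid=0.0)

print(all_spatial[0][1], all_iid[0][1], tied[0][1])
```

When theta is free and drifts large, only the sum of the two variances is informed by the data. Tying theta to the spatiotemporal scale (option 2) keeps the spatial term distinguishable, so a spatial SD near zero can then be read as "no extra spatial structure" rather than as a symptom of the confounding.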

djhocking commented 8 years ago

Running with theta = theta_st for simulations and case study. All seems to be working okay currently.