ebuhle / LCRchumIPM

This is the development site for an Integrated Population Model for chum salmon in the lower Columbia River.
MIT License

Modeling the hatchery life cycle: broodstock to smolt #24

Open ebuhle opened 1 year ago

ebuhle commented 1 year ago

I wanted to document my thoughts on the broodstock-to-smolt stage of the hatchery life cycle while they're still fresh, so that we (including Future Me) can easily pick up where we left off later this year or early next year. I'm going to open a companion Issue at ebuhle/salmonIPM with an annotated pseudocode, because (as usual) this is relatively easy to describe but pretty gnarly to implement. I'd encourage you, especially @tbuehrens and @kalebentley, to read that Issue in tandem with this one. I'll link it here, but since the repo is private you'll need to be a collaborator to view it. @tbuehrens already is, but @kalebentley let me know if you'd like me to invite you. [Edit: it looks like the link to that Issue below this post only shows up if you're logged in and are a contributor to the salmonIPM repo.]

Let's start with a few key insights and proposed shifts in our modeling practices that allowed me to formulate this approach. First, a general disclaimer: the notation used below is subject to change without notice. There's a broader salmonIPM notational shakeup on the horizon, but for now I've tried to indicate where I intend to rename existing parameters.

Now let's turn to the critical new (or mostly new) assumptions underlying my proposed broodstock-to-smolt model.

I believe those are the biggies. Hopefully others can point out if I've missed something obvious.

OK, discuss!

kalebentley commented 1 year ago

@ebuhle. Thanks for the extremely detailed post! I gave it a thumbs-up simply to acknowledge its presence and that I quickly scanned it. That said, I'm gonna need to re-read and digest it at a later point in time. Maybe (just maybe) we can aim to work on the IPM this fall (as opposed to waiting until next spring) and can start back on this topic.

kalebentley commented 7 months ago

Hey @ebuhle,

In a long overdue reply, I have provided responses to three of the four comments/questions you posted that were related to “the critical new (or mostly new) assumptions underlying [your] proposed broodstock-to-smolt model”. (NOTE: I haven’t looked at the companion Issue #54 you posted to the salmonIPM thread but plan on reading through that sometime soon.)

As I was reading through this post, I realized that there was some general information on the “hatchery/channel populations” in our current chum IPM that would be helpful to have summarized in one location. Also, it would be helpful to provide a brief overview on how/why we plan to use the chum IPM to evaluate the performance of these hatchery/channel populations to contextualize our efforts to model the "hatchery" life-cycle. Therefore, I'm going to write a follow-up response. There’s probably a better location for this information but for now, it'll be useful (at least for me) to have it here (and we can move it later if desired).

Ok - getting back to the assumptions...

1. The set of natural and hatchery/channel populations in the model is closed

This assumption is probably/mostly satisfied for Duncan Hatchery given that all broodstock source “populations” that are used for this program are in our datasets/model and subsequently monitored via spawning ground surveys (thus recruits from this program can be detected/estimated). There are probably some locations where hatchery recruits could/do return that aren’t monitored (well) but I would expect this to be pretty low within the designated population (area) of the Lower Gorge.

This assumption is not true for the Grays Hatchery program based on the way the data are compiled now but could probably be mostly true with updates to our data files. The biggest thing here is partitioning the broodstock that are collected and used to produce chum fry that are planted back into the Grays Basin vs. the broodstock that are collected and used to produce eggs/fry that are planted in or shipped to other locations (i.e., Big Creek Hatchery, Peterson RSI, and Skamokawa) that are not currently in our model. Based on the way Grays hatchery operations data have been collected, we probably cannot partition the adults and subsequent eggs/fry perfectly but can probably come up with something logical and consistent. Since the non-Grays locations are not in our IPM (and thus do not meet the closure assumption), we should remove/ignore these outplants and come up with a way to denote the broodstock take as “loss” (perhaps something similar to imposing a fishery impact). The last thing we’ll need to decide is what to do with the Big Creek Hatchery origin recruits that can (and do) show up in our data set. Specifically, there are 32 observations of Big Creek Hatchery origin adults in our BioData file across all years (compared to the 374 Grays hatchery adults). One option, for now, could be to simply ignore them (i.e., pretend they are natural-origin) and acknowledge that our pHOS estimates for the Grays may be slightly underestimated. I’m sure we could come up with some sort of ad hoc analysis/summary to approximate how much pHOS is underestimated by ignoring these Big Creek Hatchery strays. I can’t think of another option but am wondering if others can. Similar to what I highlighted above for Duncan Hatchery, there are certainly going to be locations/populations that Grays Hatchery recruits could return/stray to that are not in our current IPM (e.g., Chinook, Elochoman). I’m comfortable highlighting this limitation and living with it for now.
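As a starting point for that ad hoc summary, here is a minimal sketch using the two counts quoted above. It assumes both counts come from the same pool of sampled Grays adults, and `n_total` is a purely hypothetical placeholder for the total number of sampled adults (not a number from our data). Under those assumptions the relative underestimate of pHOS does not depend on the total sample size.

```python
def phos(n_hatchery, n_total):
    """Proportion of hatchery-origin spawners among sampled adults."""
    return n_hatchery / n_total

# Counts quoted in the comment above (all years pooled in the BioData file).
n_grays_hatchery = 374
n_big_creek = 32

# HYPOTHETICAL total number of sampled adults; placeholder, not from the data.
n_total = 5000

phos_all = phos(n_grays_hatchery + n_big_creek, n_total)  # strays counted as hatchery-origin
phos_ignored = phos(n_grays_hatchery, n_total)            # strays treated as natural-origin

# Fraction by which pHOS is underestimated; n_total cancels out of this ratio.
relative_underestimate = 1 - phos_ignored / phos_all
print(f"relative underestimate of pHOS: {relative_underestimate:.3f}")
```

Because this pools all years, it says nothing about whether the strays are concentrated in particular years, which would matter for year-specific pHOS.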

This assumption is satisfied for the Lewis Hatchery program as it pertains to broodstock collection (which comes from I-205), but it is not satisfied with regard to detecting/estimating Lewis Hatchery origin adults/recruits. Our model currently does not include a Lewis population because estimates of abundance have not been generated. Even if estimates were generated, I’m not certain what, if any, data have been collected to identify Lewis Hatchery origin adults in the Lewis Basin. Across all years in our BioData file, we have a total of 10 Lewis Hatchery origin adults (juvenile plants began in 2011). These are technically strays based on the objective of this program. As an aside, we should revisit how we want to treat these hatchery strays in our hatchery evaluation. We included these recruits in our preliminary hatchery evaluation by lumping them with the I-205/Washougal population. While not technically wrong, our evaluation of Lower Gorge, I-205/Washougal, and Grays/Chinook is a bit inconsistent.

Similar to Duncan Hatchery, this assumption is probably/mostly satisfied for Duncan Channel. I know you know this, but it's worth highlighting that this “population” is not a hatchery program but rather a location that receives translocated adults and whose recruits we can detect using genetic stock identification. Duncan Channel has some nuances that make it unique and potentially not that “transferable” (e.g., all adults have to be translocated into Duncan Channel even if they are recruits from Duncan Channel; we don’t differentiate translocated adults that returned to the Duncan Creek trap and hypothetically would have spawned in Duncan voluntarily vs. adults that returned to mainstem spawning locations and would not have returned to Duncan had we not manually moved them by boat and truck). Nonetheless, it may be worth changing how the data are organized to make it more universal for other hypothetical translocation situations.

2. Broodstock collection is random

I don’t totally understand this assumption or rather why it is necessary. While broodstock collection should be random outside of the sex selectivity that you highlighted (and years early in the Grays dataset where the broodstock collection location increased the odds of collecting hatchery-origin adults), I don't see why we have to assume the demographics of broodstock are the same as those of the naturally spawning adults, because the broodstock and naturally spawning adults have their own sets of bio-data.

Also, I need some help interpreting the 2nd half of your last sentence, “…so we only need to keep track of the margins of that potential horrible 3-way contingency table.” I think I understand the “conditional” part of the sentence (i.e., we’ve been summarizing our bio-data using samples/fish that have all three components – origin, sex, and age – which accounts for potential interactions), but need some help with “keeping track of the margins”. What does this mean?

3. Observed states within the in-hatchery life-cycle and their corresponding observation errors

I’m punting this one for now. I need to figure out what information/data is collected during hatchery rearing to see what options we have here. I’ll circle back on this one shortly.

4. Hatchery egg-to-smolt survival is density-independent

Hmm…I would think that this is a safe assumption but we should have some data to evaluate it.

ebuhle commented 7 months ago

Thanks for this @kalebentley, very helpful.

Re the closure assumption, it sounds like Lewis Hatchery may end up being the most problematic case, albeit in the direction (i.e., unobserved returns) that already affects the hatchery model in its current release-to-return form, as opposed to the direction (i.e., broodstock sent to unmodeled dispositions) that will specifically affect the broodstock-to-release component. Unobserved returns, whether from a hatchery or natural population, are indistinguishable from mortality and so will manifest as lower estimated SAR. IIRC, SAR for Lewis Hatchery in the existing model is on the lower side, so that tracks.
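To make that last point concrete, a toy calculation (all numbers hypothetical): if only a fraction of returning adults end up in monitored locations, the SAR the model can recover is the true SAR scaled by that fraction.

```python
def apparent_sar(true_sar, p_observed):
    """SAR recoverable from the data when only a fraction of returns is observed.

    true_sar   : smolt-to-adult return rate (HYPOTHETICAL value below)
    p_observed : fraction of returning adults that show up in monitored locations
    """
    return true_sar * p_observed

true_sar = 0.004  # hypothetical, for illustration only
for p_observed in (1.0, 0.5, 0.25):
    print(p_observed, apparent_sar(true_sar, p_observed))
```

The data alone cannot separate the two factors, which is why unobserved returns look like extra mortality.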

I know you know this, but it's worth highlighting that this “population” is not a hatchery program but rather a location that receives translocated adults and whose recruits we can detect using genetic stock identification. Duncan Channel has some nuances that make it unique and potentially not that “transferable” (e.g., all adults have to be translocated into Duncan Channel even if they are recruits from Duncan Channel; we don’t differentiate translocated adults that returned to the Duncan Creek trap and hypothetically would have spawned in Duncan voluntarily vs. adults that returned to mainstem spawning locations and would not have returned to Duncan had we not manually moved them by boat and truck).

Right, of course, but the relevant distinction here is between natural populations where spawners return and do their thing, and hatchery / channel populations where all spawners are deliberately collected and transferred in. The latter set of populations are also the ones whose origins are identifiable. Some of these (i.e., Duncan Channel) may then undergo natural reproduction while others (i.e., hatcheries) have artificial propagation. That's what I meant by disambiguating the categories of "transfer-recipient / known-origin vs. natural-return / unknown-origin" from "hatchery vs. natural reproduction", where the latter will be defined by the S-R function. These categories have been conflated in the model thus far because there's been no need to distinguish them, but the broodstock-to-smolt component changes that.

The process model (and the bio_data) does account for local self-recruitment to Duncan Channel as opposed to translocated adults from other locations, even though they intermingle on the spawning grounds.

2. Broodstock collection is random

I don’t totally understand this assumption or rather why it is necessary. While broodstock collection should be random outside of the sex selectivity that you highlighted (and years early in the Grays dataset where the broodstock collection location increased the odds of collecting hatchery-origin adults), I don't see why we have to assume the demographics of broodstock are the same as those of the naturally spawning adults, because the broodstock and naturally spawning adults have their own sets of bio-data.

The bio_data are the observations, whereas what's at issue here is the process model. The approach I'm proposing would predict the age-, sex-, and origin-composition of spawners in each hatchery / channel as a mixture of the respective source populations, weighted by the relative numbers of broodstock they contribute. In order for this to be valid, broodstock collection must be random w.r.t. those three demographic characteristics. If that assumption were seriously violated, then we would have to additionally estimate transition matrices (possibly time-varying) representing the "selectivity" of broodstock collection from each wild pop to each hatchery / channel w.r.t. each of the three demographics. As if that's not bad enough, it would get even uglier if there were multi-way statistical interactions among age, sex, origin, and disposition (local vs. broodstock). On that point...
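A minimal sketch of the mixture idea under random collection, with made-up population names and numbers (none of this is from the data or from salmonIPM): each donor population's composition is carried over unchanged and weighted by the number of broodstock it contributes.

```python
def mix_composition(broodstock, composition):
    """Composition of a hatchery / channel as a broodstock-weighted mixture.

    broodstock  : {donor_pop: number of adults taken as broodstock}
    composition : {donor_pop: {category: proportion}} for one demographic
                  (age, sex, or origin), assumed to carry over unchanged
                  because collection is random w.r.t. that demographic.
    """
    total = sum(broodstock.values())
    categories = next(iter(composition.values())).keys()
    return {
        c: sum(broodstock[pop] * composition[pop][c] for pop in broodstock) / total
        for c in categories
    }

# HYPOTHETICAL donor populations and age compositions.
broodstock = {"pop_A": 60, "pop_B": 40}
age_comp = {
    "pop_A": {3: 0.5, 4: 0.4, 5: 0.1},
    "pop_B": {3: 0.2, 4: 0.6, 5: 0.2},
}

hatchery_age_comp = mix_composition(broodstock, age_comp)
print(hatchery_age_comp)
```

If collection were selective, each donor's composition would first have to be passed through a selectivity matrix before mixing, which is the extra machinery described above.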

Also, I need some help interpreting the 2nd half of your last sentence, “…so we only need to keep track of the margins of that potential horrible 3-way contingency table.” I think I understand the “conditional” part of the sentence (i.e., we’ve been summarizing our bio-data using samples/fish that have all three components – origin, sex, and age – which accounts for potential interactions), but need some help with “keeping track of the margins”. What does this mean?

Consider the 3-way contingency table of age, sex and origin, which is a way of summarizing the bio_data within a given population and year. In principle, there could be interactions up to order 3 among the margins of this table -- meaning, e.g., age is not statistically independent of sex, or the interaction between age and sex depends on origin, etc. In practice, we have always assumed independence (or close enough), which has allowed us to model age, sex and origin as unrelated processes and to construct the observation likelihood from the three marginal frequency distributions as opposed to the joint 3-way cross-tabulation. If this weren't the case, the model as it exists would be quite a bit uglier and more unwieldy. Way back when, I checked these assumptions against the bio_data and they looked reasonable. Now we need (OK, strongly prefer) to make an analogous assumption regarding the broodstock collection process, i.e. that the margins of a 4-way table with age, sex, origin, and disposition are mutually independent. I'm working on checking this one now; stay tuned...
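For anyone who wants to see "margins" in action, here is a small self-contained sketch. The counts are invented (not bio_data); the code collapses a 3-way table of age, sex and origin to its three marginal distributions and measures how far the joint table is from the product of those margins, which is what it would equal under mutual independence.

```python
from itertools import product

# HYPOTHETICAL counts of sampled adults by (age, sex, origin); not real bio_data.
counts = {
    (3, "F", "known"): 10, (3, "F", "unknown"): 30,
    (3, "M", "known"): 12, (3, "M", "unknown"): 28,
    (4, "F", "known"): 20, (4, "F", "unknown"): 60,
    (4, "M", "known"): 18, (4, "M", "unknown"): 62,
    (5, "F", "known"): 5,  (5, "F", "unknown"): 15,
    (5, "M", "known"): 4,  (5, "M", "unknown"): 16,
}

def margin(counts, axis):
    """Marginal frequency distribution along one axis of the joint table."""
    n = sum(counts.values())
    out = {}
    for key, k in counts.items():
        out[key[axis]] = out.get(key[axis], 0.0) + k / n
    return out

p_age, p_sex, p_origin = (margin(counts, axis) for axis in range(3))

# Largest gap between the observed joint proportions and the product of margins.
n = sum(counts.values())
max_gap = max(
    abs(counts[(a, s, o)] / n - p_age[a] * p_sex[s] * p_origin[o])
    for a, s, o in product(p_age, p_sex, p_origin)
)
print("largest departure from independence:", max_gap)

# Free proportions needed for the full joint table vs. the three margins.
n_joint_params = len(p_age) * len(p_sex) * len(p_origin) - 1
n_margin_params = (len(p_age) - 1) + (len(p_sex) - 1) + (len(p_origin) - 1)
print(n_joint_params, n_margin_params)  # 11 vs. 4
```

Adding disposition as a fourth dimension doubles the joint table but only adds one more margin, which is the practical appeal of the independence assumption.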

ebuhle commented 7 months ago

  • Broodstock collection is random with respect to age, sex and origin, therefore adult recruits from a given pop (i.e. origin, whether identifiable or not) carry their demographics along with them when they are transferred / translocated from their return location to another pop (i.e., disposition).

I made some quick and dirty plots to check this random-sampling assumption in the subset of populations that were broodstock donors. Age and sex look fine; there are a few statistically significant discrepancies between the age distributions of adults taken as broodstock and those left to reproduce naturally, but the differences are small and overall there's no systematic bias.

Origin, coded here as known or unknown where the former is a proxy for hatchery / channel and there is typically only one or at most two such origins present in a given population, is more problematic. Broodstock taken from Grays River (recorded as Grays MS, although as we've discussed, this is not always accurate) are disproportionately Grays Hatchery origin. Some other populations show a trend in the same direction, albeit much weaker.

I'm not sure what to do about these patterns at this stage. We would certainly prefer to start with the simplifying assumption of random broodstock sampling in any case, so I guess this is just a heads-up to pay attention to origin-frequencies when doing posterior predictive checking on the broodstock-to-smolt model once we get it built and working.
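For reference, the kind of comparison involved can be sketched as a Pearson chi-square test of homogeneity on a 2 x K table (broodstock vs. naturally spawning adults, by category). This is only an illustration with invented counts, not the code behind the plots, and it pools everything into one table rather than working by population and year.

```python
def chi_square_homogeneity(row_a, row_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x K table."""
    n_a, n_b = sum(row_a), sum(row_b)
    n = n_a + n_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        col_total = a + b
        exp_a = n_a * col_total / n  # expected count if both rows share one distribution
        exp_b = n_b * col_total / n
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat, len(row_a) - 1

# HYPOTHETICAL age counts (ages 3, 4, 5): identical proportions in both groups.
stat_age, df_age = chi_square_homogeneity([30, 50, 20], [300, 500, 200])

# HYPOTHETICAL origin counts (known, unknown): broodstock enriched for known origin.
stat_origin, df_origin = chi_square_homogeneity([40, 60], [100, 900])

# 5% critical values of the chi-square distribution: 3.841 (df = 1), 5.991 (df = 2).
print(stat_age, df_age, stat_age > 5.991)
print(stat_origin, df_origin, stat_origin > 3.841)
```

With large samples even small differences will be flagged as significant, so the size of the discrepancy matters more than the p-value.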

While making these plots, I realized that bio_data doesn't include any hatchery locations. We'll need to get those into the data set before we can proceed with fitting a broodstock-to-smolt version of the IPM. Maybe this is what @kalebentley meant by

As I was reading through this post, I realized that there was some general information on the “hatchery/channel populations” in our current chum IPM that would be helpful to have summarized in one location.

I also looked into this assumption:

  • Hatchery egg-to-smolt survival is density-independent.

With the caveat that these are observations not states, and that ignoring observation error will tend to overestimate the strength of density dependence (but presumably both $S^\text{obs}$ and $M^\text{obs}$ are more precise in hatcheries), it does indeed look like production in Duncan Hatchery and Grays Hatchery is density-independent. By contrast, the natural populations generally show Ricker-type log-linear density dependence. Lewis Hatchery, however, seems to be the exception.

Is there any obvious reason why Lewis Hatchery would show density-dependent fry production, whereas the other hatcheries do not?
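For readers less familiar with the Ricker form: $M = \alpha S e^{-\beta S}$ implies $\log(M/S) = \log\alpha - \beta S$, so density dependence shows up as a negative slope when log smolts per spawner is regressed on spawners, and density independence as a slope near zero. A minimal sketch with noise-free synthetic data follows; `alpha` and `beta` are arbitrary illustrative values, and this ignores observation error, with the caveat noted above.

```python
import math

def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

# HYPOTHETICAL parameters and spawner abundances, for illustration only.
alpha, beta = 1500.0, 0.001
S = [50, 100, 200, 400, 800]

M_ricker = [alpha * s * math.exp(-beta * s) for s in S]  # density-dependent
M_indep = [alpha * s for s in S]                         # density-independent

slope_ricker, int_ricker = ols_slope_intercept(S, [math.log(m / s) for m, s in zip(M_ricker, S)])
slope_indep, int_indep = ols_slope_intercept(S, [math.log(m / s) for m, s in zip(M_indep, S)])

print(slope_ricker, math.exp(int_ricker))  # recovers approximately -beta and alpha
print(slope_indep, math.exp(int_indep))    # slope approximately 0
```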

kalebentley commented 7 months ago

Hey @ebuhle,

Thanks for pulling these summaries together. I wanted to quickly respond to the pattern in origin composition you highlighted in your last post and specifically for Grays River....

Origin, coded here as known or unknown where the former is a proxy for hatchery / channel and there is typically only one or at most two such origins present in a given population, is more problematic. Broodstock taken from Grays River (recorded as Grays MS, although as we've discussed (https://github.com/ebuhle/LCRchumIPM/issues/18#issuecomment-1603007854), this is not always accurate) are disproportionately Grays Hatchery origin. Some other populations show a trend in the same direction, albeit much weaker.

Below is a plot of pHOS (percentage of hatchery-origin spawners) and pHOB (percentage of hatchery-origin broodstock) for the Grays Basin (MS, WF, and CJ combined) and the Grays River Hatchery, respectively. I grabbed these estimates from an HGMP (Hatchery and Genetic Management Plan) document that Brad Garner compiled last year. The estimates of pHOS shown here probably deviate ever so slightly from the IPM estimates but should be very close...

[Figure: pHOS (Grays Basin) and pHOB (Grays River Hatchery)]

I want to show that the pattern you highlighted (i.e., "Broodstock taken from Grays River...are disproportionately Grays Hatchery origin."), which pooled results across all years, is likely a result of exceptionally high pHOB levels in the first few years of the hatchery program (2004-2006) and one recent year (2017). I vaguely remember @Hillsont explaining that the high pHOB levels observed in the early years were attributed to the broodstock collection location, which was modified and appears to be "working" given that since 2007, pHOB and pHOS have averaged 5.5% and 5.8%, respectively. Overall, I think the assumption of randomized broodstock collection (i.e., representative of the donor stocks) is met for Grays River with respect to origin. At the moment, I cannot speak to observed patterns at Hamilton Channel or Horsetail but those are pretty small proportions and likely equate to pretty small numbers of actual fish collected for broodstock.

As for your last question...

Is there any obvious reason why Lewis Hatchery would show density-dependent fry production, whereas the other hatcheries do not?

...I can't say without more "digging" but it's worth noting that the "Lewis Hatchery" and "Duncan Hatchery" are essentially one and the same. That is, broodstock collected for the two "programs" are reared at the same facilities though kept in separate rearing troughs. The point being: I don't know why one "program" would exhibit density dependence and not the other. It may be worth discussing with @Hillsont and Brad as to what might be going on here but my first thought is that this pattern is spurious.

ebuhle commented 7 months ago

I vaguely remember @Hillsont explaining that the high pHOB levels observed in the early years were attributed to the broodstock collection location

Yeah, I remember that too; @Hillsont actually mentions it in the post I linked. Nice to see it illustrated with data. I agree this temporal perspective is reassuring, insofar as the random-sampling assumption appears valid at the Grays Basin level after the first three years. Unfortunately, when it comes to retrospective fitting we can anticipate a significant lack of fit to those 2004-2006 observations. They will have high leverage due to the tight contours of the multinomial likelihood, which in turn may induce hard-to-predict biases in other components of the model. We'll just have to keep an eye on it.
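One way to picture that leverage, as a rough sketch with invented numbers: the multinomial log-likelihood penalty for a given mismatch between predicted and observed proportions grows in direct proportion to the sample size, with no allowance for overdispersion. The predicted proportion of 0.06 and the counts below are hypothetical, and this is one reading of "tight contours", not a statement about how the IPM computes its likelihood.

```python
import math

def multinomial_loglik_kernel(counts, probs):
    """Multinomial log-likelihood up to the constant that does not depend on probs."""
    return sum(k * math.log(p) for k, p in zip(counts, probs) if k > 0)

def loglik_deficit(counts, probs):
    """Log-likelihood lost relative to a perfect fit to the observed proportions."""
    n = sum(counts)
    saturated = [k / n for k in counts]
    return multinomial_loglik_kernel(counts, saturated) - multinomial_loglik_kernel(counts, probs)

# HYPOTHETICAL predicted origin composition of broodstock: (known, unknown).
predicted = [0.06, 0.94]

# Same observed proportion (50% known origin) at two HYPOTHETICAL sample sizes.
deficit_small = loglik_deficit([10, 10], predicted)
deficit_large = loglik_deficit([100, 100], predicted)

print(deficit_small, deficit_large)  # the second is 10 times the first
```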

Also worth bearing in mind that given the available data, we can only model Grays Basin broodstock as if they were all taken from Grays MS. That may ironically work in our favor, because Grays MS experienced a much more pronounced spike in $p_\text{HOS}$ in 2004-2006 than was seen at the basin level as in that HGMP figure.

I can't say without more "digging" but it's worth noting that the "Lewis Hatchery" and "Duncan Hatchery" are essentially one and the same. That is, broodstock collected for the two "programs" are reared at the same facilities though kept in separate rearing troughs.

Oh, I somehow didn't know that! Well, I like your suggestion that the pattern is spurious. :+1:

ebuhle commented 7 months ago

@tbuehrens, I'm wondering if you have any thoughts about these issues raised in the OP regarding the observation errors to use for modeling hatchery spawner and fry / smolt abundance:

  • For hatchery spawner abundance, we just need to determine what tau_S_obs for each observation (possibly invariant) should be. I've always felt that hatchery / channel spawner abundance, like B_take_obs, was essentially known without error and that modeling it as uncertain was a compromise with reality. But there must be some uncertainty, right? Fish get lost or double-counted, eggs get spilled, etc. Do we have any basis for estimating / guesstimating its magnitude? One starting point would be the fact that we currently apply the aforementioned lognormal penalty with SD = 0.05 to B_take_obs. Now we're shifting the observed state from broodstock to hatchery spawners, so a similar SD could apply. The awkward part is, as I've noted before, that penalty SD is actually on par with our "real" sample-based estimates of tau_S_obs. Another reference point is that we currently treat tau_S_obs for Duncan Channel as unknown (since reported values are 0) and impute it. This is undesirable behavior (see similar issue with tau_M_obs), so perhaps whatever logic we use for hatchery spawners should apply to Duncan Channel too?

  • I am perfectly willing to believe that hatchery smolt releases really are measured with error! The question is just how to quantify it. There must be some sort of methodological basis we could use, but I defer to those more familiar with these programs for ideas.
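To give a feel for what a lognormal SD of 0.05 implies, here is a minimal sketch. The function names are illustrative and the direction of the penalty (observation given state) is an assumption; it is not taken from the salmonIPM code. On the log scale, an SD of 0.05 corresponds to a CV of about 5% and a 95% interval of roughly plus or minus 10% around the median.

```python
import math

def lognormal_logpdf(x, mu_log, sigma_log):
    """Log density of a lognormal with log-scale mean mu_log and SD sigma_log."""
    z = (math.log(x) - mu_log) / sigma_log
    return -math.log(x * sigma_log * math.sqrt(2 * math.pi)) - 0.5 * z * z

def lognormal_cv(sigma_log):
    """Coefficient of variation implied by a log-scale SD."""
    return math.sqrt(math.exp(sigma_log ** 2) - 1)

sigma = 0.05   # penalty SD mentioned above for B_take_obs
state = 500.0  # HYPOTHETICAL true hatchery spawner abundance

cv = lognormal_cv(sigma)
lower = state * math.exp(-1.96 * sigma)  # approximate 95% interval for an observation
upper = state * math.exp(1.96 * sigma)
print(round(cv, 4), round(lower, 1), round(upper, 1))

# Log-density of observing exactly the state vs. an observation 10% too high.
print(lognormal_logpdf(state, math.log(state), sigma))
print(lognormal_logpdf(1.1 * state, math.log(state), sigma))
```

Whatever value is chosen, the same calculation can be used to sanity-check it against what hatchery staff consider a plausible counting error.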

kalebentley commented 5 months ago

@tbuehrens - moving this topic back to the top of your email.