ebuhle / LCRchumIPM

This is the development site for an Integrated Population Model for chum salmon in the lower Columbia River.
MIT License

Duncan Spawning Channel - North, South, and Combined #5

Closed kalebentley closed 1 year ago

kalebentley commented 4 years ago

Last week, Eric and I had a long phone call to discuss the chum IPM, and we spent a fair amount of time talking about the Duncan Creek (i.e., Duncan Springs aka Duncan Spawning Channels) data set. The discussion was spurred by the first round of IPM results, where there were clearly some issues; in particular, the estimates of freshwater survival seemed unrealistically high. We quickly figured out that the Duncan Springs adult data were being "wrangled" incorrectly. Without going into those details, this issue should be easily corrected. However, as I further explained the Duncan dataset and many (but likely not all) of its nuances, Eric and I came to a "crossroads" as to how the Duncan dataset should be evaluated moving forward.

In short, we came up with two general questions for the group:

  1. Should we treat the Duncan Springs/Channel as a single dataset (i.e., ignore that the Duncan Channel is broken up into two sub-areas; north and south) or is it important to sub-divide the datasets based on research questions and necessary assumptions? Note - right now the adult data set is not organized in a way that allows the analysis to sub-divide Duncan into North and South.
  2. Depending on Q1 (above), do we need to continue sub-dividing Duncan Spawning Channel, or can the channel be monitored and analyzed as a single "unit"?

Below I've tried to provide both some background on the Duncan Channels as well as some very simple summaries of the existing data. NOTE: I wrote this quickly and thus the following text is quite verbose. Read as much or as little as you desire.

For those who are not familiar with the Duncan project, the questions listed above may not make sense, so I'll provide a bit of background. In many ways, the Duncan data set is unlike any other chum data set in the Columbia Basin. First and foremost, all adults are manually transported to the spawning area, including those that return to the Duncan trap located at the mouth of Duncan Creek. Additionally, the Duncan Spawning Channels are broken into two areas -- north and south (but see notes below) -- which receive female spawners at different reproductive stages. Specifically, each year, one of the two channels is designated to only receive "green" females (i.e., females that haven't spawned based on their hard bellies and intact fins) while the other channel can receive "partials" (i.e., females that are either ripe - ready to spawn - or appear to have partially spawned but are still believed to have >75% of eggs present based on professional opinion). The channel that receives green females more-or-less alternates each year. The purpose of only placing green females into one of the two channels is so that egg-to-fry survival can be (more) accurately estimated by more accurately estimating the number of eggs deposited in the gravel. By alternating channels, the performance of each "sub-channel" can be monitored and the need for maintenance assessed. In earlier years (BY01 - 05) only green females were used. However, because green females are generally more difficult to "acquire", partials have been used since BY06 to achieve higher spawner abundances in the channels. It should be noted that green females are also placed in the partial spawning channel for multiple reasons (see plots below).

As for the Duncan Spawning Channel itself, I'll provide a bit more background that may be useful. Chum salmon were extirpated from Duncan Creek sometime around 1960 when a dam was constructed at the outlet of this creek. In 2000, a fish ladder was installed at the mouth allowing chum salmon access to historical spawning grounds. In September 2001, spawning channels were constructed in the historical spawning area to provide off-channel spawning habitat for chum salmon in Duncan Creek. The spawning channels have gone through several rounds of construction and updates. Todd provided me a long history of the channels and their updates in an email yesterday, which I can share if interested, but here's a brief summary:

For reference, here's an image of the original Duncan Channel configuration (BY2001 - 2007): Duncan_01-07

Here's a Google Earth image from August 2011 of the re-worked channel configuration (BY2008 - 2010). The blue line highlights the old "middle" channel: image

Here's a Google Earth image from March 2016 of the current Duncan Channel configuration (BY2011 - present): Duncan_11-present

Using the Google Earth image of the current configuration, the south channel is ~920’ and the north channel is ~200’.

According to Todd, the majority (>80%??) of spawning in the South channel occurs in the most recently constructed portion of the channel (west of the little arm that runs north/south) and very little occurs in the portion of the South (and middle) channel that has existed since 2001. This may partially explain why the productivity in the South channel seems so much lower in the

In an effort to better understand the existing datasets, I summarized and generated a couple of plots that may or may not be helpful regarding our questions above.

The first plot shows the number of chum fry outmigrants as a function of female spawners (ignore the specific channels - north or south). The green points signify "green female" years and the blue points signify "partial female" years. Both juvenile (fry) and adult data are essentially censuses in most years. When I squint at the plot, I can potentially see fewer juveniles produced when partial females are used versus green females, which is what I would expect to see
image

However, when I plotted the number of chum fry outmigrants as a function of female spawners by channel (north vs. south), I no longer see a pattern where the number of outmigrants in "partial years" is lower relative to "green years". This seems counter-intuitive. However, as I mentioned above, green females can be placed in the partial channel and partial females are still supposed to be >75% full of eggs. image

Therefore, I was curious about what percentage of females that were placed in each channel were designated as either green (or ripe - eggs coming out but believed to be mostly unspawned) vs. partial. The following plot is similar to the one above except the size of the point is relative to the percentage of spawners that were either green/ripe (i.e., smaller point = more partials). This plot has fewer data points because of the way the chum data are stored - in short, data from BY2012 through 2019 are in a database that is easily queried while earlier years are in spreadsheets. I didn't have time to summarize data from the spreadsheets. Nonetheless, what I see is that the north channel has received pretty much fully fecund females each year since BY 2012, even in "partial years", while the south channel (I think by chance) has received a much higher proportion of partials in the partial years. As an aside, it also looks like the north channel's capacity could be somewhere around 35K. However, assuming equal spawning distribution and the expected area per redd (2 m² per female), neither channel should be anywhere close to capacity... image

Here's a summary table that includes raw counts of female spawners by brood year (2012 - 2019), channel (north, south), spawner types (green, partial), where the spawners came from, and the reproductive status of each female (G=green, R=Ripe, P=Partial). I'm not sure what's up with the partials being used in the (south) green channel in 2015 and 2017. If needed, Todd and I can look into this later. image

Ok - given all of this information, where does this leave us? Here are a couple of options:

  1. Ignore partials and assume all females are fully fecund based on our fecundity by age dataset. This will obviously bias our estimate of egg-to-fry survival low but if the number of partials used is relatively low or the overall proportion of partials is similar among years then the trend in survival shouldn't be impacted too terribly much. At this point, I don't see the purpose of separating North and South channels.
  2. Designate each partial fish as a relative number of females (e.g., each partial fish counts as 0.75 females). Or perhaps this could be estimated based on our existing dataset? Similar to above, I don't see a point in breaking up the channel if we go this route.
  3. Treat North and South channels as "independent" populations when estimating egg-to-fry survival.

One other thing I haven't done, but should do sooner rather than later, is to compare the estimated egg-to-fry survival rates from the green channel years to the current chum IPM, where we are treating Duncan as a single spawning unit and ignoring partials. If the estimates are relatively similar, then maybe that tells us that it is unnecessary to sub-divide the monitoring and analysis. However, if they are dissimilar, it would warrant further "analysis" and discussion.

ebuhle commented 4 years ago

Hi @kalebentley, thanks for this very informative post. The maps were especially helpful for me.

I'm not in a position to answer your Q1, which as you note depends in part on research questions. But from the modeling perspective, I endorse this suggestion, which would make a hairy problem as tractable as possible:

  1. Ignore partials and assume all females are fully fecund based on our fecundity by age dataset. This will obviously bias our estimate of egg-to-fry survival low but if the number of partials used is relatively low or the overall proportion of partials is similar among years then the trend in survival shouldn't be impacted too terribly much. At this point, I don't see the purpose of separating North and South channels.

Note that it would still be possible to recover the "true" egg deposition and egg-to-fry survival, if necessary, by post-processing the posterior draws to adjust for the "effective" number of females in each year.
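One possible reading of that post-processing step, as a minimal Python sketch (the repo itself is R/Stan, so this is illustration only). It assumes apparent survival was computed against eggs from fully fecund females, and that an "effective fraction" of eggs actually deposited is known for the year. The function name, the `effective_fraction` argument, and the example draws are all hypothetical.

```python
def adjust_survival_draws(apparent_draws, effective_fraction):
    """Rescale apparent egg-to-fry survival draws for one brood year.

    apparent_draws: posterior draws of fry / (females * full fecundity).
    effective_fraction: assumed ratio of eggs actually deposited to the
        eggs expected from fully fecund females (hypothetical input).
    """
    if not 0 < effective_fraction <= 1:
        raise ValueError("effective_fraction must be in (0, 1]")
    # Fewer eggs in the denominator implies higher survival.
    # Cap at 1.0 because a survival probability cannot exceed 1.
    return [min(s / effective_fraction, 1.0) for s in apparent_draws]


# Made-up draws for illustration, not model output.
draws = [0.30, 0.36, 0.42]
adjusted = adjust_survival_draws(draws, effective_fraction=0.9)
```

Capping at 1.0 is a design choice of this sketch; a draw that hits the cap would suggest the assumed effective fraction is too low for that year.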

By contrast, option 2 (explicitly modeling partial females) would make the problem even hairier, and would further complicate forecasting because you'd have to specify the effective number of females in advance:

  2. Designate each partial fish as a relative number of females (e.g., each partial fish counts as 0.75 females). Or perhaps this could be estimated based on our existing dataset? Similar to above, I don't see a point in breaking up the channel if we go this route.

On the data-wrangling front, I think I figured out how to get the raw data -- in general, but specifically for Duncan Channel / Duncan Creek -- into a model-ready format. I'll detail the main points in issue #3, but @kalebentley's post raises a new question / concern:

  • BYs 2008-10: South and Middle channel re-worked and connected to the top of north channel creating one single channel split by a monitoring weir (south monitoring weir) with another monitoring weir at the bottom (north monitoring weir)

  • BYs 2011- present: the channel was extended adding more spawning habitat above the south monitoring weir

Do I understand this, and the map, to mean that from outmigration year 2009-present the North weir is counting outmigrants from both South and North channels? So South channel fry are double-counted? For the juvenile data, I'm currently using disposition=="Duncan_Channel" and discarding disposition %in% c("Duncan_North", "Duncan_South"). The former is pretty close to the sum of the latter two, although the equality is not exact in all years. "Duncan_North" is less than "Duncan_South" in some years, which makes me think South channel fry are not double-counted. "Duncan_North" and "Duncan_South" are also missing one year (2014) when "Duncan_Channel" has data. Let me know if "Duncan_Channel" does not represent the total fry output from the channel(s), considered as a single unit.
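The consistency check described here can be sketched as follows, in Python for illustration (the repo uses R). The dictionary layout, the 1% tolerance, and every number are assumptions made up for the example; only the three disposition labels come from the thread.

```python
def compare_channel_totals(estimates, tolerance=0.01):
    """Flag years where North + South differs from the Channel total.

    estimates: {year: {disposition: fry_estimate}}, where a disposition
        may be absent if no estimate exists for that year.
    tolerance: relative difference treated as a match (assumed value).
    """
    report = {}
    for year, row in sorted(estimates.items()):
        total = row.get("Duncan_Channel")
        north = row.get("Duncan_North")
        south = row.get("Duncan_South")
        if total is None or north is None or south is None:
            report[year] = "missing"
            continue
        # Assumes a positive channel total.
        rel_diff = (north + south - total) / total
        report[year] = "match" if abs(rel_diff) <= tolerance else "mismatch"
    return report


# Made-up numbers, not the actual Duncan estimates.
fry = {
    2013: {"Duncan_Channel": 10000, "Duncan_North": 4000, "Duncan_South": 6000},
    2014: {"Duncan_Channel": 8000},
    2015: {"Duncan_Channel": 12000, "Duncan_North": 5000, "Duncan_South": 6500},
}
report = compare_channel_totals(fry)
```

If South channel fry were double-counted at the North weir, "Duncan_North" would be at least as large as "Duncan_South" in every year, which is a second check that could be added to the same loop.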

kalebentley commented 4 years ago

Hey @ebuhle, Since no one else has chimed in yet, I will make an executive decision (at least for now) to go with Option 1 (i.e., Ignore partials and assume all females are fully fecund based on our fecundity by age dataset).

I am interested to better understand what you mean by:

that it would still be possible to recover the 'true' egg deposition and egg-to-fry survival, if necessary, by post-processing the posterior draws to adjust for the "effective" number of females in each year

Also, I am wondering how the estimates of egg-to-fry survival from Duncan, which will likely be biased low when we assume all translocated females were fully fecund, will influence (hierarchical?) estimates at other locations/populations.

Regarding the juvenile estimates from Duncan Channel, I am 99.9% certain that "Duncan_North" and "Duncan_South" are independent estimates and that the summation of the two should equal the entire juvenile outmigration denoted as Location.Reach == "Duncan_Channel". It is my understanding that fry that are captured at the "Duncan_South" trap box are processed and released somewhere below the "Duncan_North" trap (i.e., "Duncan_South" fry are NOT double-counted). As you noted, the summation of "Duncan_North" and "Duncan_South" is pretty close to "Duncan_Channel" but not exactly equal in all years.

In reviewing the juvenile dataset for Duncan, there are 16 estimates of abundances for "Duncan_Channel" of which:

As a side note, there was an additional year (outmigration year 2017) where monitoring was initiated but both traps "failed" mid-season (they became submerged by Bonneville backwater, allowing juveniles to bypass the traps). At this time, no estimates are available for this year, but they may be sometime soon.

@Hillsont or @BradGarnerWDFW, do you have any idea why the summation of the "Duncan_North" and "Duncan_South" estimates does not equal the estimate for "Duncan_Channel" in all years? In theory, the summation of individual mean estimates should equal the total.

Lastly, I would still like to continue the discussion of my previously stated questions:

  1. Should we treat the Duncan Springs/Channel as a single dataset (i.e., ignore that the Duncan Channel is broken up into two sub-areas; north and south) or is it important to sub-divide the datasets based on research questions and necessary assumptions? Note - right now the adult data set is not organized in a way that allows the analysis to sub-divide Duncan into North and South.
  2. Depending on Q1 (above), do we need to continue sub-dividing Duncan Spawning Channel, or can the channel be monitored and analyzed as a single "unit"?

For now, the answer to Q1 is to treat them as a single dataset (at least for our current analysis), but we still need to discuss the ramifications of this decision for desired research questions. Based on this discussion, I am curious about what people think of Q2 and whether this has any impact on monitoring efforts this fall.

kalebentley commented 3 years ago

In light of the recent conversations and results pertaining to density dependence and freshwater survival (see Issue #6), I thought I would revisit the Duncan Channel dataset to try and help make sense of the results @ebuhle posted here and here.

Specifically, why are the estimates of "apparent" FW survival for chum fry in the Duncan Spawning Channels wonky looking and lower than other locations (especially given that we expected Duncan fry to have some of the higher relative survivals)?

As I highlighted in my first comment on this issue thread here, the Duncan Spawning Channels are unlike almost all other sites/populations that we have monitoring data from because:

When I first saw the outputs of "apparent" FW survival from Duncan Creek the other day, I was a bit surprised by how low they all were because I had generated similar plots before and recalled seeing some years with much higher survival. However, I realized @ebuhle was only plotting the combined FW survival of the entire Duncan Channel while I had only ever looked at plots where the data were stratified by the North and South Channel.

To try to make sense of @ebuhle's outputs, I re-generated plots of FW survival (observed smolts -- M0 -- divided by potential egg deposition -- E_hat). A couple of side notes (though probably not super important for now):

First, I plotted FW survival by combining spawner/E_hat data for the north and south channels, and indeed the resulting "apparent" FW survivals do seem to be "real" (not an artifact of incorrect data summarization). image

Second, I plotted FW survival by separating the dataset by Channel (North & South): image

When I squint at the above plot, I see two potential patterns:

After I showed this plot to Todd the other day, he asked if I've partitioned the data before and after the last channel renovation (again, when the amount of habitat in the south channel more than doubled). I hadn't and so I re-generated the above plot by stratifying pre (2002-2010) and post (2011 - present) south channel modification: image This plot is a bit harder to interpret with all of the groupings but I see a few patterns:

In addition to everything I've outlined above, the Duncan Channels are "intensively" monitored and so we have more information on things that may be contributing to the patterns we have seen year-to-year. Pairing this detailed information does help explain why the observed patterns may deviate from expectations. However, how we go about characterizing these "events", their relative significance, and how we deal with them all is a little unclear to me at this point but here's a few ideas:

Ok - that's enough for now. I am still working through the "TEMP_fish_data_egg-to-smolt" summary table to flag data that may be contributing to suspicious estimates of FW survival. I've incorporated my brief notes about Duncan in this file and will share soon.

Have a great weekend everyone.

ebuhle commented 3 years ago

Once again, thanks @kalebentley for this super informative data spelunking. It's taken me a minute to digest, but I think you propose some good solutions and next steps.

  • account for the change in spawning habitat (the same as we will once I finally get estimates of spawning habitat compiled -- should be very soon). Aside from the habitat restoration projects, there have also been some severely low water years that have impacted the total amount of spawning habitat. While this is complicated enough, spawning in the channels is extremely concentrated so the effects of changes in spawning habitat are going to vary depending on what portion of the channel was affected.

I agree, at a minimum accounting for the channel expansion(s) will surely make a difference of some kind. This is a more critical case for including the area offset because it's time-varying within the population, vs. just standardizing across populations. The pre-expansion curve does look a bit steeper than post-expansion, which is what you'd expect.

  • somehow account for partial spawners (see @ebuhle comment above)

On further reflection, I take back what I wrote above. I'd be fine with including a "fecundity offset" analogous to the habitat offset A; and likewise, for forecasting you could just make up values (or compare scenarios). The trouble is going to be "calculating" it from the data. What weight should a partial female get, if she retains somewhere between 75% and 100% of her eggs?

My much less preferred options would be:

  • partition the Duncan dataset into North and South channels and only use the data points for channels with "green" females. One limitation to this is that there's a couple of years where partitioning will not be possible because either juveniles or adults from the two channels mixed due to flooding or trap failure.

I'm reluctant to throw out data, barring some known egregious error. And separating Duncan_North and Duncan_South would be a nontrivial problem because there are no distinct returns to each "population", so you'd need some sort of downstream_trap-like construct to combine their spawners, that would only be used for this special case. (Although I guess they could anchor the straying matrix with a pairwise distance of zero!)

  • censor the dataset further based on some criteria (e.g., the partial spawner years that seem to have really affected the apparent FW survival)

I like the idea of a continuous offset more than a sharp cutoff to exclude data.

  • ignore all of this detail -- it is what it is and a true reflection of how the artificial spawning channel has been performing. I see the main downside to this choice is that these data are influencing estimates for other populations despite perhaps not being super representative.

Agreed. Given the small number of pops with smolt data, having one that's systematically biased will definitely affect the hyperdistribution.

ebuhle commented 3 years ago

On further reflection, I take back what I wrote above. I'd be fine with including a "fecundity offset" analogous to the habitat offset A; and likewise, for forecasting you could just make up values (or compare scenarios). The trouble is going to be "calculating" it from the data. What weight should a partial female get, if she retains somewhere between 75% and 100% of her eggs?

I'm ready to give this a shot, but I have another question: which data file contains the information on partial females? I would have thought it would be in the raw version of bio_data, but I'm not seeing anything.

kalebentley commented 3 years ago

Hey @ebuhle, I have summarized "partial" female data two different ways (1) partials used (Y/N) by Duncan Channel (N/S) and year and (2) percentage/absolute number of translocated females that were partials by Duncan Channel and year. I have a complete time-series (BY2002-2018) of the first summary but not the second (only BY2012-2018). Let's chat soon (tomorrow?) about how to tackle this issue.

kalebentley commented 3 years ago

Ok - I took another look at the Duncan spawner data to see if I could make any more sense of "partial" spawners and figure out a way to adjust the data to make it more comparable to the other datasets.

Looking back at the Duncan dataset, the first thing I noticed is that I had made a mistake in how I summarized females by "condition" (green, ripe, or partial) in my original post here. Basically, I had accidentally reversed the "ripe" and "partial" columns. However, because there have actually been almost no adults classified as "partial" across all years (14 out of 785 total spawners, 2001-2019), the patterns of my previous summaries remain the same (except that when we think of or see the term "partial" it really means "ripe").

After catching my previous mistake, I went back through all years of data and was able to categorize every female that has ever been transported into the Duncan Channels as either green, ripe, or partial (the previous post only included BY 2012-2018 as these data are in our TWS database while pre-2012 are not).

Similar to my previous post, I summarized the data by calculating potential egg deposition (E_hat; females * eggs per female) and FW survival (fry estimate divided by E_hat). For simplicity, I assigned all females a fecundity of 2,750 eggs.
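The calculation described above can be sketched like this, in Python for illustration (the repo uses R). The flat fecundity of 2,750 eggs comes from the post; the function name and the example counts are made up.

```python
EGGS_PER_FEMALE = 2750  # flat fecundity assumed for all females in the post


def fw_survival(fry_estimate, females, eggs_per_female=EGGS_PER_FEMALE):
    """Apparent FW survival: fry outmigrants / potential egg deposition."""
    e_hat = females * eggs_per_female  # potential egg deposition (E_hat)
    if e_hat <= 0:
        raise ValueError("need at least one female to compute E_hat")
    return fry_estimate / e_hat


# Made-up channel-year: 40 females and 55,000 fry.
survival = fw_survival(fry_estimate=55000, females=40)
```

With these invented inputs, E_hat is 110,000 eggs and apparent survival is one half; real channel-years would use the fry estimates and female counts from the posted dataset.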

This first plot (shown below) is FW survival as a function of potential egg deposition. Each data point is for a specific channel (North, South) and brood year (BY). As before, you can clearly see that, in general, the channel that received "partials" (again, read this as "ripes") has lower apparent survival and in many of these years almost all of the spawners were "ripe" (as denoted by the small points). image

As I pointed out last time, via Todd's suggestion, some of this pattern may be borne out of the specific spawning channel (north vs. south) and the fact that the south channel was expanded in the middle of our time series. This can sort of be seen in the plots below (pre = 2001 - 2010, post = 2011 - 2019). But given the minimal overlap in the potential egg deposition in the pre- vs. post-restoration time frames, it's not totally obvious to me how much this is affecting the pattern. image

Here's a plot of FW survival vs. potential egg deposition when data from the two channels are combined for a given brood year. The most obvious thing to me here is that there are no longer any FW survival estimates ~75% or higher but otherwise the estimates don't look too terribly different -- there's a cluster around 50% and then as PED increases, FW decreases down to ~30%. But there is still clearly some effect of habitat area and/or "partials/ripes"... image

So again, what should we do? As @ebuhle pointed out previously here, what value do we use for a "fecundity offset" if we want to discount non-green females? My initial thought was to let the data tell us by accounting for spawning area and seeing if we could estimate an offset for the "fixed effect" channel type (partial vs. green). I haven't taken a shot at this yet but explored it "visually" by assigning partial/ripe females a "discount" of 70% (eggs per female * discount). This value is almost entirely arbitrary.
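A minimal sketch of the visual exploration described here, in Python for illustration (the repo uses R). It reads the 70% "discount" as a 0.70 multiplier on eggs per female for non-green females, which is an interpretation of the post; the function name and the counts are made up.

```python
def discounted_e_hat(green, non_green, eggs_per_female=2750, discount=0.70):
    """Potential egg deposition with non-green females discounted.

    discount: multiplier on fecundity for ripe/partial females; 0.70 is
        the arbitrary value from the post, and 1.0 means no discount.
    """
    return green * eggs_per_female + non_green * eggs_per_female * discount


# Made-up channel-year: 10 green females, 30 ripe/partial, 34,100 fry.
fry = 34100
survival_plain = fry / discounted_e_hat(10, 30, discount=1.0)
survival_discounted = fry / discounted_e_hat(10, 30)
```

Because the discount only shrinks the denominator, apparent survival can only go up, and it moves most for channel-years dominated by non-green females. Estimating the multiplier from the data instead of fixing it would replace the `discount` argument with a fitted parameter.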

Nonetheless, here's the same plot as the first one shown above but with the 70% discount applied to non-green females (NOTE: I didn't change any of the other formatting -- so the size of the points is still proportional to the percentage of green females in each channel and year). I realize it is going to be challenging to compare the two plots in this thread, but in general the discount did make the "partial" (blue) dots a bit more "reasonable", while visually the shape of the smoothed log-normal line doesn't seem to have changed much. image

Here's the companion plot when the data are combined for the two channels (and again discounting partial/ripe females by 70%). Again here, individual data points have shifted a bit but the overall pattern hasn't drastically changed. Though perhaps the estimate of psi would be quite different between the two plots - it's hard for me to tell visually? image

Based on this slightly updated information and barring updated results from @ebuhle after accounting for spawning habitat area, my inclination now is to ignore the fact that some of the females that were transported were potentially, and in some cases almost certainly, not 100% full of eggs. While ignoring this may (slightly?) underestimate FW survival in Duncan channel, I'm not convinced it is/will dramatically change the estimate. Rather, it seems to me that the FW survival in Duncan Channels just isn't as high as we originally thought it was/would be.

I've posted the dataset I used to generate the summaries of Duncan channel females to GitHub in the "data" folder if we want to explore this further.

Hillsont commented 3 years ago

@kalebentley thanks for continuing to explore this issue. I do want to say that BYs 2002, 2004, and 2006 should probably not be included, especially if channels are combined, unless you've corrected them in some way that's not evident to me.

BY2002 had adults removed from the channel after being released when the water dried up in late November. We estimated that ~14.8K eggs were taken from those females at the hatchery. Using 2,750 as an average fecundity that's about five females or ~20% of the total females released into the channels that fall. There was also likely some unknown % loss on eggs that were already in the gravel due to de-watering.
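The BY2002 arithmetic can be checked with a few lines of Python (for illustration only). The egg count, the fecundity, and the ~20% share come from the comment above; the implied total is back-calculated here and is not a reported figure.

```python
eggs_taken = 14800   # ~14.8K eggs taken at the hatchery (from the comment)
fecundity = 2750     # average eggs per female (from the comment)
share_removed = 0.20 # "~20% of the total females" (from the comment)

# Eggs removed, expressed as a number of fully fecund females.
female_equivalents = eggs_taken / fecundity

# Total females released that the ~20% figure would imply (back-calculated).
implied_total_females = female_equivalents / share_removed
```

This works out to a little over five female equivalents, and an implied release in the mid-to-high twenties. Any correction to BY2002 would still leave the unknown egg loss from de-watering unaccounted for.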

Extreme low water level issues during the outmigration from BY2004 led to a biased low outmigration estimate for South/Middle channel that season.

In BY2006, 10 of the combined 34 females were recorded as being "spent" when placed above the monitoring weirs. This was a year without an adult trap so we'd seine/ dip net any and all adults we could get our hands on in either Duncan Creek or the spawning channels below the monitoring weirs. We wanted to get these adults above the monitoring weirs so that it was more likely we'd recover them as carcasses and get otoliths for origin determination.

kalebentley commented 3 years ago

Hey @Hillsont, Thanks for the follow-up. Regarding the three BYs you identified:

Hillsont commented 3 years ago

Thanks @kalebentley, I saw these BYs labeled in your graphs and that's what caught my eye.

As far as BY2004 goes maybe we're looking at FW survival from different points of view. For me, FW survival = egg-to-fry survival which is estimated by dividing fry outmigrants by the number of eggs estimated to have been deposited the prior fall. Based on environmental conditions (water depth) in the Middle/South channel during the BY2004 outmigrant season that persisted almost to the expected normal peak outmigration date, I believe the fry outmigrant estimate is biased low. I’m basing this on observations made during that season, e.g. seining ~6K fry on March 22 from the channel because they wouldn’t use the modified trap entrance and recovering ~300 fry from the stilling well on April 21, long after flow conditions returned to “normal”. The stilling well is located ~10-12 feet away from the channel edge. This led us to believe that fry attempted to leave the channel by swimming through the gravel during the low water portion of the season and an unknown % of fry perished “out in the gravel” instead of being captured and counted in the fry traps. So in my way of thinking about survival at the channels, or channel productivity, that year would be biased low.

image Modified trapping setup at Middle/South channel to deal w/low water.

image Combined daily outmigrants at Duncan 2005. Yellow dots are expected emergence dates for fish released into channels based on 1.6K TU accumulation, EF Lewis gauge height for a flow reference.

When the channels were originally constructed they used steel sheet-pile to create the monitoring weirs. This sheet-pile was driven ~8’ into the ground and extends ~10’ out from the wetted sides of the channels. We didn’t believe that fry would be very successful in getting around that and even if they did it was another 50 feet or so to any kind of surface water. So I don’t think it was a case of fry being able to bypass the trap and it wouldn’t impact SAR estimates via unaccounted for fry production.

image

kalebentley commented 3 years ago

Todd and I just got off the phone to discuss his last post regarding the BY2004 outmigration estimate. Here are a couple of highlights:

  1. The BY2004 outmigration estimate that is currently reported in our juvenile estimates dataset should not be considered biased. The estimate represents the number of fry that successfully outmigrated from the Duncan Channels (via juvenile traps and seining), of which 100% were strontium marked.
  2. For the purposes of our IPM, "eggs" effectively become "fry" once they pass the juvenile traps
  3. When Todd wrote, "So in my way of thinking about survival at the channels, or channel productivity, that year [BY2004] would be biased low," he wasn't suggesting the generated estimate was inaccurate. Rather, he was highlighting that it may not be representative of "normal" conditions preceding juvenile outmigration (i.e., extremely low water in the spring that led to abnormal trapping/outmigration hasn't happened any other year).
  4. Although at a different life stage, this point is similar to one that was raised regarding low water environmental conditions in the fall of 2019 (discussion started here and followed in subsequent posts).
  5. This all gets back to our larger discussion regarding whether or not the observations of juvenile abundance and estimates of FW survival are representative of populations throughout the Columbia chum ESU.

All this being said, I don't think the BY2004 (outmigration 2005) juvenile estimate from Duncan Channel should be omitted. How it should be interpreted is perhaps worth further discussion.

ebuhle commented 3 years ago

After catching my previous mistake, I went back through all years of data and was able to categorize every female that has ever been transported into the Duncan Channels as either green, ripe, or partial (the previous post only included BY 2012-2018 as these data are in our TWS database while pre-2012 are not).

Thanks for digging through the file drawer for this information, @kalebentley!

Here's the companion plot when the data are combined for the two channels (and again discounting partial/ripe females by 70%). Again here, individual data points have shifted a bit but the overall pattern hasn't drastically changed. Though perhaps the estimate of psi would be quite different between the two plots - it's hard for me to tell visually?

It is tough to eyeball, but it does look like discounting ripe / partial females would shift the apparent survival intercept up by maybe 5-10 percentage points, enough to make it much less of an outlier w.r.t. the hyperdistribution. Granted, this is based on a discount rate (70%) that probably represents the lower bound of what's plausible. And biologically, it's not clear to me why "ripe" females should be any less fecund than "green" ones. Still, I'm not quite ready to let this one go, especially because...

barring updated results from @ebuhle after accounting for spawning habitat area

...I haven't posted a comment on these results yet (in part, I was waiting to see where the "partial" females issue would lead us) but the commit message for 576abf41ab08eb6f574205ebfc1a4b1af6d9c9ef gives you a summary, and you can peruse the updated plots. Things do look a lot better after incorporating the habitat offset, but there is still some hyperdistribution vs. pop-level conflict -- and, just as importantly, computational sampling trouble -- associated with Duncan Channel. That quasi-bimodal psi posterior is still there, albeit tamed somewhat, and the divergences are concentrated in the lower mode. To the extent that ripe / partial females could explain this, I think it's worth pursuing. The question, then, is:

So again, what should we do? As @ebuhle pointed out previously here, what value do we use for a "fecundity offset" if we want to discount non-green females? My initial thought was to let the data tell us by accounting for spawning area and seeing if we could estimate an offset for the "fixed effect" channel type (partial vs. green).

In principle I agree with letting the data speak, except I would treat proportion non-green as a continuous covariate and estimate a "slope" constrained to [0,1], perhaps with an informative prior that puts most of the mass on [0.75,1]. In practice it's sort of annoying to add a parameter just to deal with this edge case of a subset of years in one population, but at least it looks like it should be reasonably well constrained by the data. The simple starting point would be to just fix the offset as @kalebentley has done. I'll give this a shot once I resolve some fresh Stan hell that cropped up last week.
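To put a number on "most of the mass on [0.75,1]", here is a minimal Python sketch. It assumes a Beta(a, 1) prior purely for illustration (no prior family has been chosen in this thread), because its CDF is simply x^a and the tail mass can be checked by hand. The function name and the shape values are hypothetical.

```python
def beta_a1_mass_above(threshold: float, a: float) -> float:
    """Prior mass above `threshold` under a Beta(a, 1) distribution on [0, 1].

    The Beta(a, 1) CDF is F(x) = x**a, so the upper-tail mass is 1 - threshold**a.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    if a <= 0:
        raise ValueError("shape parameter a must be positive")
    return 1.0 - threshold ** a


# Hypothetical shape values; a = 1 reduces to the uniform(0, 1) prior.
for a in (1, 4, 8):
    mass = beta_a1_mass_above(0.75, a)
    print(f"Beta({a}, 1): P(offset > 0.75) = {mass:.3f}")
```

Larger values of `a` push more prior mass toward 1, so the shape parameter is the knob for how strongly the offset is pulled toward "no discount".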

As far as BY2004 goes maybe we're looking at FW survival from different points of view. For me, FW survival = egg-to-fry survival which is estimated by dividing fry outmigrants by the number of eggs estimated to have been deposited the prior fall. Based on environmental conditions (water depth) in the Middle/South channel during the BY2004 outmigrant season that persisted almost to the expected normal peak outmigration date, I believe the fry outmigrant estimate is biased low.

@Hillsont, my interpretation of "estimate is biased low" is that it refers to observation error: fewer fry were counted than actually outmigrated. But this story of BY2004 (unless fry were indeed able to migrate through the gravel, which I didn't even know was a thing!) seems in fact to be a case of process error, albeit in a highly managed and semi-artificial system. Seems to me you'd want to include this fry estimate in the model and let it inform (i.e., further blow up) the estimate of recruitment process error variance, unless you really believe this was a fluke event that is unlikely to reoccur in a hotter, drier future. If we exclude it, the model will struggle to explain why so few adults (presumably) returned from BY2004, and will partition that residual error naively. The effect of such environmentally induced "outliers" on the unexplained process noise could be mitigated by including covariates, e.g. relevant seasonal gauge height, as in #7.

[EDIT: @kalebentley posted the above while I was composing this one. Yeah, what he said.]

However, @kalebentley and @Hillsont raise another point that does relate to the "bad data" issue (#9), and specifically my question about observation error estimates:

This was a year without an adult trap so we'd seine/ dip net any and all adults we could get our hands on in either Duncan Creek or the spawning channels below the monitoring weirs.

Sounds like there were at least a few years where the adult trap was not operational, but it's not clear to me how, or whether, this is reflected in the escapement estimate and its associated CV. This also relates to an issue @kalebentley and I discussed last week, about how to handle local NOR recruitment from Duncan Channel (vs. the "known" number of spawners translocated into the channels each year). Maybe he can weigh in again, since I'm sure everyone else finds this problem even more confusing than I do.

kalebentley commented 3 years ago

...I haven't posted a comment on these results yet (in part, I was waiting to see where the "partial" females issue would lead us) but the commit message for 576abf4 gives you a summary, and you can peruse the updated plots. Things do look a lot better after incorporating the habitat offset

First off -- YAY!!!!

but there is still some hyperdistribution vs. pop-level conflict -- and, just as importantly, computational sampling trouble -- associated with Duncan Channel. That quasi-bimodal psi posterior is still there, albeit tamed somewhat, and the divergences are concentrated in the lower mode.

...and yet my disdain for Duncan continues to grow...

once I resolve some fresh Stan hell that cropped up last week.

For as much as Stan is boosted by @tbuehrens and @mdscheuerell, it sure does seem to lack in the "user friendly" category but I digress...

@Hillsont, my interpretation of "estimate is biased low" is that it refers to observation error: fewer fry were counted than actually outmigrated. But this story of BY2004 (unless fry were indeed able to migrate through the gravel, which I didn't even know was a thing!) seems in fact to be a case of process error, albeit in a highly managed and semi-artificial system. Seems to me you'd want to include this fry estimate in the model and let it inform (i.e., further blow up) the estimate of recruitment process error variance, unless you really believe this was a fluke event that is unlikely to reoccur in a hotter, drier future. If we exclude it, the model will struggle to explain why so few adults (presumably) returned from BY2004, and will partition that residual error naively. The effect of such environmentally induced "outliers" on the unexplained process noise could be mitigated by including covariates, e.g. relevant seasonal gauge height, as in #7.

Yes - this is the point I was trying to make, though likely not as coherently.

Sounds like there were at least a few years where the adult trap was not operational, but it's not clear to me how, or whether, this is reflected in the escapement estimate and its associated CV.

My understanding is that although the Duncan Trap, which is located at the mouth of Duncan Creek, was not operational in BY 2006 and 2007, the monitoring weirs at the bottom of the Duncan Channels were working. Therefore, adults were still effectively translocated into the channels and thus the counts should be censuses (i.e., no observation error). Here's another map of the Duncan area for context: image

This also relates to an issue @kalebentley and I discussed last week, about how to handle local NOR recruitment from Duncan Channel (vs. the "known" number of spawners translocated into the channels each year). Maybe he can weigh in again, since I'm sure everyone else finds this problem even more confusing than I do.

I am still working to pull together a summary that highlights which juvenile estimates may have underestimated values of uncertainty. Stay tuned.

ebuhle commented 3 years ago

...and yet my disdain for Duncan continues to grow...

Uh-oh. I'm officially neutral here; call me Switzerland.

For as much as Stan is boosted by @tbuehrens and @mdscheuerell, it sure does seem to lack in the "user friendly" category but I digress...

Yeah, I mean, Stan is an incredibly robust and full-featured tool that can also be incredibly fragile, mostly because of its dependencies on C++ and the associated toolchain. It sometimes breaks in ways that require a computer scientist and not a natural scientist to fix, and occasionally stays broken for weeks or months at a time. Let's hope this is not one of those times. It still blows any competitor out of the water, though (JAGS who?), and the developer and user community is unparalleled...even if I am still waiting for the Stan-devs to ride to the rescue with my latest issue.

Anyway...

My understanding is that although the Duncan Trap, which is located at the mouth of Duncan Creek, was not operational in BY 2006 and 2007, the monitoring weirs at the bottom of the Duncan Channels were working. Therefore, adults were still effectively translocated into the channels and thus the counts should be censuses (i.e., no observation error).

OK, but you're referring to the counts of spawners translocated into the channels, right? I'm talking about the counts of spawners returning to Duncan Creek -- putatively recruits produced from the channels, at least until we get a model of straying set up. I know we went over this at length on the phone, so sorry to belabor it again, but it makes my tiny brain hurt. Remind me, which columns of spawner_data and bio_data should I be looking at for Duncan to get the equivalent of "naturally returning adults" in all the other populations?

Hillsont commented 3 years ago

@ebuhle, my brain definitely hurts. I'm confident that what we've reported for Duncan Channels are only adults translocated above the monitoring weirs (could be from mainstem sites, the Duncan trap, or seined/dip netted from below the weirs). The same goes for juveniles: the estimates only cover fry produced by adults translocated above the monitoring weirs. I believe we used Duncan Creek for adults found below the monitoring weirs in years where we didn't operate an adult trap at the mouth, and we have no estimates of fry produced from adults that spawned below the monitoring weirs.

Hillsont commented 3 years ago

@ebuhle I was also surprised to learn that fry/fish would swim through wetted gravel below the surface, and I probably would not have believed it if I hadn't seen it with my own eyes. This was the best picture I could find to show how far fry migrated through the gravel to end up in the stilling well in 2005 (BY2004 outmigration). The red line is circling the manhole cover to the stilling well. The blue line is the mouth of the south channel in 2003 where it dumped into the middle channel. Keep in mind that in the late winter / early spring of 2005 (outmigration of BY2004) there was no water anywhere near this line. That's a full-sized Dodge 2500 pickup truck in the background to give some scale.

image

kalebentley commented 3 years ago

@ebuhle - I know we followed up via phone yesterday afternoon but I wanted to quickly respond here to capture what we discussed.

OK, but you're referring to the counts of spawners translocated into the channels, right? I'm talking about the counts of spawners returning to Duncan Creek -- putatively recruits produced from the channels, at least until we get a model of straying set up.

In BY 2006 and 2007, the adult trap located at the mouth of Duncan Creek was not in operation due to sustained damage. Therefore, unlike other years, returning chum adults could ascend into Duncan Creek proper and spawn. However, adults still could not volitionally access the Duncan Channels because of the Channel monitoring weirs (NOTE: the Duncan Channel adult weirs are not designed like the one at Hamilton Springs). During the fall of BY 2006 and 2007, efforts were made to seine up as many chum adults as possible from Duncan Creek and translocate them into the Duncan Channels. Estimates of adults that spawned in Duncan Creek were never made but we may be able to derive estimates at some point using carcass recoveries. It's unlikely that all chum that returned to Duncan Creek were captured and translocated into Duncan Channel so "S_obs" is likely biased low for these two years. @Hillsont, please correct any information I mischaracterized.

I know we went over this at length on the phone, so sorry to belabor it again, but it makes my tiny brain hurt. Remind me, which columns of spawner_data and bio_data should I be looking at for Duncan to get the equivalent of "naturally returning adults" in all the other populations?

Aside from the two return years I mentioned above (BY '06 & '07), the total number of returning spawners to Duncan Creek (by year) is equal to all rows of data/estimates where "Location.Reach" == "Duncan_Creek". From 2004-2019, a total of 827 chum adults have been captured at the Duncan Creek trap (aka "Location.Reach" == "Duncan_Creek"). Of these, 815 (99%) have been translocated ("Disposition") to Duncan Channels along with adults from other mainstem "Location.Reach(es)" -- 3 have been reportedly taken to Duncan Creek and 9 taken for brood to "Duncan_Hatchery" (aka Washougal Hatchery - this detail is not super important for this point). The same should be true for the Bio Data file. However, I just noticed that there are slight discrepancies between the Abundance and Bio Data files. Specifically, there are a total of 11 fish in the Bio Data file that have a "Location.Reach" & "Disposition" of "Duncan_Creek". Ten of the 11 fish here are from BY '06 and '07 and were likely carcass recoveries (again, outlined above)? Interestingly, there are 14 bio sampled adults with "Location.Reach" of "Duncan_Creek" and "Disposition" of "Duncan_Hatchery" as opposed to the nine in the Abundance file. Obviously, a small deviation but would be good to know what's going on here. @Hillsont or @BradGarnerWDFW any ideas on how to track this one down and update our data files if necessary?
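As a quick arithmetic check on the Abundance file totals quoted above, here is a minimal Python sketch. The counts are the ones stated in this comment; the dictionary keys are assumed labels (in particular, "Duncan_Channel" as the disposition value is a guess at how the column is coded) and the variable names are hypothetical.

```python
from collections import Counter

# Adults captured at the Duncan Creek trap, 2004-2019, by final disposition.
# Counts are taken from the comment above; key spellings are assumptions.
disposition_counts = Counter({
    "Duncan_Channel": 815,
    "Duncan_Creek": 3,
    "Duncan_Hatchery": 9,
})

total = sum(disposition_counts.values())
share_to_channel = disposition_counts["Duncan_Channel"] / total

print(f"total captured at trap: {total}")
print(f"share translocated to the channels: {share_to_channel:.1%}")

# Discrepancy flagged above: Duncan_Creek -> Duncan_Hatchery fish in each file.
abundance_hatchery = 9
bio_hatchery = 14
print(f"extra fish in Bio Data file: {bio_hatchery - abundance_hatchery}")
```

The three dispositions do sum to the reported 827, so the open question is limited to the mismatch between the Abundance and Bio Data files.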

Hillsont commented 3 years ago

OK, here's a summary by year for BYs 2001 through 2007 of what happened at Duncan re: adult traps, surveys, and disposition of adults captured at/in Duncan Creek or channels below monitoring weirs when we had no operational adult trap at the mouth / dam.

2001 – A mess. No adult trap. Surveys were done but not all reaches/areas each time. It does look like they seined Duncan Creek or the areas below the monitoring weirs which resulted in six adults being placed above the weirs. Remainder of adults placed above monitoring weirs came from mainstem locations.

2002- No adult trap in place. A total of 13 chum were observed (12 live and one dead) during spawning ground surveys conducted below the weirs of the channels and in Duncan Creek. All adults placed above monitoring weirs came from mainstem locations.

2003- The adult trap at the dam was only operated for four days during the adult season and only captured coho when it was fishing. A total of 16 live chum salmon were observed during spawning ground surveys conducted below the weirs of the channels and in Duncan Creek. All adults placed above monitoring weirs came from mainstem locations.

2004 - Adult trap operated all season, the single adult captured was released above South/Middle weir. Remainder of adults placed above monitoring weirs came from mainstem locations.

2005 - Adult trap operated all season, all adults captured were released above a monitoring weir. Remainder of adults placed above monitoring weirs came from mainstem locations.

2006 – Lost the adult trap early in the season to debris & high flow event. When any adults were observed below the monitoring weirs or in Duncan Creek an attempt was made to capture the adult(s) and release them above a monitoring weir regardless of spawning condition, i.e. spent/spawned out females were released above the North weir. Additionally, nine carcasses were recovered below the monitoring weirs.

2007 – No adult trap. When any adults were observed below the monitoring weirs or in Duncan Creek an attempt was made to capture the adult(s) and release them above a monitoring weir regardless of spawning condition. Additionally, one carcass was recovered below the monitoring weirs.

2008 – Present. Adult trap in operation.

@kalebentley We'll need more details to try and track down the discrepancies you mention above. I don't seem to have access to the CSV files in the data folder, they open as text files in the browser when I try. I did just notice the "raw" option. Is that the correct path? Copy the "raw" into an excel sheet and use text to columns / parse based on commas to get a workable excel file?

@ebuhle , as @kalebentley mentioned above the adult weirs at Duncan and Hamilton are different beasts. The Duncan monitoring weirs are steel sheet-pile with notches cut in them that have channels welded in them to hold the adult weir/grate in place. Those steel bar grates weigh ~45lbs so they don't float up or get compromised by debris. When an adult grate (outlined in red) is in place adults cannot easily leave unless Bonneville tailwater backwaters the channels which does rarely happen during the adult season. When we had reason to believe translocated adults did "leave" the channels we've made note of it.

image

ebuhle commented 3 years ago

Thanks for summarizing this, @kalebentley and @Hillsont.

2001 – A mess. No adult trap. Surveys were done but not all reaches/areas each time. It does look like they seined Duncan Creek or the areas below the monitoring weirs which resulted in six adults being placed above the weirs. Remainder of adults placed above monitoring weirs came from mainstem locations.

2002- No adult trap in place. A total of 13 chum were observed (12 live and one dead) during spawning ground surveys conducted below the weirs of the channels and in Duncan Creek. All adults placed above monitoring weirs came from mainstem locations.

2003- The adult trap at the dam was only operated for four days during the adult season and only captured coho when it was fishing. A total of 16 live chum salmon were observed during spawning ground surveys conducted below the weirs of the channels and in Duncan Creek. All adults placed above monitoring weirs came from mainstem locations.

2006 – Lost the adult trap early in the season to debris & high flow event. When any adults were observed below the monitoring weirs or in Duncan Creek an attempt was made to capture the adult(s) and release them above a monitoring weir regardless of spawning condition, i.e. spent/spawned out females were released above the North weir. Additionally, nine carcasses were recovered below the monitoring weirs.

2007 – No adult trap. When any adults were observed below the monitoring weirs or in Duncan Creek an attempt was made to capture the adult(s) and release them above a monitoring weir regardless of spawning condition. Additionally, one carcass was recovered below the monitoring weirs.

So it sounds like S_obs (that is, putatively local adult recruitment, not to be confused with translocated adults in the spawning channels) in these years should be NA? Or possibly those few spawners recovered in Duncan Creek could be treated as "minimum counts" as we've discussed elsewhere (#9)?

It's also clear to me now that we will need something like the S_add_obs construct that we've discussed before to explicitly account for nonlocal but potentially NOR adults translocated into Duncan Channel (and no other population, as I understand it). That's probably next on my list of low-hanging fruit after I explore the non-green female offset. I'm still trying to resolve the Stan issue (or more likely, toolchain issue) that's preventing me from compiling salmonIPM. :angry:

@kalebentley We'll need more details to try and track down the discrepancies you mention above. I don't seem to have access to the CSV files in the data folder, they open as text files in the browser when I try. I did just notice the "raw" option. Is that the correct path? Copy the "raw" into an excel sheet and use text to columns / parse based on commas to get a workable excel file?

@Hillsont, the safest and most straightforward thing would be to clone this repo into an RStudio project and interact with it using the Git functionality there. Among other things, that would give you a regular folder on your computer with all the data files, which you could open with Excel. If that's not possible for some reason, you could save a read-only copy by going here, right-clicking, and doing "Save Link As".

Hillsont commented 3 years ago

@ebuhle, thanks for the data file pointers. I'll have to work with Kale on getting a clone set up / working in RStudio. When I tried the save as option I got an html file that wants to open in a browser, and when I switched the file type to csv, I got a csv file filled with html code. I can't win.

ebuhle commented 3 years ago

Huh, my mistake. Clearly I've never done this myself. It turns out you can right-click-save-as on the "Raw" button and you will get a .csv file. [Disclaimer about how this defeats the point of multi-contributor version control, yadda yadda.]

ebuhle commented 3 years ago

OK, I finally solved my compilation problem and tried fitting the Ricker model with a fecundity discount parameter for non-green females. This looks very analogous to @kalebentley's approach shown above:

E_hat = f0 * (p_G_obs + delta_NG * p_NG_obs) * S

where f0 is age- and sex-ratio-weighted fecundity as before; p_G_obs and p_NG_obs are the proportions of green and non-green females, respectively, computed directly from the data in Data_Duncan_Females_by_Condition_2021-04-19.csv; and delta_NG is a discount rate in [0,1]. I assumed p_G_obs == 1 everywhere except Duncan Channel. First I fixed delta_NG at 0.75, similar to what @kalebentley did, and now I'm estimating it freely with an implicit uniform(0,1) prior.
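For anyone who wants to see the discount arithmetic with numbers, here is a minimal Python sketch of the expression above. It assumes p_NG_obs = 1 - p_G_obs, and the input values (f0, spawner count, proportion green) are hypothetical, not taken from the data.

```python
def expected_eggs(f0: float, p_green: float, delta_ng: float, spawners: float) -> float:
    """Expected egg deposition with a fecundity discount for non-green females.

    Mirrors E_hat = f0 * (p_G_obs + delta_NG * p_NG_obs) * S,
    assuming the non-green proportion is 1 - p_green.
    """
    if not 0.0 <= p_green <= 1.0:
        raise ValueError("p_green must lie in [0, 1]")
    if not 0.0 <= delta_ng <= 1.0:
        raise ValueError("delta_ng must lie in [0, 1]")
    p_non_green = 1.0 - p_green
    return f0 * (p_green + delta_ng * p_non_green) * spawners


# Hypothetical inputs for illustration only.
f0, spawners = 1400.0, 100.0

all_green = expected_eggs(f0, p_green=1.0, delta_ng=0.75, spawners=spawners)
mixed = expected_eggs(f0, p_green=0.6, delta_ng=0.75, spawners=spawners)

print(f"all green females: {all_green:.0f} eggs")
print(f"60% green, delta_NG = 0.75: {mixed:.0f} eggs")
print(f"effective discount: {1 - mixed / all_green:.1%}")
```

With p_G_obs = 1 the discount parameter drops out entirely, which is why delta_NG can only be informed by the Duncan Channel years that included non-green females.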

Things are looking "better", at least in the sense that psi[Duncan Channel] moves closer to the hyper-mean. However, it is still a bit "lumpy" (visible in panel C below and more clearly here), and there are still about as many divergences as before, still concentrated in the lower "mode". There are no other obvious concentrations of divergences in parameter space that I can discern, so my working hypothesis remains that psi[Duncan Channel] is the culprit due to its strongly varying posterior curvature. Maybe we just have to deal with it by cranking up adapt_delta to drive the stepsize down and accept a longer runtime as a result.

Also, the posterior of delta_NG is itself somewhat lumpy, and includes lower values than we might perhaps consider biologically plausible (?). We could of course constrain it with an informative prior as I mentioned earlier -- basically a compromise between the fixed value and the current approach -- which would let psi[Duncan Channel] shift back to the left.

Thoughts?

image

ebuhle commented 3 years ago

It's also clear to me now that we will need something like the S_add_obs construct that we've discussed before to explicitly account for nonlocal but potentially NOR adults translocated into Duncan Channel (and no other population, as I understand it).

I went ahead and did this. S_add_obs is derived from spawner_data by taking all adults with a given final disposition that returned to a different location, with a carve-out for Duncan Creek / Duncan Channel:

https://github.com/mdscheuerell/chumIPM/blob/062562763ed206cbdc3039fede2f9841865488fc/analysis/R/01_LCRchumIPM_data.R#L75-L83

This more direct approach allows us to revise the classification of "hatchery" spawners based on bio_data, because translocated natural adults no longer need to be designated H to avoid counting them as local natural recruitment. Now H only includes genuine hatchery-origin fish plus any others (i.e., Duncan Channel) whose known origin doesn't match their disposition. This last special case could still be considered a kludge, but we'll be able to disambiguate Duncan Channel from true hatcheries by explicitly modeling straying.
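The logic can be sketched roughly as follows in Python. This is not the code at the link above, just an illustration of the rule as described in words. The record layout, the reach names "Mainstem_A" and "Mainstem_B", and the counts are all hypothetical, and the carve-out is assumed to mean that fish trapped in Duncan Creek and moved into Duncan Channel count as local returns rather than additions.

```python
from collections import defaultdict

# Assumed carve-out: Duncan Creek returns placed in Duncan Channel are local.
LOCAL_PAIRS = {("Duncan_Creek", "Duncan_Channel")}


def added_spawners(records):
    """Sum adults whose final disposition differs from their return location.

    `records` is an iterable of (year, location, disposition, count) tuples.
    Returns a dict keyed by (year, disposition).
    """
    totals = defaultdict(int)
    for year, location, disposition, count in records:
        if location == disposition or (location, disposition) in LOCAL_PAIRS:
            continue  # spawned where they returned, or covered by the carve-out
        totals[(year, disposition)] += count
    return dict(totals)


# Hypothetical records for illustration only.
records = [
    (2015, "Duncan_Creek", "Duncan_Channel", 40),  # local return, not added
    (2015, "Mainstem_A", "Duncan_Channel", 25),    # translocated, added
    (2015, "Mainstem_A", "Mainstem_A", 300),       # spawned in place
    (2016, "Mainstem_B", "Duncan_Channel", 10),    # translocated, added
]

print(added_spawners(records))
```

Passing these totals in as data, rather than absorbing them into H, is what lets the hatchery classification go back to meaning actual hatchery origin.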

As expected, the main results are not noticeably different (e.g., compare the life-cycle multiplot above vs. here), except of course now p_HOS in Duncan Channel actually makes sense. Note that when p_HOS is estimated in a terminal year with no bio_data and no spawner_data, e.g. 2020, its posterior is simply the uniform(0,1) prior; but of course this has no effect on the retrospective fit.

An unanticipated happy side effect is that the model now runs considerably faster per sample (and per effective sample, although I haven't done a rigorous comparison). Evidently the direct offset for translocated spawners, passed in as data, makes the posterior geometry easier to traverse for whatever reason. (I wouldn't have thought estimating high values for those ~20 p_HOS parameters in Duncan Channel would slow things down noticeably, but then again they are correlated a posteriori with the corresponding wild spawner states.) I used the time savings to do a run with adapt_delta = 0.99 and the same number of warmup and saved draws as before. It takes ~4.5 hr on my machine and produces ~5 divergences and adequate effective sample sizes for computing 95% credible intervals. I can live with that.

I think the next (and maybe last?!?) outstanding Duncan-related issue is finalizing what to do about the smolt and spawner observation error estimates in general, and especially in those years when the adult trap was non-operational.

kalebentley commented 3 years ago

OK, I finally solved my compilation problem and tried fitting the Ricker model with a fecundity discount parameter for non-green females. This looks very analogous to @kalebentley's approach shown above:

E_hat = f0 * (p_G_obs + delta_NG * p_NG_obs) * S

where f0 is age- and sex-ratio-weighted fecundity as before; p_G_obs and p_NG_obs are the proportions of green and non-green females, respectively, computed directly from the data in Data_Duncan_Females_by_Condition_2021-04-19.csv; and delta_NG is a discount rate in [0,1]. I assumed p_G_obs == 1 everywhere except Duncan Channel. First I fixed delta_NG at 0.75, similar to what @kalebentley did, and now I'm estimating it freely with an implicit uniform(0,1) prior.

Very cool. Curious, is the offset (delta_NG) estimated from only Duncan data or all populations?

Things are looking "better", at least in the sense that psi[Duncan Channel] moves closer to the hyper-mean. However, it is still a bit "lumpy" (visible in panel C below and more clearly here), and there are still about as many divergences as before, still concentrated in the lower "mode". There are no other obvious concentrations of divergences in parameter space that I can discern, so my working hypothesis remains that psi[Duncan Channel] is the culprit due to its strongly varying posterior curvature. Maybe we just have to deal with it by cranking up adapt_delta to drive the stepsize down and accept a longer runtime as a result.

The posterior of psi for Duncan doesn't look "lumpy" to me but rather more right-skewed.

Also, the posterior of delta_NG is itself somewhat lumpy, and includes lower values than we might perhaps consider biologically plausible (?). We could of course constrain it with an informative prior as I mentioned earlier -- basically a compromise between the fixed value and the current approach -- which would let psi[Duncan Channel] shift back to the left.

My inclination is to leave "delta_NG" the way it is now (i.e., estimated using a non-informative prior - though I can't say if there's an alternative prior that makes more sense). I see that the posterior of "delta_NG" is lumpy, quite wide, and has a median somewhere around 0.4 - 0.6. I get that this is likely less than we might expect based on the protocol Todd has described where females should only be transported if they are at least 75% fecund (using some sort of visual assessment).

However, it just occurred to me that perhaps something else is going on with Duncan spawners. Aside from hatchery brood, Duncan is the only population where 100% of the spawners are not only handled but transported. I may be grasping at straws but maybe there's some handling stress that is carrying over to either spawning success or survival of embryos? Perhaps there's even a difference in stress depending on whether the female was green or ripe?

Although not exactly related, there was a presentation at the Steelhead Manager's meeting last month by some folks at ODFW evaluating differences in catch rates of hatchery steelhead offspring whose parents were collected either by angling or at the hatchery (volitional returns). The study found that offspring of angler-caught brood consistently produced fewer adult returns and were under-represented in the creel. Although the mechanisms for these patterns were not discerned (and I'm sure @tbuehrens will point out countless flaws in the study), the results were the opposite of what ODFW had hypothesized, and the presenters mentioned that perhaps the added stress of angling and transport could be contributing to the observed patterns. Dunno.

Getting back to Duncan, @Hillsont recently told me about a spreadsheet he has maintained that summarizes the total number of retained eggs from recovered carcasses by spawning Channel (North, South) and year. My understanding is that every female carcass that is recovered is cut open and any retained eggs are individually counted.

I quickly compared these retained eggs data (both absolute and percentage assuming every female had a fecundity of 2,750 eggs) with the percentage of female spawners that were classified as green (% Green Females) to see if there was any relationship (see plots below; data include BY02 -13, 15-16). It would be best to compare retained eggs of individual females based on their condition call and evaluate this directly in some sort of GLM, but I don't have access to these data and was just trying to quickly explore what Todd shared.
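For reference, the percentage calculation amounts to something like the minimal Python sketch below. The 2,750 eggs per female is the assumption stated above; using the number of recovered female carcasses as the denominator is my assumption about how the spreadsheet would be summarized, and the example numbers are hypothetical.

```python
ASSUMED_FECUNDITY = 2750  # eggs per female, as assumed above


def percent_eggs_retained(total_retained_eggs: int,
                          n_female_carcasses: int,
                          fecundity: int = ASSUMED_FECUNDITY) -> float:
    """Retained eggs as a percentage of potential egg deposition."""
    if n_female_carcasses <= 0:
        raise ValueError("need at least one female carcass")
    potential_eggs = n_female_carcasses * fecundity
    return 100.0 * total_retained_eggs / potential_eggs


# Hypothetical channel-year: 5,500 retained eggs counted across 40 carcasses.
print(f"{percent_eggs_retained(5500, 40):.1f}% of potential eggs retained")
```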

image

image

There's not much of a pattern here, and it's certainly worth pointing out that other factors can lead to retained eggs and could be at play for females in other populations too.

Overall, again, I am inclined to leave the model the way it is now and let the data speak for themselves unless others have different thoughts.

kalebentley commented 3 years ago

It's also clear to me now that we will need something like the S_add_obs construct that we've discussed before to explicitly account for nonlocal but potentially NOR adults translocated into Duncan Channel (and no other population, as I understand it).

I went ahead and did this. S_add_obs is derived from spawner_data by taking all adults with a given final disposition that returned to a different location, with a carve-out for Duncan Creek / Duncan Channel:

https://github.com/mdscheuerell/chumIPM/blob/062562763ed206cbdc3039fede2f9841865488fc/analysis/R/01_LCRchumIPM_data.R#L75-L83

This more direct approach allows us to revise the classification of "hatchery" spawners based on bio_data, because translocated natural adults no longer need to be designated H to avoid counting them as local natural recruitment. Now H only includes genuine hatchery-origin fish plus any others (i.e., Duncan Channel) whose known origin doesn't match their disposition. This last special case could still be considered a kludge, but we'll be able to disambiguate Duncan Channel from true hatcheries by explicitly modeling straying.

I think I am mostly tracking here though quick clarification - adults that are deemed "Duncan_Channel" origin fish that are recovered at any location aside from "Duncan_Creek" are still considered hatchery-origin fish and thus would contribute to a higher pHOS estimate in that year? I believe "Duncan_Channel" fish are designated differently depending on where the estimates are reported. Regardless, we'll want to be able to estimate and track the total number of returning adults by origin ("Natural_spawner", "Duncan_Channel", and each hatchery) across all "Location.Reach"(es).

As expected, the main results are not noticeably different (e.g., compare the life-cycle multiplot above vs. here), except of course now p_HOS in Duncan Channel actually makes sense. Note that when p_HOS is estimated in a terminal year with no bio_data and no spawner_data, e.g. 2020, its posterior is simply the uniform(0,1) prior; but of course this has no effect on the retrospective fit.
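The point about the terminal year can be illustrated with a stand-alone conjugate example. This is a deliberate simplification: in the IPM, p_HOS is estimated jointly with the spawner states, whereas the sketch below assumes only a Beta(1, 1) prior, i.e. uniform(0,1), and binomial origin samples. The function names and counts are hypothetical.

```python
def p_hos_posterior(n_hatchery, n_natural, prior_a=1.0, prior_b=1.0):
    """Beta posterior parameters for p_HOS given binomial origin-composition samples."""
    return prior_a + n_hatchery, prior_b + n_natural


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var ** 0.5


# Year with no samples at all: the posterior is just the prior, Beta(1, 1).
no_data = p_hos_posterior(0, 0)

# Year with made-up samples: 5 hatchery-origin and 45 natural-origin fish.
with_data = p_hos_posterior(5, 45)

prior_mean, prior_sd = beta_mean_sd(*no_data)
post_mean, post_sd = beta_mean_sd(*with_data)
```

With no observations the posterior parameters never move away from (1, 1), so the "estimate" in such a year carries no information beyond the prior.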

Unrelated to Duncan, I noticed that the point estimate of pHOS for Grays_MS in 2019 was substantially higher (~33%) than any other population and year. I quickly looked at the bio-data file and realized there was a copy and paste mistake. My bad. I've fixed the data error and just submitted a "push". Certainly not trying to deflect blame but this provides a perfect example as to why we (me, @Hillsont, @BradGarnerWDFW) need to allocate time to assess and (likely) update how data are stored, queried, and summarized.

An unanticipated happy side effect is that the model now runs considerably faster per sample (and per effective sample, although I haven't done a rigorous comparison). Evidently the direct offset for translocated spawners, passed in as data, makes the posterior geometry easier to traverse for whatever reason. (I wouldn't have thought estimating high values for those ~20 p_HOS parameters in Duncan Channel would slow things down noticeably, but then again they are correlated a posteriori with the corresponding wild spawner states.) I used the time savings to do a run with adapt_delta = 0.99 and the same number of warmup and saved draws as before. It takes ~4.5 hr on my machine and produces ~5 divergences and adequate effective sample sizes for computing 95% credible intervals. I can live with that.

Great news!

I think the next (and maybe last?!?) outstanding Duncan-related issue is finalizing what to do about the smolt and spawner observation error estimates in general, and especially in those years when the adult trap was non-operational.

I should have an update on this topic soon that will hopefully help facilitate what to do here.

ebuhle commented 3 years ago

Very cool. Curious, is the offset (delta_NG) estimated from only Duncan data or all populations?

Well, p_G_obs only ever departs from 1 in Duncan Channel, so that's the only population that informs the estimate of delta_NG.
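A small Python sketch of why that is, under an assumed functional form: suppose expected egg deposition per female is scaled by `p_G + (1 - p_G) * delta_NG`, where `delta_NG` is the fecundity discount for non-green females. That form is an assumption made for illustration, not a quote of the Stan code, and the function name is hypothetical.

```python
def egg_multiplier(p_green, delta_ng):
    """Assumed per-female fecundity multiplier for a mix of green and non-green females."""
    return p_green + (1.0 - p_green) * delta_ng


candidate_deltas = [0.25, 0.5, 0.75, 1.0]

# Population where every female is green (p_G = 1): the multiplier ignores delta_NG.
all_green = [egg_multiplier(1.0, d) for d in candidate_deltas]

# Population with 60% green females: the multiplier changes with delta_NG.
mixed = [egg_multiplier(0.6, d) for d in candidate_deltas]
```

When `p_green` is 1 the multiplier is 1 for every candidate value, so the likelihood for that population is flat in `delta_NG`. Only populations with some non-green females can inform it.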

The posterior of psi for Duncan doesn't look "lumpy" to me but rather more right-skewed.

True, the borderline bimodality has all but disappeared, although that depends on the histogram bin width / kernel bandwidth. Ditto for Mmax. Nevertheless, divergences still seem to be associated with the change in posterior curvature from the "lower" mass of psi[Duncan Channel] (driven by the local data) to the right tail (driven by the hyperdistribution). It's feasible to squelch the divergences with smaller HMC integration steps and longer runtime, so this isn't fatal, just annoying.

My inclination is to leave "delta_NG" the way it is now (i.e., estimated using a non-informative prior - though I can't say if there's an alternative prior that makes more sense).

That's my inclination too.

I think I am mostly tracking here though quick clarification - adults that are deemed "Duncan_Channel" origin fish that are recovered at any location aside from "Duncan_Creek" are still considered hatchery-origin fish and thus would contribute to a higher pHOS estimate in that year?

That's correct. This is currently the only mechanism to avoid counting known nonlocal-origin spawners that "volunteered" rather than being deliberately translocated (and thus are not enumerated but only estimated from origin-composition bio_data) as local natural recruitment.

I believe "Duncan_Channel" fish are designated differently depending on where the estimates are reported.

How so? This sounds potentially problematic, not just for the current approach but even more so for a "hatchery / straying" model, but it's the first I'm hearing of it.

Regardless, we'll want to be able to estimate and track the total number of returning adults by origin ("Natural_spawner", "Duncan_Channel", and each hatchery) across all "Location.Reach"(es).

Roger that. This will come into play when we include hatcheries in the set of populations and explicitly model straying -- which, like I've been saying, go hand in hand and will require the pairwise distance matrix.

Unrelated to Duncan, I noticed that the point estimate of pHOS for Grays_MS in 2019 was substantially higher (~33%) than any other population and year. I quickly looked at the bio-data file and realized there was a copy and paste mistake. My bad. I've fixed the data error and just submitted a "push".

Oh, interesting. I wonder if this might partially explain some weirdness with several of the "estimates" of broodstock take in Grays MS, including in 2019, that I've been trying to figure out. (Long story, feel free to ignore the details, but the bottom line is that you can't simply subtract B_take_obs from the [true state] adult recruits. Instead you have to estimate the broodstock removal rate B_rate as a free parameter by giving B_take_obs a fake "observation likelihood" with an arbitrarily small CV, which acts as a penalty. However, "arbitrarily small" causes sampling problems. I thought I'd found a good compromise long ago, but in this case some of the "true B_take" estimates differ from B_take_obs by as much as 20-30%. I'm working on a solution.)

[EDIT: Mostly ignore the last paragraph. The estimates aren't off by 20-30%, I was just confused. However, I still think this approach can be improved.]
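To make the "penalty" idea concrete, here is a minimal Python sketch. It assumes the fake observation likelihood is lognormal, with the log-scale SD derived from the CV. That distributional choice, the function names, and the numbers are assumptions for illustration only.

```python
from math import log, pi, sqrt


def lognormal_sigma(cv):
    """Log-scale standard deviation corresponding to a given CV."""
    return sqrt(log(1.0 + cv ** 2))


def take_log_density(b_take_obs, recruits, b_rate, cv):
    """Normal log density of log(B_take_obs) around log(B_rate * recruits)."""
    sigma = lognormal_sigma(cv)
    z = (log(b_take_obs) - log(b_rate * recruits)) / sigma
    return -0.5 * log(2.0 * pi) - log(sigma) - 0.5 * z ** 2


def mismatch_cost(cv, rel_error=0.05, recruits=1000.0, b_take_obs=100.0):
    """Log-density lost when the implied take is off by rel_error versus an exact match."""
    exact_rate = b_take_obs / recruits
    off_rate = exact_rate * (1.0 + rel_error)
    exact = take_log_density(b_take_obs, recruits, exact_rate, cv)
    off = take_log_density(b_take_obs, recruits, off_rate, cv)
    return exact - off


loose = mismatch_cost(cv=0.10)  # mild penalty for a 5% mismatch
tight = mismatch_cost(cv=0.01)  # much steeper penalty for the same mismatch
```

Shrinking the CV makes the penalty for a given mismatch grow roughly with 1/CV², which pins the "true" take to the observed value but also sharpens the posterior curvature. That is consistent with the sampling problems described above.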

I should have an update on this topic soon that will hopefully help facilitate what to do here.

:+1:

Hillsont commented 3 years ago

I believe "Duncan_Channel" fish are designated differently depending on where the estimates are reported.

How so? This sounds potentially problematic, not just for the current approach but even more so for a "hatchery / straying" model, but it's the first I'm hearing of it.

@ebuhle I think I can take this one. Duncan spawning channel origin adult returns are considered NORs by NOAA and don't count against pHOS in the channels or at other spawning areas if they stray. However, the evaluation of chum reintroduction into Duncan Creek (comparing adult returns from fry produced in the channels vs hatchery origin fry releases vs natural straying recolonizing) needs to identify and track channel origin adult returns from other NOR adults. As Kale has said before, Duncan is a PITA.

Hillsont commented 3 years ago

My run-reconstruction and project summary spreadsheets tracked three adult groups: Duncan HORs, Duncan channel NORs, and non-Duncan NORs. How the Duncan channel NORs and non-Duncan NORs were treated depended on the audience, i.e., ESA reporting or re-introduction project evaluation.
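A minimal Python sketch of how the same three-group tally yields different pHOS values depending on the convention. The counts are made up, and the function and its flag are hypothetical. The "lumped" option mirrors the model's current stopgap of treating channel-origin fish as H, while the other follows the ESA convention described above.

```python
def p_hos(counts, channel_counts_as_hatchery):
    """Proportion of hatchery-origin spawners under a chosen treatment of channel-origin fish."""
    hatchery = counts["Duncan_HOR"]
    natural = counts["non_Duncan_NOR"]
    if channel_counts_as_hatchery:
        hatchery += counts["Duncan_channel_NOR"]
    else:
        natural += counts["Duncan_channel_NOR"]
    return hatchery / (hatchery + natural)


# Made-up spawner counts by origin group at a single spawning area.
counts = {"Duncan_HOR": 10, "Duncan_channel_NOR": 20, "non_Duncan_NOR": 70}

esa_style = p_hos(counts, channel_counts_as_hatchery=False)  # channel fish treated as NOR
lumped = p_hos(counts, channel_counts_as_hatchery=True)      # channel fish lumped with H
```

Keeping the three groups separate in the underlying tally, and applying the convention only when a metric is computed, lets either report be produced from the same data.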

ebuhle commented 3 years ago

Duncan spawning channel origin adult returns are considered NORs by NOAA and don't count against pHOS in the channels or at other spawning areas if they stray. However, the evaluation of chum reintroduction into Duncan Creek (comparing adult returns from fry produced in the channels vs hatchery origin fry releases vs natural straying recolonizing) needs to identify and track channel origin adult returns from other NOR adults. As Kale has said before, Duncan is a PITA.

Thanks @Hillsont. This all makes sense; clearly strays from Duncan Channel vs. the three hatcheries represent different origins with potentially different management implications. Again, we're only lumping them for the time being as a mechanism to avoid counting them as natural local recruitment; and we've just made a major step toward greater realism in this regard by distinguishing deliberately translocated natural spawners from strays.

I thought @kalebentley was saying that fish with origin == "Duncan Channel" were somehow designated differently in the data depending on their location and/or disposition (like maybe they're classified as "Natural spawner" in some cases, or something). But it sounds like I misunderstood what he meant by "designated differently depending on where the estimates are reported".

kalebentley commented 3 years ago

I thought @kalebentley was saying that fish with origin == "Duncan Channel" were somehow designated differently in the data depending on their location and/or disposition (like maybe they're classified as "Natural spawner" in some cases, or something). But it sounds like I misunderstood what he meant by "designated differently depending on where the estimates are reported".

My comment was an ambiguous way of stating what @Hillsont explained above. Ultimately, I think as long as "Duncan_Channel" origin adults and their progeny can be tracked across "Location.Reach"(es) we should be good for whatever report/metric we need to generate.