ebuhle / LCRchumIPM

This is the development site for an Integrated Population Model for chum salmon in the lower Columbia River.
MIT License

updated datasets #26

Open kalebentley opened 1 month ago

kalebentley commented 1 month ago

FYI - I updated four dataset files (adults, biodata, juveniles, and Duncan females) and pushed the updates to the main branch. A couple of quick things to note that I mentioned in the commit logs but wanted to re-iterate here:

ebuhle commented 1 month ago

Thanks @kalebentley!

A few questions / comments:

  • Data_Abundance_Spawners_Chum: file updated with finalized estimates from return year 2022 and preliminary estimates from 2023; the 2023 data/estimates are missing for the number of adults collected for hatchery broodstock and translocated into Duncan Channel; data set will be updated once data are available.

So this means the adults with a hatchery or Duncan Channel as their disposition (and in the latter case, not Duncan Creek as their location) simply don't appear yet? Not that they appear, but are attributed to their location?

  • Data_BioData_Spawners_Chum: file updated with summarized bio-data from return year 2022; a handful of datasets have their count field listed as "tbd"

The value "tbd" hasn't appeared in this dataset before, so it will require temporary kludges in the code that processes the various compositional data types. NA would be the conventional way to encode this, and that way the data-wrangling script should continue to just work.
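
As a small illustration of why a stray "tbd" matters, here is a minimal sketch in base R. The toy CSV and its column names are hypothetical, not the actual layout of Data_BioData_Spawners_Chum. It shows that a single "tbd" turns a count column non-numeric on import, and that declaring it as a missing-value code via `na.strings` is one possible stopgap if the file itself cannot be edited right away.

```r
# Toy stand-in for a bio-data file; column names and values are hypothetical.
csv_text <- "location,year,count
Ives,2021,52
Ives,2022,tbd
St. Cloud,2022,NA"

# Default import: the "tbd" entry prevents the column from being read as numeric.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(raw$count)

# Treating "tbd" as a missing-value code keeps the column numeric.
clean <- read.csv(text = csv_text, na.strings = c("NA", "tbd"),
                  stringsAsFactors = FALSE)
class(clean$count)
is.na(clean$count)
```

Encoding the placeholder as NA in the file itself, as suggested above, avoids the need for any such workaround.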

  • Data_Duncan_Females_by_Condition: updated with summarized data from return years 2022 and 2023; the summary query indicated that zero chum adults were translocated into Duncan North Channel in 2022, but I am not certain this is correct. I entered the counts as NA for now but will look into this and update the file later, if needed.

NA is not allowed in this data set -- remember p_G_obs is treated as known data, so we need a finite non-missing value in every year. Probably best to just enter a placeholder of 0 for now (which will work since there are nonzero values for South) and make a note in the Comment column.
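
A pre-fit check along these lines could flag the problem before the model sees it. This is only a sketch under stated assumptions: the data frame layout, the column names `north` and `south`, and the numbers are all made up for illustration, and the rule it enforces (every count finite, and at least one nonzero count per year) is my reading of the requirement above, not the repository's actual code.

```r
# Hypothetical layout: one row per year, female counts by channel.
# All numbers are invented; the 0 for north in 2022 stands for the placeholder.
duncan <- data.frame(
  year    = c(2021, 2022, 2023),
  north   = c(12, 0, 8),
  south   = c(30, 25, 40),
  Comment = c("", "north is a placeholder pending confirmation", "")
)

# Return the years that would break a "known data" input:
# any non-finite count, or a year in which every count is zero.
check_counts <- function(df, count_cols) {
  counts <- as.matrix(df[, count_cols, drop = FALSE])
  has_missing <- !apply(is.finite(counts), 1, all)
  all_zero <- !has_missing & rowSums(counts) == 0
  df$year[has_missing | all_zero]
}

check_counts(duncan, c("north", "south"))     # no years flagged

duncan_na <- duncan
duncan_na$north[2] <- NA                      # the NA encoding discussed above
check_counts(duncan_na, c("north", "south"))  # flags 2022
```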

kalebentley commented 1 month ago

Hey @ebuhle,

Appreciate the feedback. Getting your input was what I was half hoping for in creating this post (the other half serving as a log of data updates I need to circle back on later). Quick replies below.

Thanks @kalebentley!

A few questions / comments:

  • Data_Abundance_Spawners_Chum: file updated with finalized estimates from return year 2022 and preliminary estimates from 2023; the 2023 data/estimates are missing for the number of adults collected for hatchery broodstock and translocated into Duncan Channel; data set will be updated once data are available.

So this means the adults with a hatchery or Duncan Channel as their disposition (and in the latter case, not Duncan Creek as their location) simply don't appear yet? Not that they appear, but are attributed to their location?

Let me see if I can explain this better. In the spawner (i.e., adult) data set, there are 15 rows for the return year of 2023 where I haven't entered "Abund.Mean" and "Abund.SD". Every single one, except for Hardy Creek, corresponds to the number of adults that returned to a given "Location.Reach" (e.g., St. Cloud) but were collected/transported to either a hatchery or Duncan Channel (i.e., "Disposition" not equal to "Location.Reach"). After double checking with @BradGarnerWDFW as to how the estimates of abundance are generated, the (preliminary) estimates that I entered yesterday for 2023 include the adults that were subsequently collected/transported to another location. For example, the Ives (mean) estimate of abundance for 2023 is entered as 2,798.55. Based on previous years, I'm fairly certain some adults were collected at Ives and transported to Duncan Channel and/or Duncan Hatchery. Therefore, this number will have to be updated later as well (i.e., mark-recapture estimate minus the number transported elsewhere). All said, if this incomplete/(slightly) inaccurate dataset is going to be problematic, then I suppose I could delete the 2023 spawner data/estimates I added yesterday until all data are available. Thoughts?
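
To make the bookkeeping concrete, here is a rough base-R sketch of the two steps described above: listing the rows still awaiting estimates, and the eventual correction (mark-recapture estimate minus the number transported elsewhere). The column names follow the ones quoted in this comment, but the toy rows, the `Year` column, the St. Cloud value, and the transported count of 100 are invented placeholders, not real data.

```r
# Toy rows; only the Ives value 2798.55 comes from this thread.
spawners <- data.frame(
  Year           = 2023,
  Location.Reach = c("Ives", "Ives", "St. Cloud", "St. Cloud"),
  Disposition    = c("Ives", "Duncan Channel", "St. Cloud", "Duncan Hatchery"),
  Abund.Mean     = c(2798.55, NA, 500, NA)
)

# Rows for fish moved elsewhere that still lack an abundance estimate.
pending <- subset(spawners,
                  Disposition != Location.Reach & is.na(Abund.Mean))
pending

# Later correction for a source reach: estimate minus fish transported out.
adjust_abundance <- function(mr_estimate, n_transported) {
  mr_estimate - n_transported
}

# 100 is a made-up transported count, purely for illustration.
adjust_abundance(2798.55, 100)
```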

Unrelated - as I was looking over spawner data, I noticed a few discrepancies between the estimates in our IPM dataset and the "master" dataset that @BradGarnerWDFW (and formerly @Hillsont) maintained. Therefore, I went through every single number in the "Data_Abundance_Spawners_Chum" file, compared it with the number listed in Brad's master dataset, and updated the IPM numbers, if necessary. There were only a handful that didn't match and only a few that were off by a relatively large amount. At some point, we should do the same exercise for all IPM data once we move to a more automated method of summarizing data, generating estimates, and transcribing these numbers to various portals and subsequent data sets.
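
One way the comparison could eventually be automated is sketched below in base R. It assumes both tables can be keyed on the same columns (`Year` and `Location.Reach` here, which is an assumption about the master dataset's layout), and all numbers are invented.

```r
# Invented example values; real tables would be read from file.
ipm <- data.frame(
  Year           = c(2020, 2021, 2022),
  Location.Reach = "Ives",
  Abund.Mean     = c(100, 200, 300)
)
master <- data.frame(
  Year           = c(2020, 2021, 2022),
  Location.Reach = "Ives",
  Abund.Mean     = c(100, 205, 300)
)

# Join on the shared keys; all = TRUE keeps rows present in only one table.
cmp <- merge(ipm, master, by = c("Year", "Location.Reach"),
             suffixes = c(".ipm", ".master"), all = TRUE)
cmp$diff <- cmp$Abund.Mean.ipm - cmp$Abund.Mean.master

# Flag rows that differ beyond rounding noise or are missing from one table.
mismatch <- cmp[is.na(cmp$diff) | abs(cmp$diff) > 1e-6, ]
mismatch
```

Sorting `mismatch` by `abs(diff)` would put the largest discrepancies first.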

  • Data_BioData_Spawners_Chum: file updated with summarized bio-data from return year 2022; a handful of datasets have their count field listed as "tbd"

The value "tbd" hasn't appeared in this dataset before, so it will require temporary kludges in the code that processes the various compositional data types. NA would be the conventional way to encode this, and that way the data-wrangling script should continue to just work.

Ok - I've changed the "tbd" values to NAs and will update once @BradGarnerWDFW gets me this information. As a reminder to myself, these values are for return year 2022 for Ives, St. Cloud, Grays_CJ, Grays_WF, and Grays_MS.

  • Data_Duncan_Females_by_Condition: updated with summarized data from return years 2022 and 2023; the summary query indicated that zero chum adults were translocated into Duncan North Channel in 2022, but I am not certain this is correct. I entered the counts as NA for now but will look into this and update the file later, if needed.

NA is not allowed in this data set -- remember p_G_obs is treated as known data, so we need a finite non-missing value in every year. Probably best to just enter a placeholder of 0 for now (which will work since there are nonzero values for South) and make a note in the Comment column.

Ok - I've updated the NA values to zero and will circle back with @BradGarnerWDFW later to confirm how many females (by condition) were translocated into Duncan North Channel in 2022.

ebuhle commented 1 month ago

After double checking with @BradGarnerWDFW as to how the estimates of abundance are generated, the (preliminary) estimates that I entered yesterday for 2023 include the adults that were subsequently collected/transported to another location.

OK, got it, so it is actually the latter of the two scenarios I mentioned. Well, this will obviously result in an overestimate of recruitment from the 2023 brood year in those source populations, which will affect short-term future projections. But it shouldn't matter for the retrospective fit, since the recruits from previous brood years are accounted for. And presumably by the time we need to do any critical forward simulation, these data will be available (and we will likely have the broodstock-to-smolt transition that explicitly tracks such transfers).

kalebentley commented 1 month ago

After double checking with @BradGarnerWDFW as to how the estimates of abundance are generated, the (preliminary) estimates that I entered yesterday for 2023 include the adults that were subsequently collected/transported to another location.

OK, got it, so it is actually the latter of the two scenarios I mentioned. Well, this will obviously result in an overestimate of recruitment from the 2023 brood year in those source populations, which will affect short-term future projections. But it shouldn't matter for the retrospective fit, since the recruits from previous brood years are accounted for. And presumably by the time we need to do any critical forward simulation, these data will be available (and we will likely have the broodstock-to-smolt transition that explicitly tracks such transfers).

Exactly. The adults that were ultimately translocated already "appear" in the dataset but are currently all "attributed" to the Location.Reach where they returned. Therefore, this does result in an overestimate of adults that spawned at these locations, but (i) it should be pretty minor given the relatively small number of adults that are translocated, and (ii) it will be updated/fixed eventually, as you highlighted. I don't have a specific timeline but will update this thread once it happens.

ebuhle commented 4 days ago

Hey @kalebentley, quick question about one entry in the updated data that caught my eye while looking at the sex-ratio plot from the fitted model: in 2022, Ives is recorded as having 34 female spawners and no males. Is this right?

fish_data %>% filter(pop=='Ives') %>% select(pop,year,n_M_obs,n_F_obs)

    pop year n_M_obs n_F_obs
1  Ives 2002     164     160
2  Ives 2003      35      57
3  Ives 2004      46      46
4  Ives 2005      42      88
5  Ives 2006      55      62
6  Ives 2007      25       9
7  Ives 2008      56      41
8  Ives 2009      30      23
9  Ives 2010      44      45
10 Ives 2011      46      59
11 Ives 2012      28      28
12 Ives 2013      15      11
13 Ives 2014      50      57
14 Ives 2015      22      11
15 Ives 2016      24      27
16 Ives 2017      60      67
17 Ives 2018      29      38
18 Ives 2019      46      41
19 Ives 2020      55      70
20 Ives 2021      52      53
21 Ives 2022       0      34
22 Ives 2023       0       0

kalebentley commented 3 days ago

Hey Eric, Good catch. This is a mistake. I somehow missed updating the 2022 Ives males bio-data. I filled these in. I also noticed that the bio-data was missing for 2022 St. Cloud so I filled that in too. Kale
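
Entries like this one could be screened for automatically. The following is a minimal base-R sketch, assuming `fish_data` has the columns shown in the output above (`pop`, `year`, `n_M_obs`, `n_F_obs`). The four rows are copied from that output, and the check flags years where exactly one sex has a zero count.

```r
# Last four Ives rows, copied from the output above.
fish_data <- data.frame(
  pop     = "Ives",
  year    = 2020:2023,
  n_M_obs = c(55, 52, 0, 0),
  n_F_obs = c(70, 53, 34, 0)
)

# Flag rows where one sex is recorded as zero but the other is not.
# Rows with both counts zero (no bio-data yet, as in 2023) are not flagged.
suspect <- subset(fish_data, xor(n_M_obs == 0, n_F_obs == 0))
suspect$year   # 2022
```

A one-sided zero is not necessarily an error in a small sample, so this is better treated as a prompt to double-check than as a hard failure.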

ebuhle commented 3 days ago

Great, thanks. I don't have time to re-fit the models (retrospective ~6 h, prospective scenarios ~15 h each) before 6/30, but of course we will circle back and incorporate these data, and the other data noted in this Issue, in the coming months. In the meantime, this error has a trivial effect on estimates and inferences; it just looks weird in the plot.