afsc-gap-products / gap_products

This repository supports code used to create tables in the GAP_PRODUCTS Oracle schema. These tables include the master production tables, tables shared with AKFIN, and tables publicly shared on FOSS.
https://afsc-gap-products.github.io/gap_products/
Creative Commons Zero v1.0 Universal

gapindex is not appropriately integrating the pre-start date composite species code from the SPECIES_YEAR table in biomass estimates #38

Closed EmilyMarkowitz-NOAA closed 1 month ago

EmilyMarkowitz-NOAA commented 6 months ago

Issue Description

The gapindex R package is not calculating estimates for pre-confidence (i.e., pre-SPECIES_YEAR.START_DATE) species codes as listed in the SPECIES_YEAR table (or a future version of this table). This issue applies to every species listed in the SPECIES_YEAR table, but I'll use arrowtooth and Kamchatka flounder as an example below. This echoes, but is slightly different from, https://github.com/afsc-gap-products/gap_products/issues/37.

According to the SPECIES_YEAR table, the 10111 Atheresthes sp. (arrowtooth [10110] and Kamchatka [10112] flounder) composite species code should be used for all arrowtooth and Kamchatka flounder caught before 1991. However, there are only 0s, except for the one time that the 10111 code was earnestly used in 2023 (AKFIN_BIOMASS table: AREA_ID = 99901 AND SPECIES_CODE = 10111). All rows before 1991 should be populated with non-zero estimates because these species were caught (and likely listed as arrowtooth flounder [10110]), but scientists were not confident about their IDs.

!! It's important that both pre- and post-time stanza estimates are calculated so we can accurately calculate total annual biomass for each `AREA_ID`. !!
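To make the expected behavior concrete, here is a minimal sketch in plain Python (not gapindex code). It assumes that a record dated before a species' start year should be reported under the composite code, and from the start year onward under its own code. The `SPECIES_YEAR` dictionary, the function names, and the biomass numbers are all hypothetical stand-ins. Real biomass estimates are design-based and are not simple sums of records.

```python
# Hypothetical stand-in for the SPECIES_YEAR table:
# species code -> (composite code, first year the species-level ID is trusted).
SPECIES_YEAR = {
    10110: (10111, 1991),  # arrowtooth flounder
    10112: (10111, 1991),  # Kamchatka flounder
}


def effective_code(species_code, year):
    """Return the code a record should be reported under in a given year."""
    if species_code in SPECIES_YEAR:
        composite, start_year = SPECIES_YEAR[species_code]
        if year < start_year:
            return composite
    return species_code


def biomass_by_code(records):
    """Sum biomass by (year, effective code).

    records: iterable of (year, species_code, biomass) tuples.
    """
    totals = {}
    for year, code, biomass in records:
        key = (year, effective_code(code, year))
        totals[key] = totals.get(key, 0.0) + biomass
    return totals


# Made-up numbers, for illustration only.
records = [
    (1990, 10110, 5.0),
    (1990, 10112, 1.0),
    (1991, 10110, 4.0),
    (1991, 10112, 2.0),
]
totals = biomass_by_code(records)
```

Under these assumptions, the 1990 records are both reported under 10111, while the 1991 records keep their species-level codes, so every year still has a non-zero total.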

[Screenshot: AKFIN_BIOMASS table, AREA_ID = 99901 AND SPECIES_CODE = 10111]

For the record, you'll see in the next two screenshots that arrowtooth (10110) and Kamchatka (10112) flounder estimates are correctly cropped by gapindex to the post-1992 time-series.

Arrowtooth flounder: (AKFIN_BIOMASS table: AREA_ID = 99901 AND SPECIES_CODE = 10110) [screenshot]

Kamchatka flounder: (AKFIN_BIOMASS table: AREA_ID = 99901 AND SPECIES_CODE = 10112) [screenshot]

Proposed Solution

Add calculation of estimates for the pre-SPECIES_YEAR.START_DATE composite species codes to gapindex.

zoyafuso-NOAA commented 6 months ago

I just want to flesh this out a little bit more after our office hour session.

zoyafuso-NOAA commented 6 months ago

Sorry, one more clarification for discussion: this is only for the CPUE and BIOMASS tables, right? We’re not worried about size and age compositions, right?

EmilyMarkowitz-NOAA commented 6 months ago

Oh, good point. My vote is that any complexing decisions we make on this should be consistent across all data products (aka CPUE, biomass/abundance, age comp, and length comp tables). Am I right in thinking that in the case of the age and length comp tables, we would simply not produce estimates for species before the SPECIES_YEAR.START_DATE listed, because we do not provide complexed age/length comps? Is that what we are currently doing/what you are alluding to in your comment?

zoyafuso-NOAA commented 5 months ago

Yes, we currently are not providing size and age compositions for aggregated species_code complexes in GAP_PRODUCTS nor was this the case in the historical workflows, and it sounds like it will be okay for this to continue to be the case going forward.

zoyafuso-NOAA commented 5 months ago

This issue is slowly dissolving me inside because it should be so simple. Another complication is that within the Bathyraja 405 species complex, SPECIES_CODE values 435, 455, 471, and 472 have a start year of 1999 while species_code 480 has a start date of 1988. That creates three temporal stanzas:

I'm sorry about throwing wrenches here. I just want to make sure that this additional work makes sense and is feasible before I go ahead and work on creating the workflow to accommodate this in the production run.
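For discussion, here is a minimal sketch in plain Python (not gapindex code) of how the temporal stanzas for the 405 complex could be derived from the start years quoted above. It assumes that a species is lumped into the complex in every year before its start year and reported separately from that year on. The `START_YEARS` dictionary and the `stanzas` function are hypothetical names.

```python
# Start years quoted in this thread for members of the Bathyraja 405 complex.
START_YEARS = {435: 1999, 455: 1999, 471: 1999, 472: 1999, 480: 1988}


def stanzas(start_years):
    """Split the time series at each distinct start year.

    Returns a list of (first_year, last_year, lumped_codes) tuples, where
    None marks an open-ended bound and lumped_codes lists the species that
    are still reported under the complex code during that stanza.
    """
    breaks = sorted(set(start_years.values()))
    out = []
    lower = None
    for brk in breaks:
        # Species whose start year has not yet been reached are still lumped.
        lumped = sorted(c for c, s in start_years.items() if s >= brk)
        out.append((lower, brk - 1, lumped))
        lower = brk
    # From the last start year onward, no listed species is lumped.
    out.append((lower, None, []))
    return out


result = stanzas(START_YEARS)
for first, last, lumped in result:
    print(first, last, lumped)
```

Under this assumption the split falls at 1988 and 1999: all five codes are lumped through 1987, four of them (435, 455, 471, 472) from 1988 through 1998, and none from 1999 on.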

Ned-Laman-NOAA commented 5 months ago

I recommend that we invite the taxonomic brains to advise us here. Maybe there's a way out of this that's less complicated than having several temporal stanzas where the same code means different things in each.

EmilyMarkowitz-NOAA commented 5 months ago

Agreed - thanks for thinking through this issue from all of the different angles, @zoyafuso-NOAA! It seems way more complicated than we originally anticipated. @SarahFriedman-NOAA and @ThaddaeusBuser-NOAA, do either of you have any insight on a best path forward for this issue?

SarahFriedman-NOAA commented 5 months ago

Thanks for the careful discussion on this topic. Unfortunately, I am struggling to think of another way to handle this issue. I think the proposed solution is probably the best way forward. Though I will say that I don't think the same code means different things in each temporal stanza. Essentially, everything is just being grouped at the genus level (at least for skates), and that seems quite consistent to me. I understand that that category consists of a few different species codes, but I fail to see how that is substantially different from even the modern-day usage of 405.