afsc-gap-products / gap_products

This repository supports code used to create tables in the GAP_PRODUCTS Oracle schema. These tables include the master production tables, tables shared with AKFIN, and tables publicly shared on FOSS.
https://afsc-gap-products.github.io/gap_products/
Creative Commons Zero v1.0 Universal

GOA Skate Species disappear from certain years #37

Open Lee-Cronin-Fine-NOAA opened 3 months ago

Lee-Cronin-Fine-NOAA commented 3 months ago

Issue

I run the GOA skate assessment, which divides skates into three groups. My biomass values from last year's assessment match the ones produced by GAP, except for the third group, "other skates", where the values for 1990, 1993 and 1996 do not match: the GAP values are significantly lower. After some investigating, I discovered that species codes 435, 471 and 472 were not present in GAP in those years, and they represent the bulk of the "other skate" biomass in those years. The three species codes start appearing in 1999, and their biomass values from 1999 forward match last year's assessment values. So my question is: what happened to those species codes in 1990, 1993 and 1996?

Attached is an image of how I queried that data from GAP.
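
For anyone wanting to reproduce this kind of check, here is a minimal sketch in base R of finding the years in which a set of species codes has no records. The data frame and its values are made up for illustration, and the column names (YEAR, SPECIES_CODE, BIOMASS_MT) are assumed to mirror the biomass table; this is not the query from the attached image.

```r
# Toy biomass table; the values are made up for illustration only
biomass <- data.frame(
  YEAR         = c(1996, 1996, 1999, 1999, 1999, 1999),
  SPECIES_CODE = c(400, 455, 400, 435, 471, 472),
  BIOMASS_MT   = c(10, 5, 8, 20, 15, 12)
)

codes_of_interest <- c(435, 471, 472)

# Use explicit factor levels so codes with no records still get a column
all_codes <- sort(unique(c(biomass$SPECIES_CODE, codes_of_interest)))
presence <- table(
  YEAR         = biomass$YEAR,
  SPECIES_CODE = factor(biomass$SPECIES_CODE, levels = all_codes)
)

# Record counts per year for the codes of interest (0 = no records)
counts <- presence[, as.character(codes_of_interest), drop = FALSE]
counts

# Years in which none of the codes of interest appear
years_missing <- names(which(rowSums(counts) == 0))
years_missing
```

Run against the real table, a check like this would list the survey years in which the three codes are absent.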

zoyafuso-NOAA commented 3 months ago

Hi Lee,

The taxonomic confidence levels for those three species pre-1999 are fairly low, so we have decided to include those records only from 1999 onward in the standard GAP_PRODUCTS product. Pinging @Ned-Laman-NOAA just in case he has any insight about including these skates pre-1999 in the assessment.

To get those skate records pre-1999, we would need to do a custom pull from gapindex, which is the package used to create the GAP_PRODUCTS tables. So something like:

## devtools::install_github("afsc-gap-products/gapindex")
library(gapindex) #v2.2.0

## Connect to Oracle using your AFSC database credentials
sql_channel <- gapindex::get_connected()

## Pull other skate data, 1990-2023
gapindex_data <- gapindex::get_data(
  year_set = c(1990:2023),
  survey_set = "GOA",
  spp_codes = c(400, 420, 435, 440, 455, 460, 471, 472, 480, 485),
  haul_type = 3,
  abundance_haul = "Y",
  pull_lengths = TRUE,
  sql_channel = sql_channel)

## Fill in zeros and calculate CPUE
cpue <- gapindex::calc_cpue(racebase_tables = gapindex_data)

## Calculate stratum-level biomass, population abundance, mean CPUE and 
## associated variances
biomass_stratum <- gapindex::calc_biomass_stratum(
  racebase_tables = gapindex_data,
  cpue = cpue)

## Calculate aggregated biomass and population abundance across subareas,
## management areas, and regions
biomass_subareas <- gapindex::calc_biomass_subarea(
  racebase_tables = gapindex_data,
  biomass_strata = biomass_stratum)

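## Keep only the three area IDs of interest; %in% tests membership, whereas
## == would recycle the vector and compare element by element
subset(x = biomass_subareas,
       subset = AREA_ID %in% c(803, 804, 805),
       select = c(SURVEY_DEFINITION_ID, AREA_ID, YEAR, SPECIES_CODE,
                  BIOMASS_MT, BIOMASS_VAR))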
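
A note on filtering by several AREA_ID values at once: a membership test (`%in%`) and an equality test (`==`) against a vector can give different results in R, because `==` recycles the shorter vector and compares position by position. A small sketch with a made-up vector of area IDs (not real output from gapindex):

```r
# Made-up vector of area IDs; every element is one of 803, 804, 805
area_id <- c(805, 805, 803, 804, 804, 803)
wanted  <- c(803, 804, 805)

# `==` recycles `wanted` to 803, 804, 805, 803, 804, 805 and compares
# element by element, so a row is kept only if it lines up by position
by_position <- area_id == wanted

# `%in%` asks, for each element, whether it is any of the wanted IDs
by_membership <- area_id %in% wanted

sum(by_position)    # number of rows kept by the positional comparison
sum(by_membership)  # number of rows kept by the membership test
```

Here the positional comparison keeps one row and the membership test keeps all six, so `%in%` is the safer choice for this kind of filter.
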
Lee-Cronin-Fine-NOAA commented 3 months ago

Thanks for the clarifying response.

I have been thinking about this and was wondering why the pre-1999 435, 471 and 472 skates weren't moved to species code 400 (skate unid.). For the assessment, the 435, 471 and 472 skates only affect the "other skate" category, and I don't need species-level detail for skates in that category.

Ned-Laman-NOAA commented 3 months ago

I'd support Lee's notion of rolling up the pre-1999 uncertainly ID'd skates into Rajidae/skate unid. (400). We confidently know they were skates and we have the weights and counts, so we don't want to just "leave them out." There is a real challenge here in how we manage the on/off grouping of taxa, and it is part of an ongoing effort to develop and modernize the taxonomic side of our database and data management systems, so I don't want to give the impression that this is trivial. It will take some time to implement.

zoyafuso-NOAA commented 3 months ago

Hi @Lee-Cronin-Fine-NOAA

Hopefully these responses and/or code will help you with this issue at least in the short term.

@Ned-Laman-NOAA: would that involve changing species_code values within RACE_DATA/RACEBASE pre-1999, or using the GROUP_CODE functionality in gapindex to create aggregations?

Ned-Laman-NOAA commented 3 months ago

@zoyafuso-NOAA I think we should tap into the taxonomic brain trust on this one, but I believe that whenever we downgrade a species ID due to confidence, we should provide an aggregation option at the higher-confidence ID level. So in this case, rolling those skates up to Rajidae from the spp. level during that low spp. confidence stanza makes sense to me. They were there and we have the data, so we don't want them to drop out entirely, just to be confidently renamed and appropriately aggregated.
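
As a rough illustration of the roll-up idea only, here is a base-R sketch on a made-up catch table. The column names, the weights, the 1999 cutoff variable, and the choice of 400 as the target code are assumptions taken from this thread for the sake of the example; this is not how GAP_PRODUCTS or gapindex implements grouping.

```r
# Toy catch records; weights are made up for illustration only
catch <- data.frame(
  YEAR         = c(1996, 1996, 1996, 1999, 1999),
  SPECIES_CODE = c(435, 471, 400, 435, 472),
  WEIGHT_KG    = c(12, 7, 3, 9, 4)
)

low_confidence_codes <- c(435, 471, 472)  # codes discussed in this thread
rollup_code          <- 400               # assumed target (skate unid.)
first_confident_year <- 1999              # assumed cutoff year

# Flag only the low-confidence records from before the cutoff year
needs_rollup <- catch$SPECIES_CODE %in% low_confidence_codes &
  catch$YEAR < first_confident_year

# Keep the original code and add a reporting code used for aggregation
catch$REPORT_CODE <- ifelse(needs_rollup, rollup_code, catch$SPECIES_CODE)

# Sum weights by year and reporting code
rolled_up <- aggregate(WEIGHT_KG ~ YEAR + REPORT_CODE, data = catch, FUN = sum)
rolled_up
```

Keeping SPECIES_CODE untouched and aggregating on a separate column means the original identifications are not lost, and records from the cutoff year onward stay at the species level.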

zoyafuso-NOAA commented 3 months ago

My mistake: if we are aggregating to the next taxonomic level up, we would be rolling 435, 471, and 472 up to SPECIES_CODE 405 (Bathyraja sp.) pre-1999 (even though Bathyraja parmifera was renamed to Arctoraja parmifera). SPECIES_CODE 405 is not in the filter that @Lee-Cronin-Fine-NOAA uses in the query above, and including it could introduce other mismatches. So regardless of what we do in the future with GAP_PRODUCTS, I think @Lee-Cronin-Fine-NOAA should use a custom pull from gapindex as provided above, since this now creeps into the arena of non-standard requests.

@Lee-Cronin-Fine-NOAA, can you confirm how well the code above matches the data you've used in the past?