afsc-gap-products / gap_products

This repository supports code used to create tables in the GAP_PRODUCTS Oracle schema. These tables include the master production tables, tables shared with AKFIN, and tables publicly shared on FOSS.
https://afsc-gap-products.github.io/gap_products/
Creative Commons Zero v1.0 Universal

Alaska Plaice composition data age and length population_count have increased a lot! #40

Open Lee-Cronin-Fine-NOAA opened 2 months ago

Lee-Cronin-Fine-NOAA commented 2 months ago

Issue

I have been comparing the Alaska Plaice composition data pulled from GAP to the Alaska Plaice composition data from past assessments, and the population_count values have increased a lot across all years. For the age-composition data the increase is around 100,000k, while for the size-composition data it is around 1,000. I suspect something fishy is going on here and could use some help.

Thanks,

Lee

zoyafuso-NOAA commented 2 months ago

Hi @Lee-Cronin-Fine-NOAA ,

Can you provide the code you used to pull the GAP_PRODUCTS tables? Maybe it's a filtering issue.

Lee-Cronin-Fine-NOAA commented 2 months ago

Pulling Alaska Plaice Age Comp

Pulling Alaska Plaice size comp

zoyafuso-NOAA commented 2 months ago

For the agecomps, maybe add the extra filter WHERE AREA_ID = 99901 AND AGE >= 0 to the query. There are dummy codes -9 and -99 in the AGE field that have some meaning that I won't go into here. Similarly, for the sizecomp query, you could add the extra filter WHERE LENGTH_MM > 0 to filter out the -9 dummy code in that column. Let me know if that helps.

For reference, 99901 is the EBS_STANDARD area and 99900 is the EBS_STANDARD + NW area.
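
As a rough illustration of what those filters do, here is a minimal sketch that uses an in-memory SQLite table in place of the Oracle tables. The table name agecomp_demo and every row in it are made up for illustration; only the column names AREA_ID, AGE, and POPULATION_COUNT and the codes 99900, 99901, -9, and -99 come from the comments above.

```python
import sqlite3

# Toy stand-in for an age comp table. All rows are invented for illustration.
ROWS = [
    # (AREA_ID, AGE, POPULATION_COUNT)
    (99901, 5, 1000),
    (99901, 6, 2000),
    (99901, -9, 50),    # dummy AGE code
    (99901, -99, 25),   # dummy AGE code
    (99900, 5, 1200),   # EBS_STANDARD + NW area
]


def total_population(conn, where_clause="", params=()):
    """Sum POPULATION_COUNT over the rows selected by an optional WHERE clause."""
    sql = (
        "SELECT COALESCE(SUM(POPULATION_COUNT), 0) FROM agecomp_demo "
        + where_clause
    )
    return conn.execute(sql, params).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agecomp_demo "
    "(AREA_ID INTEGER, AGE INTEGER, POPULATION_COUNT INTEGER)"
)
conn.executemany("INSERT INTO agecomp_demo VALUES (?, ?, ?)", ROWS)

# Without filters, both areas and the dummy-coded rows are summed together.
unfiltered = total_population(conn)

# With the suggested filters, only EBS_STANDARD rows with real ages remain.
filtered = total_population(conn, "WHERE AREA_ID = ? AND AGE >= 0", (99901,))

print(unfiltered, filtered)
```

The LENGTH_MM > 0 filter for the sizecomp query would work the same way on the length column.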

Lee-Cronin-Fine-NOAA commented 2 months ago

I added those suggestions and they don't change the results.

The issue is with individual size/age values for the population count. I have attached two tables showing the issue from 1982. Once I created these tables I noticed the percent difference between the new and old population count values is less than one percent. Am I overreacting? Would love to hear your thoughts. Issue_with_age_comp_1982_example.csv Issue_with_size_comp_1982_example.csv
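
To show how a large absolute difference can still be a small percent difference, here is a minimal sketch of the comparison. The keys (year, age) and all of the counts are hypothetical and are not taken from the attached tables.

```python
def percent_difference(old, new):
    """Percent change from old to new, relative to old."""
    if old == 0:
        raise ValueError("old value is zero; percent difference is undefined")
    return 100.0 * (new - old) / old


def compare_counts(old_counts, new_counts):
    """Return {key: (absolute difference, percent difference)} for shared keys."""
    shared = sorted(set(old_counts) & set(new_counts))
    return {
        key: (
            new_counts[key] - old_counts[key],
            percent_difference(old_counts[key], new_counts[key]),
        )
        for key in shared
    }


# Hypothetical population counts keyed by (year, age).
old = {(1982, 10): 20_000_000, (1982, 11): 1_500_000}
new = {(1982, 10): 20_100_000, (1982, 11): 1_501_000}

result = compare_counts(old, new)
for key, (abs_diff, pct_diff) in result.items():
    print(key, abs_diff, round(pct_diff, 3))
```

In this made-up example a difference of 100,000 fish is only half a percent of the old value, so the size of the absolute difference alone does not say much.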

zoyafuso-NOAA commented 2 months ago

Very interesting. Where have previous AK plaice assessments been getting their compositional data? Comparing the comps in the GAP_PRODUCTS.AKFIN_* tables to those in the HAEHNR schema (where the legacy EBS tables come from on the GAP side), the numbers match up very well. I'm thinking it's a data source issue and not a computational or querying issue.

Also, can you confirm that you are only looking at female age/size comps?

Lee-Cronin-Fine-NOAA commented 2 months ago

I was put in charge of the Alaska Plaice assessment last year, which was an update year. The only data needed for the update was the biomass index data, which GAP matched perfectly. The last full assessment of Alaska plaice was in 2021, and I have the composition data from that year, which I am using to compare to the GAP data. I don't know how the previous author obtained the composition data in 2021. I can try to write code to pull composition data from HAEHNR and see if it matches up to the GAP data.

Also, I am not just looking at female data. I just provided it as an example. I am looking at both male and female data.

I will be attending the GAP office hours today (remotely) so we can continue to discuss this in person.

Lee-Cronin-Fine-NOAA commented 2 months ago

Hey Zack,

I have checked the GAP size composition and age composition data for Alaska Plaice against the AFSC composition data and things match up really well. There is one issue I wanted to point out. In the age composition data, there are 7 values that have differences much larger than all the others. The other differences range from -3 to 3, while these 7 values are much bigger. They all come from the same year, 2008, although not all the values from 2008 have this issue. I have attached a table that shows the issue. I don't think this is a big enough problem to impact my assessment, but it is a weird thing I noticed and wanted to point out.

Thanks,

Lee

Issue_with_age_comp_2008.csv
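
A check like the one described above can be sketched roughly as follows. The tolerance of 3 mirrors the -3 to 3 range mentioned in the comment; the keys (year, sex, age) and all of the counts are hypothetical and are not the values from the attached table.

```python
def flag_large_differences(old_counts, new_counts, tolerance=3):
    """Return (key, difference) pairs whose absolute difference exceeds tolerance."""
    flagged = []
    for key in sorted(set(old_counts) & set(new_counts)):
        diff = new_counts[key] - old_counts[key]
        if abs(diff) > tolerance:
            flagged.append((key, diff))
    return flagged


# Hypothetical population counts keyed by (year, sex, age).
legacy = {
    (2007, 1, 8): 500_000,
    (2008, 1, 8): 750_000,
    (2008, 2, 9): 620_000,
}
gap = {
    (2007, 1, 8): 500_002,   # within rounding-level differences
    (2008, 1, 8): 742_500,   # much larger difference
    (2008, 2, 9): 619_999,   # within rounding-level differences
}

flagged = flag_large_differences(legacy, gap)
print(flagged)
```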

zoyafuso-NOAA commented 2 months ago

Hey Lee,

One update we've made to the age comp calculations for GAP_PRODUCTS is to remove input otolith data that came from non-standard hauls. In 2008, one haul with poor performance contributed AK plaice otolith data to the calculations that produced the legacy agecomp table. In the past, this poor-performing haul was removed from the standard biomass calculation but not from the age comp calculation. We've changed this so that the hauls that go into the biomass -> size comp -> age comp calculations are consistent, which we see as an advantage for reproducibility.
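
The idea of using one consistent set of hauls can be sketched as below. This is only an illustration and not the actual GAP_PRODUCTS code: the field names haul_id and standard, the record layout, and all of the values are hypothetical.

```python
def standard_haul_ids(hauls):
    """Collect the IDs of hauls marked as standard (hypothetical 'standard' flag)."""
    return {haul["haul_id"] for haul in hauls if haul["standard"]}


def filter_to_hauls(records, haul_ids):
    """Keep only the records that come from the given set of hauls."""
    return [rec for rec in records if rec["haul_id"] in haul_ids]


# Hypothetical inputs: haul 3 stands in for a poor-performing, non-standard haul.
hauls = [
    {"haul_id": 1, "standard": True},
    {"haul_id": 2, "standard": True},
    {"haul_id": 3, "standard": False},
]
catch_records = [
    {"haul_id": 1, "weight_kg": 12.0},
    {"haul_id": 3, "weight_kg": 7.5},
]
otolith_records = [
    {"haul_id": 2, "age": 9},
    {"haul_id": 3, "age": 11},
]

# The same haul set is applied to every input, so the biomass and age comp
# steps see the same hauls.
keep = standard_haul_ids(hauls)
catch_used = filter_to_hauls(catch_records, keep)
otoliths_used = filter_to_hauls(otolith_records, keep)
print(catch_used, otoliths_used)
```

Applying the filter in one place, before any of the downstream calculations, is what keeps the inputs to each step consistent in this sketch.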