afsc-gap-products / data-requests

An empty repo for tracking requests for GAP data
0 stars 0 forks source link

Duplicated Length Records in RACEBASE.LENGTH #71

Closed zoyafuso-NOAA closed 1 week ago

zoyafuso-NOAA commented 1 year ago

Data product requested (stratum CPUE, etc.): length data

Species: pollock

Region (GOA, AI, Bering Sea): Bering Sea

Research team making the request (PI /requester name, and tag team members with GitHub accounts, or email address if requestor does not have a GitHub account. Division is nice too.): GAP

Hi,

The following combinations in the table below of HAULJOIN, SEX, and LENGTH pollock length records in RACEBASE.LENGTH, consist of more than one record. For example, the query "SELECT * FROM RACEBASE.LENGTH WHERE HAULJOIN = 10446 AND SPECIES_CODE = 21740 AND SEX = 3 AND LENGTH = 180" yields two records, one with a frequency of 13, and another with a frequency of 618. Could someone please confirm that with me? If it is an issue, I am requesting that to be remedied.

Thanks in advance, @zoyafuso-NOAA

HAULJOIN | SPECIES_CODE | SEX | LENGTH -- | -- | -- | -- 10446 | 21740 | 3 | 180 10483 | 21740 | 3 | 120 10483 | 21740 | 3 | 140 10483 | 21740 | 3 | 150 10473 | 21740 | 3 | 130 10473 | 21740 | 3 | 150 10473 | 21740 | 3 | 160 10473 | 21740 | 3 | 170 10473 | 21740 | 3 | 200 10473 | 21740 | 3 | 210 10473 | 21740 | 3 | 220 10483 | 21740 | 3 | 160 10483 | 21740 | 3 | 170 12031 | 21740 | 3 | 120 12304 | 21740 | 3 | 130 10473 | 21740 | 3 | 140 10483 | 21740 | 3 | 130 10483 | 21740 | 3 | 180 10473 | 21740 | 3 | 180 10473 | 21740 | 3 | 190

NancyRoberson commented 1 year ago

BS_199101_37_177_POLLOCK2 BS_199101_37_177_POLLOCK

NancyRoberson commented 1 year ago

Hi Zach, I looked up the old length form for the first example you gave (see above). It looks like there was only one 18cm unsexed (not 13). It was a split tow (split factor 23.804% - see below) so maybe it extrapolated to a count of 618 but the original count wasn't removed? I am bringing in Duane for his thoughts.

NancyRoberson commented 1 year ago

BS_199101_37_177_POLLOCK3

Duane-Stevenson-NOAA commented 1 year ago

That count of 618 seems pretty unreasonable to me. I would bet that's the error, though I don't know how it would have gotten in there. An extrapolated count of 13 for juvenile pollock of that size class seems quite reasonable.

Duane

On Wed, Sep 6, 2023 at 2:52 PM Nancy Roberson @.***> wrote:

[image: BS_199101_37_177_POLLOCK3] https://user-images.githubusercontent.com/17622896/266149606-a230e692-69c0-4965-a476-a7b648ea90c1.JPG

— Reply to this email directly, view it on GitHub https://github.com/afsc-gap-products/data-requests/issues/71#issuecomment-1709177564, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANKDWATL2PZLNEXCEHSLTBDXZDWBVANCNFSM6AAAAAA4NTYVHM . You are receiving this because you were assigned.Message ID: @.***>

-- Duane Stevenson, Ph.D. Supervisory Fish Biologist Groundfish Assessment Program NMFS, Alaska Fisheries Science Center

zoyafuso-NOAA commented 1 year ago

Thanks for looking into that, Nancy. I don't know if it's worth your time going through all of the case examples but do you think a similar issue is going on with the other cases on that list?

NancyRoberson commented 1 year ago

Hi Zack and Duane, I am not sure what to make of all this. There were two tows in BS 199101 37 hauls 204 and 214 with a handful of pollock duplicates. The frequency counts don't make sense to me but maybe I am missing something. I am also looking at the pollock duplicates in BS 199101 78 hauls 141 and 165.
Let me know if you want me to make changes and/or want to see these pics in a different format.

204

199101_37_204_Poll_n78 199101_37_204_JuvPoll 199101_37_204_CatchPoll

214 199101_37_214_Poll_n78 199101_37_214_JuvPoll_n44 199101_37_214_CatcjPoll

zoyafuso-NOAA commented 2 weeks ago

@ChrisAnderson-NOAA , this was the issue I brought to you yesterday. If you have some time, could you try sinking your teeth into this? Thanks in advance!

Duane-Stevenson-NOAA commented 2 weeks ago

Based on what Nancy posted above, it looks to me like for these hauls the pollock adults and pollock juveniles were lengthed and extrapolated separately. Then the pollock juvenile code was changed to the standard pollock code at some point, but the extrapolated length counts were never combined. Thus, when there are length-class overlaps (e.g., 13 cm unsexed adult pollock and 13 cm unsexed juvenile pollock for the same haul), we get two entries for those combinations. I think that these length frequency numbers should be added together to create a single adult+juvenile combined extrapolated length frequency for each of the duplicated haul-length-sex combinations.

zoyafuso-NOAA commented 2 weeks ago

Yeah, that makes sense to me, thanks for looking into that, @Duane-Stevenson-NOAA . In gapindex, frequencies are combined whenever these duplicates are found, so it seems like we're properly accounting for the duplicates in our production code. How the combining happens before it hits RACEBASE.LENGTH is out of my scope. Do either you or @ChrisAnderson-NOAA know how to tackle that?

ChrisAnderson-NOAA commented 2 weeks ago

I can take care of that @zoyafuso-NOAA. These need to be changed in the table RACEBASE.LENGTH_ORIG and then to refresh the view. The Materialized view RACEBASE.LENGTH is built using another materialized view from RACE_DATA (MV_RACEBASE_LENGTH) for anything found in RACE_DATA final tables. It combines those lengths with the ones found in RACEBASE.LENGTH_ORIG for data that predates what is found in RACE_DATA.

zoyafuso-NOAA commented 2 weeks ago

Right on, thanks @ChrisAnderson-NOAA! I hope that's not too big of a lift.

ChrisAnderson-NOAA commented 1 week ago

I have updated lengths in the RACEBASE.LENGTH_ORIG table to remove all instances where there were multiple rows for the same Haul/species/length and where the hauls had ABUNDANCE_HAUL = Y . This updated 24 rows of frequencies and removed the 24 duplicate rows. Lastly, I refreshed the RACEBASE materialized views to show all edits in the RACEBASE.LENGTH MView.

The following query was used to identify all duplicate rows. It showed 24 rows prior to the edits and 0 returned rows after the edits;

select hauljoin, catchjoin, region, species_code, length, sex, COUNT(), abundance_haul from (select b.abundance_haul, a. from racebase.length_orig a join racebase.haul b on (a.region = b.region and a.vessel = b.vessel and a.cruise = b.cruise and a.haul = b.haul) where b.abundance_haul = 'Y') GROUP BY species_code, hauljoin, catchjoin, region, length, sex, abundance_haul HAVING COUNT(*) > 1 AND region IN ('BS', 'GOA', 'AI') ORDER BY hauljoin, length;

zoyafuso-NOAA commented 1 week ago

Thank you, @ChrisAnderson-NOAA !!!