Closed MattCallahan-NOAA closed 4 months ago
Hi @MattCallahan-NOAA and @rclairer. Thanks for this flag. This may be related to the #20 issue. We have a potential fix to the query that creates the AKFIN length table. When I get back later this week, I will rerun those corrected queries and then check back in with this issue.
Awesome, thanks Zack, hopefully that fix solves this too.
Hi @MattCallahan-NOAA
I updated all of the the AKFIN_* tables on the GAP_PRODUCTS schema with @Ned-Laman-NOAA revised SQL code from https://github.com/afsc-gap-products/gap_products/commit/8521fcf7986616a692fb556f94bd992abd8d893b. I ran these queries and don't seem to get duplicates anymore. Can you confirm and if you're good, close the comment.
Super! Thanks for finding this issue @MattCallahan-NOAA and for sorting this, @zoyafuso-NOAA ! Just to confirm, I see that:
There are no and sex = 1 and length_mm = 620
entries from the first query now - is this right?
And these look like the second query has been successfully been un-duplicated:
Do these look right? Note @MattCallahan-NOAA, you'll need to look at the AFSC oracle to see these as we haven't asked OFIS to create the oracle dump file for AKFIN yet.
I suspect that there is some sort of new issue with missing data.
I get 62 distinct pollock length bins in haul -6619 on the AKFIN side
Also the row count for the akfin_length table went from 148 million to 4 million, which is a much bigger drop than I would expect.
The second example seems to be resolved though.
I didn't even think to check of the sizes of these tables, good catch. 4.6 million records is within the range of reason, as RACEBASE.LENGTH only has 7 million records.
A good check on HAULJOIN -6619 is to do that same query on RACEBASE.LENGTH and I can confirm that there should be four unique length records.
select * from GAP_PRODUCTS.AKFIN_LENGTH where hauljoin = -6619 and species_code = 21740
select * from racebase.length where hauljoin = -6619 and species_code = 21740
Great, this will make querying the length table way faster!
Thanks for the tip of going back to RACEBASE for QC questions too.
If you're happy with the table than so am I.
Issue
Hey I got a question from @rclairer about duplicate rows in the gap_products.akfin_length table, is this expected? Here are a couple of examples but the duplicates seem widespread.
--4 duplicated rows select * from gap_products.akfin_length where hauljoin = -6619 and species_code = 21740 and sex = 1 and frequency = 1 and length = 620 and length_type = 1;
--7 duplicated rows select * from gap_products.akfin_length where hauljoin = -5812 and species_code = 30576 and length = 570 and sex = 1 and frequency = 1 and length_type = 1 and sample_type = 1;