MargaretSiple-NOAA / goa-ai-data-reports

Automate data reports for GOA and AI surveys
1 stars 1 forks source link

Table 2 #25

Closed Ned-Laman-NOAA closed 6 months ago

Ned-Laman-NOAA commented 1 year ago

Table 2 comes in 3 parts: all districts (SBS, EAI, CAI, and WAI separately), Aleutian districts (EAI, CAI, and WAI combined), and the entire survey area or BSAI combined. I wrote sqlplus to compare values and here are the results:

-- all districts -- There were three discrepancies between my numbers and those in the Word DRAFT; all 3 in the CAI. 2 were minor probably rounding, but I get 88.9 for POP and you have 92.6. sqlplus: with SURVEY_DISTRICTS as ( select a.*, row_number () over ( partition by survey_district order by cpue desc) rn from (select distinct l.inpfc_area survey_district, i.summary_area, s.common_name, (i.mean_wgt_cpue/100) cpue from AI.BIOMASS_INPFC i, RACEBASE.SPECIES s, GOA.GOA_STRATA l -- biomass schema is hardcoded and needs to be dynamic where i.species_code = s.species_code and i.summary_area = l.summary_area and year = 2022 and l.survey = 'AI') a -- "year" and "survey" are hard coded and need to be dynamic ) select survey_district, common_name species_name, round(cpue,1) cpue from SURVEY_DISTRICTS where rn <= 20 -- top 20 is hardcoded and dependent on cpue in descending order but doesn't need to be dynamic order by summary_area, cpue desc /

-- Aleutian districts -- Most numbers in the DRAFT table don't exactly match those coming out of Oracle. Rounding errors within one or two tenths are not of concern, but there were bigger differences that are (e.g., 187.8 kg/ha for POP in DRAFT and 166.7 from Oracle) sqlplus: -- Combined Aleutian districts EAI, CAI, WAI leverages BIOMASS_AREA table select * from ( select regulatory_area_name, common_name species_name, round(mean_wgt_cpue/100,1) cpue from ai.biomass_area a, racebase.species b where year = 2022 -- year hardcoded should be dynamic and regulatory_area_name = 'ALEUTIANS' -- regulatory_area_name hardcoded and a.species_code = b.species_code order by mean_wgt_cpue desc ) where rownum <= 20 order by cpue desc /

-- BSAI or "all" -- Just two discrepancies with All Areas Combined and they were 166 kg/ha POP in DRAFT and 165 coming out of Oracle, 25.5 vs 25.4 for pollock sqlplus: select * from ( select 'BSAI' area, common_name species_name, round(mean_wgt_cpue/100,1) cpue from ai.biomass_total a, racebase.species b where year = 2022 -- year hardcoded should be dynamic and a.species_code = b.species_code order by mean_wgt_cpue desc ) where rownum <= 20 order by cpue desc /

MargaretSiple-NOAA commented 1 year ago

Current solution for AI is to use Paul method for obtaining the table data. He says,

I used the BIOMASS_INPFC and BIOMASS_TOTAL tables in the AI schema of Oracle (created when Ned runs his biomass scripts to populate the various Oracle tables) to extract the mean CPUE's by INPFC area (as well as the total survey area) for the top 20 species in each INPFC area. Would this be too difficult to replicate with an R script?

For 2022, since Paul already produced this table, I just made a reformatted version that I pull directly into the report. It is in G:/ALEUTIAN/AI 2022/DataReport/Table 2/Table 2_AI2022_makeitlooklikethisplease.xlsx. In the future, this table should be made from the BIOMASS_TOTAL and BIOMASS_INPFC tables as above, or from Ned's SQL scripts from here. When I make it "from scratch" using this function, OR when Paul tries to make it from scratch in Excel using the same tables, the numbers are slightly off as Ned describes above (and off in the exact same way) so this is an issue of the summary tables being slightly different than what I think they are. To maintain consistency in the report, we have decided to use BIOMASS_TOTAL and BIOMASS_INPFC tables and forget about the from scratch version for now.

Ned-Laman-NOAA commented 1 year ago

FYI and for planning purposes, we hope to serve Biomass, CPUE, sizecomp, and agecomp from new unified (Bering, Gulf, and Aleutian results) tables in the GAP_PRODUCTS schema on Oracle in Fall of 2024. These tables will be available for testing this Spring (2023). Stay tuned!

MargaretSiple-NOAA commented 1 year ago

I have heard this! Thank you Ned for the exciting update. I don't think this will change my scripts too much; if it does, it'll be for the better anyway. This will simplify my workflow for producing the data report tables.

MargaretSiple-NOAA commented 6 months ago

I think this is all G now. The current table uses the BIOMASS table from GAP_PRODUCTS instead of trying to recreate it from the RACEBASE tables.