MargaretSiple-NOAA / goa-ai-data-reports

Automate data reports for GOA and AI surveys
1 stars 1 forks source link

Send Megsie the preferred SQL script for generating "Table 3" #47

Closed MargaretSiple-NOAA closed 10 months ago

MargaretSiple-NOAA commented 11 months ago

The layout for what we're calling "Table 3" is in issue #39.

Ned-Laman-NOAA commented 11 months ago

There are presently two scripts (one for AI [G:\ALEUTIAN\datareport\ai_cpuetable.sql] and one for GOA [G:\GOA\datareport\goa_cpuetable.sql]) that can be used to generate Table 3 for the Data Processed Reports. Instructions for running these SQL scripts are here (G:\AI-GOA\Instructions&Procedures\Data Report\Table 3\Instructions CPUE tables by INPFC area and depth_REVISED OCT 12 2017_B.doc--what an awesome name!). Short version is the code generates an Oracle object entitle TABLE_BIOMASS which is to some degree preformatted to than get slotted into a pre-existing Excel table format examples of which can be found here )G:\GOA\GOA 2007\Data Report\Data Report GOA 2007\Tables\Tables_SPECIES EXCEL files).

Let me know how you'd like to proceed. I'm willing to make some time to wade into the .sql scripts referenced above to shorten, unify, and optimize the sql. However, you may want to take a different approach where more of the wrangling happens in the R side.

Happy to help so just let me know where I can contribute!

MargaretSiple-NOAA commented 10 months ago

Hi Ned-- this is probably not urgent for this year but I've realized in helping Paul troubleshoot an issue he's having with Table 3, and in trying to turn this into an R script (issue #33 ), that one needs a separate special login to run this script (the AIGOA_WORK_DATA login). In general, this means that it will be a little less reproducible and it would be good to be able to allow future report-makers to run all the scripts in this repo using their one standard login if we can do it. I am not familiar enough with SQL to know how difficult this would be to implement, but data-wise it would be best.

Ned-Laman-NOAA commented 10 months ago

Hi Megsie-I don't think that you can "only" run the script from AIGOA_WORK_DATA. It looks to me like the original SQLPlus references only one object external to the user schema that you log in to and that's a table with public select privileges (GOA.GOA_STRATA). The main advantage of using the AIGOA_WORK_DATA schema is that these older scripts create and (usually) drop a number of objects as they proceed and keeping your own schema clean of these is a nice housekeeping agenda. I think that for all of these data-access-to-data-product tools we're building we should determine the best method for providing complete documentation and reproducibility which is likely linked to the strategy chosen to generate these products (i.e., is it better to pull all data into R and do the manipulation there? or is it better to tailor the data coming out of Oracle with SQLPlus so that data wrangling in R is minimized?). I think there are arguments in favor of both approaches but ultimately the person tasked with code writing, documentation, and maintenance of same should decide the path forward.

MargaretSiple-NOAA commented 10 months ago

I was able to run the script from my own schema in Oracle so I take back what I said about the AIGOA_WORK_DATA schema! Still having trouble running it in R, so closing this issue to reflect that we need to complete re-make this table in R instead of SQL 😪. There's already another issue for that anyway.