afsc-gap-products / gap_products

This repository supports code used to create tables in the GAP_PRODUCTS Oracle schema. These tables include the master production tables, tables shared with AKFIN, and tables publicly shared on FOSS.
https://afsc-gap-products.github.io/gap_products/
Creative Commons Zero v1.0 Universal
5 stars 5 forks source link

Addition of FOSS_SPECIES, FOSS_SURVEY_SPECIES, and changes to FOSS_CATCH in GAP_PRODUCTS #4

Closed EmilyMarkowitz-NOAA closed 8 months ago

EmilyMarkowitz-NOAA commented 9 months ago

Request

In an effort to bring zero-filled data to FOSS, I met with the FOSS team to review the FOSS_HAUL, FOSS_CATCH, and FOSS_TAXON_GROUP tables. They requested that we modify the FOSS_CATCH table to: 1) remove species context data (e.g., COMMON_NAME, ITIS code) and include that data in another table (FOSS_SPECIES). Removing these columns will help cut down on the number of columns in FOSS_CATCH and make it easier for the FOSS team to port in and apply this table on the platform. 2) remove zero-inflation from the FOSS_CATCH table and instead use another new reference table (FOSS_SURVEY_SPECIES) to fill in the species information and zero-inflation on their end. Creating this new reference table will help cut down on the observations in FOSS_CATCH and make it easier for the FOSS team to apply these tables on the platform and will tell the FOSS team what species need to be zero-inflated for which survey.

Essentially, columns in these new tables will be:

Action

I prepared the foss-restructure branch with the above changes. I also slightly modified the code/foss.R script to allow for these other table comments.

Now, to make a zero-inflated table with these proposed changes, the oracle users would now join the FOSS_CATCH, FOSS_HAUL, and FOSS_SPECIES with code like below:

SELECT *
FROM GAP_PRODUCTS.FOSS_SURVEY_SPECIES st
FULL JOIN GAP_PRODUCTS.FOSS_SPECIES ss
ON st.SPECIES_CODE = ss.SPECIES_CODE
FULL JOIN GAP_PRODUCTS.FOSS_HAUL hh
ON st.SURVEY_DEFINITION_ID = hh.SURVEY_DEFINITION_ID
LEFT JOIN GAP_PRODUCTS.FOSS_CATCH cc
ON st.SPECIES_CODE = cc.SPECIES_CODE
AND hh.HAULJOIN = cc.HAULJOIN;

@zoyafuso-NOAA, can you (or a designee) double-check my work, make sure these changes work on your end, and rerun these tables? Let me know if you have any comments or suggestions, or if I need to make any other amendments.

zoyafuso-NOAA commented 8 months ago

Hi Emily, thanks for setting this up. Few questions/comments:

1) Is FOSS_TAXON_GROUP still being used? In other words, are the new FOSS_SPECIES and FOSS_SURVEY_SPECIES tables replacing FOSS_TAXON_GROUP?

2) Is it FOSS_SURVEY_SPECIES or FOSS_SPECIES_SURVEY?

3) There is an order of operation issue when looping through these tables. It seems like this is the order: a) FOSS_HAUL, b) FOSS_CATCH, c) FOSS_TAXON_GROUP (if still used), d) FOSS_SPECIES (dependent on FOSS_CATCH), e) FOSS_SURVEY_SPECIES (dependent on both FOSS_HAUL and FOSS_CATCH), and f) FOSS_CPUE_PRESONLY (dependent on FOSS_CATCH, FOSS_HAUL).

4) Line 40 of FOSS_CPUE_PRESONLY.sql references the ID_RANK column from FOSS_CATCH but the new FOSS_CATCH table does not have that column anymore. You have more familiarity with how these tables need to be structured so I will leave it to you to figure out how to deal with that. I think that column is from GAP_PRODUCTS.TAXONOMIC_CLASSIFICATION.

5) There is a difference between the ID rank naming convention between the FOSS tables (RANK_ID) and GAP_PRODUCTS.TAXONOMIC_CLASSIFICATION (ID_RANK).

EmilyMarkowitz-NOAA commented 8 months ago

Great questions and catches! I just pushed some new changes (https://github.com/afsc-gap-products/gap_products/commit/8099a6b61a0650359d5e676e102bc17094abadf3) with fixes to your above questions/comments.

  1. Yes, FOSS_TAXON_GROUP is still being used. This table is needed to search through species by taxonomic group on the FOSS interface (like below where you can currently search for phrases; see current set up below). This is separate from FOSS_SPECIES and FOSS_SURVEY_SPECIES. image
  2. FOSS_SURVEY_SPECIES - I see and just fixed the typo in the SQL file name for this in my latest commit to this branch.
  3. Yes, though some steps can happen in interchangeable order. The order:
    • a) FOSS_HAUL
    • b) FOSS_CATCH
    • c) interchangeable:
      • FOSS_TAXON_GROUP (used to search for groups of species by taxonomy; dependant on TAXONOMIC_CLASSIFICATION)
      • FOSS_SPECIES (dependent on FOSS_CATCH and TAXONOMIC_CLASSIFICATION)
      • FOSS_SURVEY_SPECIES (dependent on both FOSS_HAUL and FOSS_CATCH)
    • d) FOSS_CPUE_PRESONLY (dependent on FOSS_CATCH, FOSS_HAUL, and FOSS_SPECIES (edits made in new commit)).
  4. Great catch - I now integrated the FOSS_SPECIES table into the FOSS_CPUE_PRESONLY script which will provide the RANK_ID column.
  5. The ID_RANK columns in both TAXONOMIC_CLASSIFICATION and FOSS_SPECIES are the same. The RANK_ID column in FOSS_TAXON_GROUP is slightly different, because it unpivots all of the columns to obtain all of the taxonomic groups, but still matches the METADATA_COLUMN column description: "The taxonomic rank of a taxon identification."

I also added "*_TEST" views of these tables to the schema, to make sure these tables work and produce the planned tables, if you want to see.

EmilyMarkowitz-NOAA commented 8 months ago

Looks perfect! Thanks so much for reviewing that branch merge https://github.com/afsc-gap-products/gap_products/pull/5#event-10747280878 and fixing those last few things! I just reran the quarto docs and I'll send to the FOSS folks for implementation!