afsc-gap-products / gap_products

This repository supports code used to create tables in the GAP_PRODUCTS Oracle schema. These tables include the master production tables, tables shared with AKFIN, and tables publicly shared on FOSS.
https://afsc-gap-products.github.io/gap_products/
Creative Commons Zero v1.0 Universal

Idea for version control #9

Closed sean-rohan-NOAA closed 6 months ago

sean-rohan-NOAA commented 8 months ago

Following up about ideas for version control and tracking changes.

Here's one idea:

  1. Create a single DOI for the GAP_PRODUCTS data product along with a public metadata record. Create a data management plan and archive a snapshot once a year (with NCEI?) to ensure it's discoverable.
  2. Maintain continuous versioning and assign a new version after each update, including updates that occur within a year. Name versions using a YYYY.MM.DD-R scheme (e.g. 2023.11.09-1) to make it easier to figure out which version of the data folks are using.
  3. Encourage users to cite the data product and include the accession date and DOI in the citation. Provide a recommended citation in the documentation.
  4. Create a plain text NEWS/changelog that describes changes in each versioned release. Use descriptive titles for changes, e.g.:
November 11, 2023
GAP_PRODUCTS Version 2023.11.09-1 

A brief description of the changes in this versioned release.

DATA UPDATE
 - Corrected northern rock sole (species_code = 10261) length data errors from the 2017 NBS survey. Added 500 length samples from 20 hauls (vessel 162, cruise 201702) that were erroneously omitted from the database due to a data transformation error.
 - Grid cells for the Gulf of Alaska stratum areas recalculated based on...
 - Corrected surface temperature error from cruise 201902, vessel 94, haul 38. Temperature erroneously calculated as XX changed to YY based on ZZ.

METHODOLOGICAL CHANGE
- Method for calculating design-based abundance indices for REGION X changed from YY to ZZ.
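
A minimal sketch of how the YYYY.MM.DD-R version names from point 2 could be generated and parsed. It assumes that R is a counter that starts at 1 and increments for each additional release on the same date; the thread does not define R, so treat that as an assumption. The function names `parse_version` and `next_version` are hypothetical and not part of the repository.

```python
import re
from datetime import date

# Matches version strings such as "2023.11.09-1".
VERSION_RE = re.compile(r"^(\d{4})\.(\d{2})\.(\d{2})-(\d+)$")


def parse_version(version):
    """Split a 'YYYY.MM.DD-R' string into (date, release_number)."""
    m = VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"not a YYYY.MM.DD-R version: {version!r}")
    year, month, day, release = (int(g) for g in m.groups())
    return date(year, month, day), release


def next_version(run_date, existing):
    """Return the next version string for run_date.

    Assumption: R counts releases made on the same date, starting at 1.
    """
    releases = [r for d, r in map(parse_version, existing) if d == run_date]
    return f"{run_date:%Y.%m.%d}-{max(releases, default=0) + 1}"
```

Because `parse_version` returns a `(date, release_number)` tuple, `sorted(versions, key=parse_version)` orders versions chronologically, which a plain string sort would get wrong once R reaches 10.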

Lewis-Barnett-NOAA commented 8 months ago

Thanks Sean, I think this is a great plan. In the changelog, when methods change we can also flag commit(s) that show how the code carrying out any computations has changed.

It also has the added bonus of letting us track attribution and use of our data more cleanly using the DOI and access date/release version.



EmilyMarkowitz-NOAA commented 7 months ago

I also think these are great ideas! While yes, there is more work to do on these points, I wanted to let you know we are working towards this and I have a few small updates.

I've taken some very initial steps. For idea 2, @zoyafuso-NOAA and I have created the first .txt change log files for the news page (they do not yet meet all of the points above): https://github.com/afsc-gap-products/gap_products/commit/67c0a0e01b0d865439b8af7b4c8551465c12f434 . For idea 4, I updated the news.qmd page to curate these .txt files: https://github.com/afsc-gap-products/gap_products/commit/91580d0bae0833ab98023657777dbb16523ef6fb . I am rerunning the Quarto book now and hope to have these initial changes implemented on the page soon.

Regarding ideas 1 and 3: We still need to prepare a DOI for this quarto book and DOIs for these data products, but there is the current CITATION.bib file for the data products documented in this quarto book. I think we want to hold off until the spring to make these DOIs, as that is the deadline for SSMA and others to provide their final review of the data. We are still working out the exact data archiving plan (noted in idea 1) and have a few ideas to review with OFIS.

I'll continue to work on these and report back with progress! @sean-rohan-NOAA can you send us a little more information about sharing data with NCEI, if you have it?

zoyafuso-NOAA commented 7 months ago

To add here, the temporary solution is to save a zipped file on the G: drive (G:\GAP_PRODUCTS_Archives) after each test production run. The zipped file contains a version of the GAP_PRODUCTS tables (as .csv files), the input data (RDS file) and reference tables (as .csv files) used to produce those tables, and the changelog text doc. Each run is labeled by the date it was run.
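
For illustration only, here is a rough Python sketch of that kind of date-labeled archive step. The function name `archive_run` and the `YYYY-MM-DD.zip` file name are assumptions; the comment above only says each run is labeled by its run date, and the actual workflow may be done differently (e.g. in R).

```python
import zipfile
from datetime import date
from pathlib import Path


def archive_run(files, archive_dir, run_date):
    """Zip the given files into <archive_dir>/<YYYY-MM-DD>.zip and return its path.

    Assumption: one zip per run date; a second run on the same date
    overwrites the earlier archive.
    """
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    zip_path = archive_dir / f"{run_date.isoformat()}.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in map(Path, files):
            # Store each file under its base name, dropping the source directory.
            zf.write(f, arcname=f.name)
    return zip_path
```

Called with the table .csv files, the input RDS file, the reference .csv files, and the changelog, plus `archive_dir=r"G:\GAP_PRODUCTS_Archives"`, this would produce one zip per run date. Files that share a base name would collide inside the zip, so a real version would need to keep subfolders or rename them.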

sean-rohan-NOAA commented 7 months ago

@EmilyMarkowitz-NOAA @zoyafuso-NOAA Great to see this moving forward. Nancy Roberson is probably the best person to talk to about archiving with NCEI if that's a direction you want to move in, although I was only suggesting that as a potential avenue because it's the one I'm familiar with. You've obviously done quite a bit of work with FOSS but, since I work more with environmental data products, I'm more familiar with NOSS. With NCEI, there's a process involved with preparing and submitting the data, and I'm not sure about the constraints and limitations placed on providing the data.

More info about NCEI's services: https://www.ncei.noaa.gov/services