SarinaAtkinson-NOAA / SEFSC-ODM-Management-History

A method for generating an analysis-ready version of the Management History Dataset

Create process to make sure new changes don't break existing code #10

Closed SarinaAtkinson-NOAA closed 1 year ago

SarinaAtkinson-NOAA commented 1 year ago

Identify test clusters from SEDAR working papers and add the test as its own script. Save the results as .Rdata, so we have something static to compare to? Use comparedf().

SarinaAtkinson-NOAA commented 1 year ago

@SarinaAtkinson-NOAA will start building the test .Rdata file. This file and the test script will be housed in the "TEST" branch. That way we do not have to worry about changes to these files being accidentally pushed, or about the fork main and SEFSC main getting out of sync. Folders and .Rdata files will be created automatically to hold the results from the processing code. This folder and its results will be listed in the .gitignore and live only on our personal machines to save repo space.

SarinaAtkinson-NOAA commented 1 year ago

@AdyanRios-NOAA take a look at this workflow and we can discuss it next week. Steps:

  1. In main_MH_prep, added a line at the end to save the R environment as a .RData file once all MH steps (00-05) have been run. This .RData is saved with a date in the filename so we do not overwrite it each time.
  2. Created a folder called "test".
  3. Within the test folder, we have one script to create the test.RData file (currently only 2 clusters) and another script to compare test to our most recent analysis-ready dataset.
  4. All the MH_test_process.R script does is load our .RData files, subset the analysis-ready dataset to the clusters in test, run compare_df, and, if there are differences, display them as a table in the viewer.

The compareDF package is great and I particularly love the create_output_table() function. It does everything we need so we can keep the script short!
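The comparison step described above could be sketched roughly as follows. This is a minimal illustration, not the actual MH_test_process.R: the data frames, the column names other than CLUSTER, and the base-R `all.equal()` guard are all assumptions made for the example. The guard is there because `compare_df()` may stop with an error when the two inputs are identical, and the viewer table is only requested in an interactive session.

```r
# Hypothetical stand-ins for the objects loaded from the saved files.
mh_analysis_ready <- data.frame(
  CLUSTER = c(1, 1, 2, 3),
  REGULATION_ID = c(101, 102, 201, 301),  # illustrative column
  VALUE = c("10", "12", "5", "7"),        # illustrative column
  stringsAsFactors = FALSE
)
test <- data.frame(
  CLUSTER = c(1, 1, 2),
  REGULATION_ID = c(101, 102, 201),
  VALUE = c("10", "12", "6"),
  stringsAsFactors = FALSE
)

# Subset the analysis-ready data to the clusters present in the test set.
new_subset <- mh_analysis_ready[mh_analysis_ready$CLUSTER %in% unique(test$CLUSTER), ]

# Cheap base-R check first, so compare_df() is only called when something differs.
has_differences <- !isTRUE(all.equal(new_subset, test, check.attributes = FALSE))

if (has_differences && requireNamespace("compareDF", quietly = TRUE)) {
  # New data first, old (static) data second, grouped by CLUSTER.
  result <- compareDF::compare_df(new_subset, test, group_col = "CLUSTER")
  if (interactive()) {
    # Renders the differences as a table in the viewer.
    compareDF::create_output_table(result)
  }
}
```

Keeping the subset and the guard in base R means the script still reports whether anything changed even on a machine where compareDF is not installed.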

AdyanRios-NOAA commented 1 year ago

Will do ahead of our next meeting!

AdyanRios-NOAA commented 1 year ago

Should we move librarian::shelf() to the top, and add the other packages (here, tidyverse, etc.), so that the scripts can be run standalone in a fresh R environment, without running any other scripts first?

Also, I am getting an error in result.

result <- compare_df(mynew_analysis_ready, test, c("CLUSTER"))
Error in check_if_comparable(both_tables$df_new, both_tables$df_old, group_col, : The two data frames have different columns!
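One possible cause of this error is that the analysis-ready dataset has gained or lost columns since the test file was saved. The sketch below uses only base R to list the mismatched column names and restrict both data frames to the shared columns before comparing. The example data and the `NEW_FLAG` column are hypothetical, and whether dropping unshared columns is appropriate depends on what the test is meant to catch.

```r
# Hypothetical example: the new dataset has one column the test file lacks.
mynew_analysis_ready <- data.frame(
  CLUSTER = c(1, 2),
  VALUE = c("a", "b"),
  NEW_FLAG = c(TRUE, FALSE),  # illustrative extra column
  stringsAsFactors = FALSE
)
test <- data.frame(
  CLUSTER = c(1, 2),
  VALUE = c("a", "b"),
  stringsAsFactors = FALSE
)

# Report which columns appear on only one side.
only_in_new <- setdiff(names(mynew_analysis_ready), names(test))
only_in_test <- setdiff(names(test), names(mynew_analysis_ready))
message("Only in new: ", paste(only_in_new, collapse = ", "))
message("Only in test: ", paste(only_in_test, collapse = ", "))

# Align both data frames to the shared columns, in the same order.
shared_cols <- intersect(names(test), names(mynew_analysis_ready))
new_aligned <- mynew_analysis_ready[, shared_cols, drop = FALSE]
test_aligned <- test[, shared_cols, drop = FALSE]
```

The aligned data frames can then be passed to `compare_df()` in place of the originals.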

AdyanRios-NOAA commented 1 year ago

I know we had agreed to .RData files, but I just remembered that RDS files are better for this purpose. That way when they are read in you intentionally name a single object. When you read in RData files for a single object, the object's name is somewhat less obvious. We can chat more about this on Friday.
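The difference can be illustrated with a small base-R sketch. The object name `test_clusters` and the temporary file paths are assumptions made for the example, not names from the repository.

```r
# Hypothetical test object.
test_clusters <- data.frame(
  CLUSTER = c(1, 2),
  VALUE = c("a", "b"),
  stringsAsFactors = FALSE
)

rdata_path <- tempfile(fileext = ".RData")
rds_path <- tempfile(fileext = ".rds")

save(test_clusters, file = rdata_path)    # stores the object together with its name
saveRDS(test_clusters, file = rds_path)   # stores the object only

rm(test_clusters)

# load() restores objects under their saved names and returns those names,
# so the reader has to know (or inspect) what the file contained.
loaded_names <- load(rdata_path)

# readRDS() returns the object, so the name is chosen explicitly at read time.
test <- readRDS(rds_path)
```

With the RDS file, the name `test` is visible in the script itself, whereas with the .RData file the name `test_clusters` comes from whoever saved it.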

AdyanRios-NOAA commented 1 year ago

Is cluster 365 validated/approved? 365 is a South Atlantic Red Snapper cluster, not one of the clusters included in the MH ODM red snapper working paper (Gulf of Mexico).

SarinaAtkinson-NOAA commented 1 year ago

Updated workflow to save to RDS rather than RDATA.