USGS-R / regional-hydrologic-forcings-ml

Repo for machine learning models for regional prediction of hydrologic forcing functions. Includes probabilistic seasonal high flow regions for CONUS, and prediction of high flow metrics for selected regions.

Documenting Data and Model Versions #177

Closed: slevin75 closed this issue 1 year ago

slevin75 commented 1 year ago

Discussed in https://github.com/USGS-R/regional-hydrologic-forcings-ml/discussions/1

Originally posted by **jds485** November 1, 2021

Code version control is tracked by git. Data are not version controlled on Caldera, so we need to document which data are used with which model versions. I'm thinking of tracking the following in a GitHub-shared spreadsheet that we fill out for each model run:

- Date of model run
- Model run name
- Description
- Git commit date
- Git commit ID
- `target` name for the model
- Path to a copy of the `_targets` folder after the model run (R-readable inputs and outputs)

The copy of the `_targets` folder would serve as our coupled data-model version control. Is this a good idea? Any other variables we should track?
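To make the proposal concrete, here is a minimal R sketch of how one row of that spreadsheet could be assembled. `record_model_run()`, its arguments, and all paths and target names below are hypothetical placeholders, not project code, and the git calls assume a `git` executable is on the system path:

```r
# Hypothetical helper that assembles one row of the proposed run-tracking
# spreadsheet. All names here are illustrative, not existing project code.
record_model_run <- function(run_name, description, model_target,
                             targets_copy_path) {
  data.frame(
    date_model_run  = as.character(Sys.Date()),
    model_run_name  = run_name,
    description     = description,
    # Assumes git is available on the system path
    git_commit_date = system("git log -1 --format=%cd", intern = TRUE),
    git_commit_id   = system("git rev-parse HEAD", intern = TRUE),
    model_target    = model_target,
    targets_path    = targets_copy_path,
    stringsAsFactors = FALSE
  )
}

# Example usage: append one row to a shared CSV after copying _targets.
# The run name, target name, and path below are placeholders.
run_info <- record_model_run(
  run_name          = "example_run",
  description       = "illustrative entry",
  model_target      = "p2_model_fit",
  targets_copy_path = "path/to/copied/_targets"
)
write.table(run_info, "model_run_log.csv", sep = ",", row.names = FALSE,
            col.names = !file.exists("model_run_log.csv"), append = TRUE)
```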
slevin75 commented 1 year ago

cstillwellusgs on Nov 2, 2021 (Maintainer):

Seems good to me. If we list the `target` name for the model, could we also list the `target` name for the data?

If we are generating a log file for each model run, we could include all this info in the log as well (for redundancy).

slevin75 commented 1 year ago

jds485 on Nov 2, 2021 (Maintainer, Author):

> If we list the `target` name for the model, could we also list the `target` name for the data?

I think there would be multiple data targets, so probably too many to list in the spreadsheet. That's one reason I'm thinking the copy of the `_targets` folder would be useful.

> If we are generating a log file for each model run, we could include all this info in the log as well (for redundancy).

The log files I've seen for file targets track names of files. Definitely open to suggestions for auto-generating a log file for each model run to track this info. Maybe we can make a `write_log()` function that we call at the end of each model run target.
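A minimal sketch of what such a `write_log()` function might look like in R; the log directory, field names, and file format are assumptions for illustration, and the git call again assumes `git` is on the system path:

```r
# Sketch of the proposed write_log() idea; the log directory, field names,
# and format are assumptions for illustration only.
write_log <- function(run_name, model_target, log_dir = "logs") {
  dir.create(log_dir, showWarnings = FALSE, recursive = TRUE)
  log_lines <- c(
    paste0("run_name: ", run_name),
    paste0("timestamp: ", format(Sys.time(), "%Y-%m-%d %H:%M:%S")),
    # Assumes git is available on the system path
    paste0("git_commit_id: ", system("git rev-parse HEAD", intern = TRUE)),
    paste0("model_target: ", model_target)
  )
  log_file <- file.path(log_dir, paste0(run_name, ".log"))
  writeLines(log_lines, log_file)
  invisible(log_file)  # return the path so it can be used as a file target
}
```

Since the function returns the log path, it could plausibly be wired into the `targets` pipeline as a file target that depends on the model target, so the log is regenerated whenever the model run changes.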