gjearevoll / BioDivMapping

A pipeline dedicated to analysing and visualising the biodiversity of different taxa in Norway
GNU General Public License v3.0

storage/resumption/update of existing dateAccessed pipeline #113

Open · RRTogunov opened this issue 7 months ago


The code would benefit from two pieces of functionality:

  1. Resumption of an existing dated run. Specifically, if a dateAccessed folder has already been created and part of the pipeline has run and produced files, initialiseRepository should return a message describing the current state of the model run (e.g. the last pipeline step that was executed and where things were left off).
  2. Updating the parameters/arguments of a partly or fully completed run. Specifically, if a dateAccessed folder exists and any of its contents differ from those stated in the masterScript (e.g. region, resolution, mesh, focalSpecies), there should be an option to either resume the pipeline using the arguments set during the prior run (what is currently done), or update the model specification and rerun every downstream step that is affected. This may be best achieved by transitioning the pipeline to the targets framework.
    i) To make it possible to choose between updating and resuming the pipeline, it may help to create an additional replicate of all files created by initialiseRepository, so that files within dateAccessed can be changed as a means of updating the model (e.g. changing which covariates are included). Note: this may be handled automatically by targets.
    ii) Currently, initialiseRepository does some initial processing of the pipeline parameters (e.g. filtering and updating the usageKeys of focalTaxa, and filtering polyphyleticSpecies). This is not ideal for reproducibility (e.g. you cannot tell what focalTaxa.csv looked like when the pipeline was first initialised), and it hinders updating the model (e.g. taxa whose inclusion was set to FALSE are deleted from the CSV and have to be added back manually, rather than simply switching FALSE to TRUE). The best solution may be, as in point i), to create an identical copy of the initialised pipeline arguments that is read only by initialiseRepository.R, in addition to the current, partly processed files referenced by the other scripts.
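For point 1, one possible approach is for initialiseRepository to check which step outputs already exist in the dateAccessed folder and report the last completed step. The sketch below is written in Python for brevity (the pipeline's own scripts are R), and everything in it is hypothetical: the step names, their order, and the output file each step is assumed to write. The scan stops at the first missing output, so a stray later file cannot make a run look further along than it is.

```python
from pathlib import Path

# Hypothetical ordered pipeline steps, each paired with an output file it is
# assumed to write inside the dateAccessed folder. Real names will differ.
PIPELINE_STEPS = [
    ("initialiseRepository", "controlPars.rds"),
    ("downloadData", "speciesData.rds"),
    ("processCovariates", "covariates.rds"),
    ("runModels", "modelOutputs.rds"),
    ("visualise", "maps.png"),
]


def last_completed(existing_files):
    """Return (last_completed_step, next_step) given the file names present.

    Steps are checked in order; the scan stops at the first missing output.
    next_step is None when every step's output is present.
    """
    existing = set(existing_files)
    last_done = None
    for step, output in PIPELINE_STEPS:
        if output not in existing:
            return last_done, step
        last_done = step
    return last_done, None


def status_message(last_done, next_step):
    """Build the message initialiseRepository could print for an existing run."""
    if last_done is None:
        return "No outputs found; the pipeline will start from the beginning."
    if next_step is None:
        return f"Run complete; last step executed: {last_done}."
    return f"Last step executed: {last_done}; resume from: {next_step}."


def report_status(run_dir):
    """Inspect a dateAccessed folder (if it exists) and return a status message."""
    run_dir = Path(run_dir)
    files = [p.name for p in run_dir.iterdir()] if run_dir.is_dir() else []
    return status_message(*last_completed(files))
```

Keeping the file-name logic separate from the directory listing makes the status check easy to test without touching disk.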
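For point 2, the resume-versus-update choice can be framed as comparing the arguments stored with the prior run against those currently set in the masterScript. The Python sketch below assumes a strictly linear pipeline and a hypothetical mapping from each argument (region, resolution, mesh, focalSpecies) to the earliest step that reads it. In "update" mode, every step from the earliest affected one onward is rerun.

```python
# Hypothetical pipeline order and, for each run argument, the earliest step
# that reads it. Everything from that step onward depends on the argument.
STEP_ORDER = ["initialiseRepository", "downloadData", "processCovariates",
              "runModels", "visualise"]
PARAM_FIRST_USE = {
    "region": "downloadData",
    "focalSpecies": "downloadData",
    "resolution": "processCovariates",
    "mesh": "runModels",
}


def changed_params(stored, current):
    """Names of arguments whose value differs between the prior run and masterScript."""
    return sorted(k for k in stored.keys() | current.keys()
                  if stored.get(k) != current.get(k))


def plan_run(stored, current, mode="resume"):
    """Return (arguments_to_use, steps_to_rerun).

    mode="resume": keep the prior run's arguments; no completed step is rerun.
    mode="update": adopt the masterScript arguments and rerun every step from
    the earliest one affected by a changed argument.
    """
    if mode == "resume":
        return dict(stored), []
    if mode != "update":
        raise ValueError(f"unknown mode: {mode!r}")
    changed = changed_params(stored, current)
    if not changed:
        return dict(current), []
    # Arguments with no known first use conservatively invalidate the whole run.
    start = min(STEP_ORDER.index(PARAM_FIRST_USE.get(p, STEP_ORDER[0]))
                for p in changed)
    return dict(current), STEP_ORDER[start:]
```

This is coarser than what the targets package offers: targets tracks a dependency graph and skips targets that are already up to date, whereas this linear version reruns everything downstream of the earliest change.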
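Point ii) suggests keeping an untouched copy of the initial arguments. A minimal Python sketch of that idea: raw input files (e.g. focalTaxa.csv) are copied once into a snapshot folder and never overwritten, and the filtered table is derived from the snapshot on each run instead of rows being deleted from the CSV. The snapshot folder name `initial` and the `include` column are hypothetical; with this approach, switching a row from FALSE to TRUE in the raw copy brings that taxon back.

```python
import csv
import shutil
from pathlib import Path


def snapshot_inputs(input_files, run_dir, snapshot_name="initial"):
    """Copy raw inputs into run_dir/<snapshot_name>/ the first time only.

    Existing snapshot files are never overwritten, so they keep recording the
    state of the inputs when the run was first initialised.
    """
    snap = Path(run_dir) / snapshot_name
    snap.mkdir(parents=True, exist_ok=True)
    for f in map(Path, input_files):
        target = snap / f.name
        if not target.exists():
            shutil.copy2(f, target)
    return snap


def filter_included(rows, include_col="include"):
    """Keep rows whose (hypothetical) include column is TRUE, case-insensitively."""
    return [r for r in rows
            if str(r.get(include_col, "")).strip().upper() == "TRUE"]


def read_included(raw_csv, include_col="include"):
    """Derive the processed taxa table from the raw snapshot without editing it."""
    with open(raw_csv, newline="") as fh:
        return filter_included(csv.DictReader(fh), include_col)
```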