antadde / N-SDM

Nested-Species Distribution Modelling
MIT License

Automatically populate ODMAP with N-SDM information #1

Open simonrolph opened 2 years ago

simonrolph commented 2 years ago

When I saw the message "ODMAP protocol generated" I was quite excited by the idea that it had filled out all the relevant model-specific information, but it just provides an empty template:

mkdir $wp/tmp/$project/ODMAP 2>/dev/null
curl -o $wp/tmp/$project/ODMAP/ODMAP.xlsx https://damariszurell.github.io/files/Zurell_etal_ODMAP.v1.0_TableS1.xlsx 2>/dev/null
rsync -a $wp/tmp/$project/ODMAP $svp/outputs/$project
echo ODMAP protocol generated

It feels like a missed opportunity not to autocomplete as much of the ODMAP as possible. Here are some of the elements that could be autocompleted:

· Spatial Extent (Lon / Lat)
· Spatial resolution
· Temporal extent/time period
· Temporal resolution, if applicable
· Model algorithms
· Conceptual description of modelling steps including model fitting, assessment and prediction
· Specify modelling platform incl. version, key packages used
· Taxon names
· Selection of training data (for model fitting)
· Selection of validation data (withheld from model fitting, used for estimating prediction error for model selection, model averaging or ensemble): e.g., cross-validation method
· State predictor variables used
· Spatial resolution and spatial extent of raw data, if different from biodiversity data
· Map projection (coordinate reference system)
· Temporal resolution and temporal extent of raw data, if applicable
· Details on data processing and on spatial, temporal and thematic scaling: e.g. upscaling/downscaling, transformations, normalisations, thematic aggregations (e.g. of land cover classes), measures to address spatial uncertainties
· Details on pre-selection of variables, if applicable
· Model settings for all selected algorithms (including default settings of specific platforms/packages, weighting of data etc.)
· Assessment of model coefficients
· Details on quantification of uncertainty in model coefficients, e.g. resampling
· Assessment of variable importance
· Model selection strategy: e.g. information-theoretic approach for variable selection, shrinkage and regularization
· Method for model averaging: e.g. derivation of weights
· Ensemble method
· Method for addressing spatial autocorrelation in residuals
· Method for addressing temporal autocorrelation in residuals
· Method to account for nested data: e.g., fixed and random effects

Some of these could be filled in either by interpreting settings.csv or by writing descriptions of the statistical models and processing steps that are carried out in the N-SDM workflow.

This would greatly speed up the generation of an ODMAP and make it more likely that an ODMAP will be used.
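As a rough illustration of the settings.csv route, here is a minimal sketch that copies a few settings into a table of ODMAP elements. It is only a sketch under assumptions: the file layout (semicolon-separated "parameter;value" rows) and the parameter names mod_algo, spatial_res and species are hypothetical and may not match the real N-SDM settings.csv, and the output is a plain-text table rather than the ODMAP.xlsx itself.

```shell
#!/usr/bin/env bash
# Sketch only: the "parameter;value" layout and the parameter names
# (mod_algo, spatial_res, species) are assumptions, not the actual
# N-SDM settings.csv schema.

# get_setting FILE NAME: print the value of parameter NAME (empty if absent)
get_setting() {
  awk -F';' -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# fill_odmap SETTINGS OUT: write a two-column "element;value" table
fill_odmap() {
  local settings="$1" out="$2"
  {
    echo "element;value"
    echo "Model algorithms;$(get_setting "$settings" mod_algo)"
    echo "Spatial resolution;$(get_setting "$settings" spatial_res)"
    echo "Taxon names;$(get_setting "$settings" species)"
  } > "$out"
}
```

Calling `fill_odmap settings.csv odmap_prefill.csv` would produce a small table that a later step could merge into the downloaded template. Elements that need free-text descriptions (e.g. the conceptual description of modelling steps) would still have to be written by hand once and shipped with the workflow.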

antadde commented 2 years ago

Hi Simon, good point, and thanks for the suggestions! For now we are only "generating" an empty protocol; in the next N-SDM version we aim for automatic filling (work in progress!). You are probably one of the first external N-SDM users. Did you manage to run it on your local cluster?

simonrolph commented 2 years ago

Yes, it seemed to run fine, I think!

Welcome to this new nsdm-project run c1a6dc
Submitted batch job 15489005
N-SDM settings defined and species occurence data for 3 species disaggregated
Starting N-SDM run 1 out of 1 runs
N-SDM settings updated
Submitted batch job 15489308
PRE modelling datasets generated
Submitted batch job 15489446
GLO data preparation and covariate selection done
Submitted batch job 15489797
GLO modelling done
Submitted batch job 15492739
GLO ensembling done
Submitted batch job 15492942
LOC data preparation and covariate selection done
Submitted batch job 15493560
LOC modelling done
Submitted batch job 15496116
LOC ensembling and scale nesting done
Submitted batch job 15496479
individual FUT GLO predictions done
Submitted batch job 15500276
FUT GLO ensembling done
Submitted batch job 15500929
individual FUT LOC predictions done
Submitted batch job 15503825
FUT LOC ensembling and scale nesting done
Submitted batch job 15504262
Final evaluation done
Sacct outputs analysis done
Main outputs sync to saving location
ODMAP protocol generated
Finished!

antadde commented 2 years ago

Looks good. Were the outputs (e.g. ensemble maps) generated at the saving location?

simonrolph commented 2 years ago

Lots of outputs. I haven't looked through them too thoroughly, but there are lots of .rds objects in there.

I've added a few GitHub issues for the things I encountered, but otherwise it all seemed to run fine and was fairly straightforward. It might be a bit more complicated to run something with my own data, but I'm sure the paper will help with that once it is published.

antadde commented 2 years ago

Excellent, and thanks again for opening the other issues; they will help make N-SDM compatible with more and more configurations. Details on the inputs/outputs will indeed be provided in the companion paper, which I hope will be published soon.