SPINacc

A spin-up acceleration tool for the ORCHIDEE family of land surface models (LSMs).

Concept: The proposed machine-learning (ML)-enabled spin-up acceleration procedure (MLA) predicts the steady state of any land pixel of the full model domain after training on a representative subset of pixels. As the computational cost of the current generation of LSMs scales linearly with the number of pixels and years simulated, MLA reduces the computation time quasi-linearly with the number of pixels predicted by ML.
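As a rough illustration of this scaling argument, the sketch below estimates the share of the conventional spin-up cost that remains when only a training subset of pixels is simulated. It assumes that cost is exactly proportional to pixels times years and that ML training and prediction overhead is negligible. The function name and the example numbers are hypothetical and are not taken from SPINacc or from Sun et al.

```python
def spinup_cost_fraction(n_total_pixels: int, n_training_pixels: int) -> float:
    """Fraction of the conventional spin-up cost that remains with MLA.

    Assumes cost is proportional to (pixels x years) and ignores the
    ML training/prediction overhead; both are simplifications.
    """
    if not 0 < n_training_pixels <= n_total_pixels:
        raise ValueError("need 0 < n_training_pixels <= n_total_pixels")
    return n_training_pixels / n_total_pixels


# Hypothetical domain: 10,000 land pixels, 500 of them simulated for training.
fraction = spinup_cost_fraction(10_000, 500)
print(f"remaining cost: {fraction:.1%}")  # remaining cost: 5.0%
```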

The aims, concepts and workflows are documented in Sun et al.202 [open-source]: https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.16623


CONTENT

The SPINacc package includes:

job - the job file for a bash environment
job_tcsh - the job file for a tcsh environment
main.py - the main python module
Tools/* - folder with the other python modules
DEF_*/ - folders containing the configuration files for each of the supported ORCHIDEE versions
AuxilaryTools/SteadyState_checker.py - tool to assess the state of equilibration in ORCHIDEE simulations
tests/ - the reproducibility code in Python
requirements.txt - listing necessary dependencies to use SPINacc
ORCHIDEE_cecill.txt - the same license used by ORCHIDEE
docs/ - more detailed documentation about ORCHIDEE simulations

INFORMATION FOR USERS:

HOW TO RUN THE CODE:

Here are the steps to launch the different tasks of this repository (and the reproducibility tests associated):

Download the code: git clone git@github.com:CALIPSO-project/SPINacc.git
Find the associated ZENODO repository online (for reproducibility test including the corresponding ORCHIDEE forcing data) here: [https://doi.org/10.5281/zenodo.10514124]
From ZENODO: DOWNLOAD ORCHIDEE_forcing_data.zip, unzip and store it in a directory '/your/path/to/SPINacc_ref/'
From ZENODO: DOWNLOAD Reproducibility_tests_reference.zip, unzip and store it in a directory '/your/path/to/reference/'
On your local machine: cd SPINacc
If you want to stay on the main code, skip this step; otherwise run: git checkout your_branch
Create an execution directory: mkdir EXE_DIR
In DEF_Trunk/varlist.json: replace all occurrences of '/home/surface5/vbastri/' with '/your/path/to/SPINacc_ref/vlad_files/vlad_files/'
Choose the task you want to launch. In DEF_Trunk/MLacc.def: in the config[3] section put 1 (for task 1), in the config[5] section put the path to your EXE_DIR, and in the config[7] section put 0, at least for task 1 (for the following tasks you can reuse previous results).
In job: setenv dirpython '/your/path/to/SPINacc/' and setenv dirdef 'DEF_Trunk/'
In tests/config.py, set test_path='/your/path/to/SPINacc/EXE_DIR/'
Also in tests/config.py, change reference_path='/home/surface10/mrasolon/files_for_zenodo/reference/EXE_DIR/' to reference_path='/your/path/to/reference/'
Then launch your first job using qsub -q short job, for task 1
For the following tasks (2, 3, 4 and 5) you only need to modify the config[3] and config[7] sections in DEF_Trunk/MLacc.def
For tasks 3 and 4, it is better to use qsub -q medium job
Launching tasks in a chain (e.g. "1, 2" or "3, 4, 5") will be possible soon
The results of the tasks are located in your EXE_DIR
The results of reproducibility tests are stored in EXE_DIR/tests_results.txt
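
The path replacement in DEF_Trunk/varlist.json can be done by hand or scripted. The sketch below is one possible way to do it and is not part of SPINacc: the helper name replace_prefix is hypothetical, and it works on the raw file text so that the JSON formatting is left untouched. The target path is the placeholder used in the steps above and must be adapted to your system.

```python
from pathlib import Path


def replace_prefix(json_path: Path, old: str, new: str) -> int:
    """Replace every occurrence of `old` with `new` in a text file.

    Returns the number of replacements made. The file is only rewritten
    when at least one occurrence is found.
    """
    text = json_path.read_text(encoding="utf-8")
    count = text.count(old)
    if count:
        json_path.write_text(text.replace(old, new), encoding="utf-8")
    return count


# Run from the root of the SPINacc checkout; does nothing elsewhere.
varlist = Path("DEF_Trunk/varlist.json")
if varlist.exists():
    n = replace_prefix(
        varlist,
        "/home/surface5/vbastri/",
        "/your/path/to/SPINacc_ref/vlad_files/vlad_files/",  # placeholder path
    )
    print(f"replaced {n} occurrence(s) in {varlist}")
```

Running it a second time should report zero replacements, which is a quick check that no old paths remain.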

OVERVIEW OF THE INDIVIDUAL TASKS OF THE TOOL:

(Details of each task are provided in docs/documentation.txt)

The tasks are listed below (the task numbers do not YET correspond to the order of execution):

Task 1 [optional]: Evaluates the impact of varying the number of K-means clusters on model performance, setting a default of 4 clusters and producing a ‘dist_all.png’ graph.
Task 2 performs the clustering using a K-means algorithm and saves the locations of the selected pixels (files starting with 'ID'). For a given PFT, the selected pixels (red) and all pixels with a cover fraction exceeding 'cluster_thres' [defined in varlist.json] (grey) are plotted in the figures 'ClustRes_PFT**.png'.
Task 3: Creates compressed forcing files for ORCHIDEE, containing data for selected pixels only, aligned on a global pseudo-grid for efficient pixel-level simulations, with file specifications listed in varlist.json.
Task 4 performs the ML training on results from ORCHIDEE simulations using the compressed forcing (production mode: resp-format=compressed) or the global forcing (debug mode: resp-format=global), extrapolates to a global grid and writes the state variables into global restart files for ORCHIDEE. In debug mode, Task 4 also evaluates the ML training outputs against real model outputs.
Task 5 [optional]: Visualizes ML performance from Task 4, offering two evaluation modes: global pixel evaluation and leave-one-out cross-validation (LOOCV) for training sites, generating plots for various state variables at the PFT level, including comparisons of ML predictions with conventional spinup data.
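
To make the clustering step of Tasks 1 and 2 more concrete, here is a minimal, self-contained sketch of K-means used to pick representative pixels. It is not SPINacc's implementation: the function names, the two-feature pixel vectors, the evenly spaced initialisation and the rule "take the pixel closest to each centroid" are all assumptions chosen for illustration.

```python
from math import dist


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm on a list of tuples; returns (centroids, labels).

    Initial centroids are evenly spaced entries of `points` (a simplification
    that keeps the example deterministic).
    """
    if not 0 < k <= len(points):
        raise ValueError("need 0 < k <= len(points)")
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: centroid = mean of its members (unchanged if empty).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


def representative_pixels(points, centroids, labels):
    """Index of the point closest to each centroid, one per non-empty cluster."""
    reps = []
    for c, centre in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            reps.append(min(members, key=lambda i: dist(points[i], centre)))
    return reps


# Hypothetical pixels described by two normalised features each.
pixels = [(0.0, 0.0), (1.0, 0.0), (0.4, 0.1), (5.0, 5.0), (6.0, 5.0), (5.5, 5.1)]
centroids, labels = kmeans(pixels, k=2)
print(labels)                                            # [0, 0, 0, 1, 1, 1]
print(representative_pixels(pixels, centroids, labels))  # [2, 5]
```

Repeating the run for several values of k and comparing the within-cluster distances is the kind of sensitivity check that Task 1 describes.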

REPRODUCIBILITY TESTS:

The configuration file has been updated to include new parameters that control the execution of reproducibility tests for each task. These parameters are:

config[17]: Controls the reproducibility test for Task 1.
config[19]: Controls the reproducibility test for Task 2.
config[21]: Controls the reproducibility test for Task 3.
config[23]: Controls the reproducibility test for Task 4.

For each parameter, setting the value to 1 enables the reproducibility test for the corresponding task, while setting it to 0 disables it.
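
The following sketch shows how these switches could be read and validated. It rests on an assumption that this README does not state explicitly: that config[n] refers to the n-th line (counting from zero) of the configuration file once it has been read into a list of lines. The helper name read_test_flags and the placeholder file content are hypothetical.

```python
def read_test_flags(lines):
    """Return {task number: True/False} from config[17], [19], [21], [23].

    `lines` is the configuration file as a list of lines (assumed layout:
    config[n] is the n-th line, zero-based).
    """
    indices = {1: 17, 2: 19, 3: 21, 4: 23}
    flags = {}
    for task, idx in indices.items():
        value = lines[idx].strip()
        if value not in ("0", "1"):
            raise ValueError(f"config[{idx}] must be 0 or 1, got {value!r}")
        flags[task] = value == "1"
    return flags


# Stand-in for the real file content: 24 placeholder lines,
# with the four switches filled in.
config = ["# placeholder\n"] * 24
config[17] = "1\n"  # reproducibility test for Task 1: on
config[19] = "0\n"  # Task 2: off
config[21] = "1\n"  # Task 3: on
config[23] = "0\n"  # Task 4: off

print(read_test_flags(config))  # {1: True, 2: False, 3: True, 4: False}
```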