EcoExtreML / STEMMUS_SCOPE

Integrated code of SCOPE and STEMMUS
GNU General Public License v3.0
14 stars 5 forks source link

global variables in STEMMUS_SCOPE #82

Open BSchilperoort opened 2 years ago

BSchilperoort commented 2 years ago

Currently in STEMMUS_SCOPE, the code heavily relies on the use of global variables and scripts. This makes maintaining the code very difficult, and can introduce bugs that are hard to find and difficult to solve.

To this end I would ask to (bit by bit) remove the reliance on global variables, and instead pass variables to functions. This will make the code easier to understand for new users, as variables can then be traced and it is clear where they are used and/or modified.

Structs

Another issue is the sheer number of different variables used by the scripts. One way to easily reduce this is to make more use of Matlab's struct, in which multiple variables can be stored.

A good example for this is in the script Enrgy_sub.m, where line 15 is 611 characters long! https://github.com/EcoExtreML/STEMMUS_SCOPE/blob/4cdb0e28349c8b5fb469f99b0689495b70c8076b/src/Enrgy_sub.m#L15

Example

Original code

Let's say that there is a script soilfunc.m, which is meant to calculate FOSL and from the variables FOC FOS and MSOC. soilfunc.m

global FOC FOS FOSL MSOC

% fraction of soil organic matter
VPERSOC=MSOC.*2700./((MSOC.*2700)+(1-MSOC).*1300);  
FOSL=1-FOC-FOS-VPERSOC;

In the main script this is called in the following way:

main.m

global FOC FOS FOSL MSOC

run soilfunc.m;

When only viewing the main script we have no idea what happens inside soilfunc.m, we do not see which variables are modified and which new variables are intended to be calculated. Only by opening the script that is called by run can we start to see what happens.

In this example this is only a single level deep, but in some cases scripts and functions are called from within scripts, making tracing variables increasingly difficult. As an illustration see the following image: image In here, the main script calls two different scripts. Variable a is not in the main workspace at all, and cannot be easily traced. However, script_a and script_b are able to exchange information using this global variable. This is not a big issue if it were only a single variable, however, STEMMUS_SCOPE requires hundreds of variables.

For an example, see the first 14 lines of Enrgy_sub.m

Refactored code:

The example code can be rewritten in the following way: soilfunc.m

function FOSL = soilfunc(FOC, FOS, MSOC)
    % fraction of soil organic matter
    VPERSOC=MSOC.*2700./((MSOC.*2700)+(1-MSOC).*1300);  
    FOSL=1-FOC-FOS-VPERSOC;
end

In the main script, the function is then called like this: main.m

soil_params.FOSL = soilfunc(soil_params.FOC, soil_params.FOS, soil_params.MSOC);

This makes it clear that soilfunc needs the FOC FOS and MSOC variables to be able to do the calculation, and that it only returns FOSL, without having to go through the soilfunc script. It also avoids cluttering up the workspace by storing the soil-related parameters into a single struct.

If then later in the main script a "soil heat transfer model" is called, the soil parameters and initial conditions can be passed using structs: main.m

    soil_temp = soil_heat_model(soil_params, soil_temp);

As you can see this makes the code much more clean and readable, which makes it possible to more easily optimize the code, maintain the code, and make any necessary changes.

SarahAlidoost commented 2 years ago

@BSchilperoort thank you, very important points and helpful tips. @yijianzeng @Yunfei-Wang1993 @Crystal-szj @DanyangYu here are good practices in writing codes. Please consider following these practices when fixing codes or developing new functionalities.

DanyangYu commented 2 years ago

Thanks for your information

Crystal-szj commented 2 years ago

Thanks for your useful and helpful tips.

bobzsu commented 2 years ago

excellent suggestions! Bart, after you have gone through the routines/functions, we should work out step-by-step replacements of the structures. I looked at the different components, but also had difficulty in following the information flow between the different parts.

yijianzeng commented 2 years ago

@BSchilperoort Thank you very much, Bart. This is indeed the problem of STEMMUS. Your suggestions are very timing and we should work out the replacements of the model structure as Bob pointed out.

In an independent repository, i am currently working for incorporating STEMMUS into a data assimilation framework (EnKF). And i encountered precisely the problem as you mentioned: it is very hard to trace how variables are changing, and it is very easy to make mistakes that some of the variables should be global and should not change locally, but due to the issue you explained above, they are dynamically changing ...

And i think the 'struct' as you suggest is a perfect option to do this.

SarahAlidoost commented 10 months ago

As version 1.4, there are no global variables. It would be helpful if this is added to the contributing doc and pull request template.

BSchilperoort commented 10 months ago

As version 1.4, there are no global variables. It would be helpful if this is added to the contributing doc and pull request template.

Perhaps it's even better to add it to the linter to prevent PR merges with global variables