MTEL-USC / AWESOME-OCIM

A Working Environment for Simulating Ocean Movement and Elemental cycling
12 stars 4 forks source link

Remove load from functions #8

Open briochemc opened 4 years ago

briochemc commented 4 years ago

@profseth and @hengdiliang

As I mentioned somewhere else, I think we should remove all the load calls from the functions, but I'm not sure what is the best approach, so we should discuss this here (if you even want to change it at all, of course)

I propose two alternatives:

  1. Carrying extra argument(s)

    In each function to hold these items. If it was all up to me, I think I'd try something like turning all the

    fun(A, b, do)

    into

    fun(A, b, BGC, OCIM, DATA)

    where

    • BGC is just my renamed do (because do is not self-explanatory) and holds the usual parameters, e.g., BGC.revscav.on, and so on.
    • OCIM holds the transport circulation and the grid information, so basically it's ao (but instead of loading it many times, we load it once and for all) and water_transport.
    • DATA holds the required external data, e.g., the Mahowald dust 2D map gridded to the OCIM. Again, having it passed as an argument means that it's not loaded every time the computation needs the dust source. DATA could also hold things like other tracers, like stuff for the cost function, etc. Basically it would hold everything else than OCIM and BGC.
  2. Use global variables.

    OCIM circulation and grid, BGC flags, and external data don't change at all while we run OCIM models. Following the 1st solution above, we could just use

    fun(A, b, p)

    everywhere, and turn DATA, OCIM, and BGC into global variables, except for a few "optimizable" parameters that would live in p. Then we could keep the load statements in each of these functions, but have the load statement only trigger if the data is not already available in DATA, OCIM, or BGC. So something like that, for example:

    if ~isfield(DATA, AEROSOLDEP)
        DATA.AEROSOLDEP = load('data/AEROSOLDEP')
    end

    I think this would prevent redundant load calls, and could save a lot of time.


FWIW, I prefer option 2 in theory, but I've been told to avoid globals, so there's that.

What do you guys think?

profseth commented 4 years ago

I just ran a model with dust and nepheloid and hydrothermal sources (so at least 3 big 'loads' plus loading ao.mat) which took 0.5 seconds to set up the problem and 9 seconds to solve the matrix inversion. So even with all of that loading it's a relatively small total proportion of the time, right?

Unless I'm missing something and this is a much bigger time-sink than I have understood, I prefer to re-load the data in each function because it makes the functions easier to understand and modify for other uses.

briochemc commented 4 years ago

You're right, I don't think it will matter for the current setup, because these are in fact rather "small" loads. But it's a good general practice to not compute or load the same things again and again! 😄

BTW, the changes I'm thinking of should be rather minimal and not obfuscate things at all. I think we could just replace the

load('data/AEROSOLDEP')

type of calls with something like

global AEROSOLDEP
isempty(AEROSOLDEP) && load('data/AEROSOLDEP')