cellml / libcellml

Repository for libCellML development.
https://libcellml.org
Apache License 2.0
17 stars 21 forks source link

How to split constants and variables? #1242

Open chrispbradley opened 4 months ago

chrispbradley commented 4 months ago

I'm wondering if there is a way to split up the constants and variables so that there are, say for constants, two arrays passed into the computation RHS routines?

The use case is for solving a multidimensional problem e.g., monodomain equation, where you conceptually have many individual CellML models in some spatial arrangement. In this case you may want to have a spatial gradient of the values of some CellML variables e.g., you may want gNa to vary for an electrical CellML model. For this simulation you want CellML to generate the compute rates RHS code and then pass in the value of the constants to evaluate the states and rates etc. Now, depending on the particular CellML model chosen you may have a number of CellML variables that are constant, say 20. With the current RHS routine signature of a single constants array I have two options. The first is to store all 20 constant variables for each CellML model and pass in a per cell model constants array. If I have 2,000,000 cells then that is 20 x 2,000,000 values even though 19 x 2,000,000 are not really changing. This is not really optimal in a computational sense. I could just store a single array of 20 numbers and then another array of the 1 x 2,000,000 values of gNa that do change but then I would have to make 2,000,000 copies from the value of gNa that changes to the correct position of the 20 element constants array and pass that in. I would have to make these copies every time step and this approach would limit any parallelism that could otherwise be exploited. This is also not optimal computationally.

In the old CellML api a distinction could be made between the constant values that didn't really vary throughout the simulation and those that did. The values that did not change at all where hard defined in the generated code and only those constants that truly could change (parameters) were passed in.

Is there a way to split the constants array? Even if the values that did not really change where still passed in rather than hard coded in the routine then I could use libCellML for my use case? However, without being able to pass in two arrays for the constants at the very least I'm not sure how to achieve this?

There is a similar likewise case for the output variables. Those variables which had a value derived after the states had changed could be set so that those of which a modeller cared output where separated and passed out and those of which a modeller did not care about where not even computed.

agarny commented 4 months ago

I'm wondering if there is a way to split up the constants and variables so that there are, say for constants, two arrays passed into the computation RHS routines?

At the moment, a variable (in the general sense) can be a variable of integration, a state, a constant, a computed constant, an algebraic variable, or an external variable. However, when we generate code, we don't indeed currenly distinguish between constants, computed constants, and algebraic variables.

The use case is for solving a multidimensional problem e.g., monodomain equation, where you conceptually have many individual CellML models in some spatial arrangement. In this case you may want to have a spatial gradient of the values of some CellML variables e.g., you may want gNa to vary for an electrical CellML model. For this simulation you want CellML to generate the compute rates RHS code and then pass in the value of the constants to evaluate the states and rates etc. Now, depending on the particular CellML model chosen you may have a number of CellML variables that are constant, say 20. With the current RHS routine signature of a single constants array I have two options. The first is to store all 20 constant variables for each CellML model and pass in a per cell model constants array. If I have 2,000,000 cells then that is 20 x 2,000,000 values even though 19 x 2,000,000 are not really changing. This is not really optimal in a computational sense. I could just store a single array of 20 numbers and then another array of the 1 x 2,000,000 values of gNa that do change but then I would have to make 2,000,000 copies from the value of gNa that changes to the correct position of the 20 element constants array and pass that in. I would have to make these copies every time step and this approach would limit any parallelism that could otherwise be exploited. This is also not optimal computationally.

Agreed, none of these two options is good.

In the old CellML api a distinction could be made between the constant values that didn't really vary throughout the simulation and those that did. The values that did not change at all where hard defined in the generated code and only those constants that truly could change (parameters) were passed in.

Is there a way to split the constants array? Even if the values that did not really change where still passed in rather than hard coded in the routine then I could use libCellML for my use case? However, without being able to pass in two arrays for the constants at the very least I'm not sure how to achieve this?

We could modify the generated code to distinguish between constants, computed constants, and algebraic variables, if that's what you mean. This would change the generated quite a bit, but I can appreciate your use case.

This means that for the HH52 model for instance, we wouldn't generate this implementation C code anymore, but something like the code below (note that for our XXX_INFO arrays there would be no need to specify the type of a given variable). Is this what you are after?

/* The content of this file was generated using the C profile of libCellML 0.5.0. */

#include "model.h"

#include <math.h>
#include <stdlib.h>

const char VERSION[] = "0.5.0";
const char LIBCELLML_VERSION[] = "0.5.0";

const size_t STATE_COUNT = 4;
const size_t CONSTANT_COUNT = 5;
const size_t COMPUTED_CONSTANT_COUNT = 3;
const size_t ALGEBRAIC_COUNT = 10;

const VariableInfo VOI_INFO = {"time", "millisecond", "environment", VARIABLE_OF_INTEGRATION};

const VariableInfo STATE_INFO[] = {
    {"V", "millivolt", "membrane"},
    {"h", "dimensionless", "sodium_channel_h_gate"},
    {"m", "dimensionless", "sodium_channel_m_gate"},
    {"n", "dimensionless", "potassium_channel_n_gate"}
};

const VariableInfo CONSTANT_INFO[] = {
    {"Cm", "microF_per_cm2", "membrane", CONSTANT},
    {"E_R", "millivolt", "membrane", CONSTANT},
    {"g_L", "milliS_per_cm2", "leakage_current", CONSTANT},
    {"g_Na", "milliS_per_cm2", "sodium_channel", CONSTANT},
    {"g_K", "milliS_per_cm2", "potassium_channel", CONSTANT},
};

const VariableInfo COMPUTED_CONSTANT_INFO[] = {
    {"E_L", "millivolt", "leakage_current", COMPUTED_CONSTANT},
    {"E_Na", "millivolt", "sodium_channel", COMPUTED_CONSTANT},
    {"E_K", "millivolt", "potassium_channel", COMPUTED_CONSTANT},
};

const VariableInfo ALGEBRAIC_INFO[] = {
    {"i_Stim", "microA_per_cm2", "membrane", ALGEBRAIC},
    {"i_L", "microA_per_cm2", "leakage_current", ALGEBRAIC},
    {"i_K", "microA_per_cm2", "potassium_channel", ALGEBRAIC},
    {"i_Na", "microA_per_cm2", "sodium_channel", ALGEBRAIC},
    {"alpha_m", "per_millisecond", "sodium_channel_m_gate", ALGEBRAIC},
    {"beta_m", "per_millisecond", "sodium_channel_m_gate", ALGEBRAIC},
    {"alpha_h", "per_millisecond", "sodium_channel_h_gate", ALGEBRAIC},
    {"beta_h", "per_millisecond", "sodium_channel_h_gate", ALGEBRAIC},
    {"alpha_n", "per_millisecond", "potassium_channel_n_gate", ALGEBRAIC},
    {"beta_n", "per_millisecond", "potassium_channel_n_gate", ALGEBRAIC}
};

double * createStatesArray()
{
    double *res = (double *) malloc(STATE_COUNT*sizeof(double));

    for (size_t i = 0; i < STATE_COUNT; ++i) {
        res[i] = NAN;
    }

    return res;
}

double * createConstantsArray()
{
    double *res = (double *) malloc(CONSTANT_COUNT*sizeof(double));

    for (size_t i = 0; i < CONSTANT_COUNT; ++i) {
        res[i] = NAN;
    }

    return res;
}

double * createComputedConstantssArray()
{
    double *res = (double *) malloc(COMPUTED_CONSTANT_COUNT*sizeof(double));

    for (size_t i = 0; i < COMPUTED_CONSTANT_COUNT; ++i) {
        res[i] = NAN;
    }

    return res;
}

double * createAlgebraicArray()
{
    double *res = (double *) malloc(ALGEBRAIC_COUNT*sizeof(double));

    for (size_t i = 0; i < ALGEBRAIC_COUNT; ++i) {
        res[i] = NAN;
    }

    return res;
}

void deleteArray(double *array)
{
    free(array);
}

void initialiseVariables(double *states, double *rates, double *constants)
{
    states[0] = 0.0;
    states[1] = 0.6;
    states[2] = 0.05;
    states[3] = 0.325;
    constants[0] = 1.0;
    constants[1] = 0.0;
    constants[2] = 0.3;
    constants[3] = 120.0;
    constants[4] = 36.0;
}

void computeComputedConstants(double *computedConstants)
{
    computedConstants[0] = constants[1]-10.613;
    computedConstants[1] = constants[1]-115.0;
    computedConstants[2] = constants[1]+12.0;
}

void computeRates(double voi, double *states, double *rates, double *constants, double *computedConstants, double *algebraic)
{
    algebraic[0] = ((voi >= 10.0) && (voi <= 10.5))?-20.0:0.0;
    algebraic[1] = constants[2]*(states[0]-computedConstants[0]);
    algebraic[2] = constants[4]*pow(states[3], 4.0)*(states[0]-computedConstants[2]);
    algebraic[3] = constants[3]*pow(states[2], 3.0)*states[1]*(states[0]-computedConstants[1]);
    rates[0] = -(-algebraic[0]+algebraic[3]+algebraic[2]+algebraic[1])/constants[0];
    algebraic[4] = 0.1*(states[0]+25.0)/(exp((states[0]+25.0)/10.0)-1.0);
    algebraic[5] = 4.0*exp(states[0]/18.0);
    rates[2] = algebraic[4]*(1.0-states[2])-algebraic[5]*states[2];
    algebraic[6] = 0.07*exp(states[0]/20.0);
    algebraic[7] = 1.0/(exp((states[0]+30.0)/10.0)+1.0);
    rates[1] = algebraic[5]*(1.0-states[1])-algebraic[7]*states[1];
    algebraic[8] = 0.01*(states[0]+10.0)/(exp((states[0]+10.0)/10.0)-1.0);
    algebraic[9] = 0.125*exp(states[0]/80.0);
    rates[3] = algebraic[8]*(1.0-states[3])-algebraic[9]*states[3];
}

void computeVariables(double voi, double *states, double *rates, double *constants, double *computedConstants, double *algebraic)
{
    algebraic[1] = constants[2]*(states[0]-computedConstants[0]);
    algebraic[3] = constants[3]*pow(states[2], 3.0)*states[1]*(states[0]-computedConstants[1]);
    algebraic[4] = 0.1*(states[0]+25.0)/(exp((states[0]+25.0)/10.0)-1.0);
    algebraic[5] = 4.0*exp(states[0]/18.0);
    algebraic[5] = 0.07*exp(states[0]/20.0);
    algebraic[7] = 1.0/(exp((states[0]+30.0)/10.0)+1.0);
    algebraic[2] = constants[4]*pow(states[3], 4.0)*(states[0]-computedConstants[2]);
    algebraic[8] = 0.01*(states[0]+10.0)/(exp((states[0]+10.0)/10.0)-1.0);
    algebraic[9] = 0.125*exp(states[0]/80.0);
}

There is a similar likewise case for the output variables. Those variables which had a value derived after the states had changed could be set so that those of which a modeller cared output where separated and passed out and those of which a modeller did not care about where not even computed.

Is this something that the CellML API allows you to do or is this something that you guys currently filter out from the code generated by the CellML API?

Either way, this is not something that is currently supported by libCellML's code generator, but in the same way that we can specifcy external variables (i.e. variables that are to be computed outside of the model rather than by the model itself; e.g., to do an action potential clamp), we could have a method to let the generator that we don't care about a given variable and that its computation can be ignored (unless it's needed to compute another part of the model). This is a different issue though, so you may want to create a specific GitHub issue for it.

chrispbradley commented 4 months ago

Hi Alan,

I should also reference https://github.com/cellml/libcellml/issues/1013 which is this same thing.

At the moment, a variable (in the general sense) can be a variable of integration, a state, a constant, a computed constant, an algebraic variable, or an external variable. However, when we generate code, we don't indeed currenly distinguish between constants, computed constants, and algebraic variables.

Agreed, none of these two options is good.

We could modify the generated code to distinguish between constants, computed constants, and algebraic variables, if that's what you mean. This would change the generated quite a bit, but I can appreciate your use case.

Not quite. If I understand what you are saying there are constant (straight assignment of a value), computed constants (involve computations involving just constants), algebraic (involve computations involving state and constants), etc. The current libCellML API is drawing a distinction based on what variables are on the LHS. However, I think there also needs to be a distinction based on input/output (or public/private if you prefer although these terms may be overloaded here). For example in the use case above I want to vary gNa spatially. This, if I understand it, would be a constant variable i.e., a straight assignment. I would need to be able to flag it compared to all the other straight constants so that it could be passed separately to the other constant values.

It would also be like if I wrapped the current libCellML generated code in another routine that then called the current code routines then these flagged variables would be public in the wrapped routine but the non-flagged would be private. Then, provide these public and private variables could be passed separately to the current libCellML generated code, then I would just pass in my flagged constants into the wrapped code, and then pick up the unflagged variables and pass them both into the current generated code. I'll try and illustrate below.

For the "output" variables like computed constants or algebraic then, again, a similar form of flagging could be useful. I'm not 100% sure if the current CellML API just doesn't compute non-flagged variables but if it doesn't it would be useful if it didn't. If, say, I only wanted to look at INa (and didn't care about IK, IL all the alpha and betas etc. below that aren't also state and are so required because they affect other variables) then if all the algebraics are computed and returned then a) it is a waste of flops because nobody wants the value and b) they just have to be stored somewhere and so you have potentially millions of wasted stored numbers.

For example something like

/* The content of this file was generated using the C profile of libCellML 0.5.0. */

#include "model.h"

#include <math.h>
#include <stdlib.h>

const char VERSION[] = "0.5.0";
const char LIBCELLML_VERSION[] = "0.5.0";

const size_t STATE_COUNT = 4;
const size_t PUBLIC_CONSTANT_COUNT = 2;
const size_t PRIVATE_CONSTANT_COUNT = 3;
const size_t PUBLIC_COMPUTED_CONSTANT_COUNT = 0;
const size_t PRIVATE_COMPUTED_CONSTANT_COUNT = 3;
const size_t PUBLIC_ALGEBRAIC_COUNT =2;
const size_t PRIVATE_ALGEBRAIC_COUNT = 8;

const VariableInfo VOI_INFO = {"time", "millisecond", "environment", VARIABLE_OF_INTEGRATION};

/* STATE is always PUBLIC??? */

const VariableInfo STATE_INFO[] = {
    {"V", "millivolt", "membrane"},
    {"h", "dimensionless", "sodium_channel_h_gate", PUBLIC},
    {"m", "dimensionless", "sodium_channel_m_gate", PUBLIC},
    {"n", "dimensionless", "potassium_channel_n_gate", PUBLIC}
};

const VariableInfo CONSTANT_INFO[] = {
    {"Cm", "microF_per_cm2", "membrane", CONSTANT, PUBLIC},
    {"E_R", "millivolt", "membrane", CONSTANT, PRIVATE},
    {"g_L", "milliS_per_cm2", "leakage_current", CONSTANT, PRIVATE},
    {"g_Na", "milliS_per_cm2", "sodium_channel", CONSTANT, PUBLIC},
    {"g_K", "milliS_per_cm2", "potassium_channel", CONSTANT, PRIVATE},
};

const VariableInfo COMPUTED_CONSTANT_INFO[] = {
    {"E_L", "millivolt", "leakage_current", COMPUTED_CONSTANT, PRIVATE},
    {"E_Na", "millivolt", "sodium_channel", COMPUTED_CONSTANT, PRIVATE},
    {"E_K", "millivolt", "potassium_channel", COMPUTED_CONSTANT, PRIVATE},
};

const VariableInfo ALGEBRAIC_INFO[] = {
    {"i_Stim", "microA_per_cm2", "membrane", ALGEBRAIC, PUBLIC}, /* Don't know why this wouldn't be a constant???? */
    {"i_L", "microA_per_cm2", "leakage_current", ALGEBRAIC, PRIVATE},
    {"i_K", "microA_per_cm2", "potassium_channel", ALGEBRAIC, PRIVATE},
    {"i_Na", "microA_per_cm2", "sodium_channel", ALGEBRAIC, PUBLIC},
    {"alpha_m", "per_millisecond", "sodium_channel_m_gate", ALGEBRAIC, PRIVATE},
    {"beta_m", "per_millisecond", "sodium_channel_m_gate", ALGEBRAIC, PRIVATE},
    {"alpha_h", "per_millisecond", "sodium_channel_h_gate", ALGEBRAIC, PRIVATE},
    {"beta_h", "per_millisecond", "sodium_channel_h_gate", ALGEBRAIC, PRIVATE},
    {"alpha_n", "per_millisecond", "potassium_channel_n_gate", ALGEBRAIC, PRIVATE},
    {"beta_n", "per_millisecond", "potassium_channel_n_gate", ALGEBRAIC, PRIVATE}
};

double * createStatesArray()
{
    double *res = (double *) malloc(STATE_COUNT*sizeof(double));

    for (size_t i = 0; i < STATE_COUNT; ++i) {
        res[i] = NAN;
    }

    return res;
}

/* Just listing Public here - Have some mechanism for Private? */

double * createPublicConstantsArray()
{
    double *res = (double *) malloc(PUBLIC_CONSTANT_COUNT*sizeof(double));

    for (size_t i = 0; i < PUBLIC_CONSTANT_COUNT; ++i) {
        res[i] = NAN;
    }

    return res;
}

double * createPublicComputedConstantssArray()
{
    double *res = (double *) malloc(PUBLIC_COMPUTED_CONSTANT_COUNT*sizeof(double));

    for (size_t i = 0; i < PUBLIC_COMPUTED_CONSTANT_COUNT; ++i) {
        res[i] = NAN;
    }

    return res;
}

double * createPublicAlgebraicArray()
{
    double *res = (double *) malloc(PUBLIC_ALGEBRAIC_COUNT*sizeof(double));

    for (size_t i = 0; i < PUBLIC_ALGEBRAIC_COUNT; ++i) {
        res[i] = NAN;
    }

    return res;
}

void deleteArray(double *array)
{
    free(array);
}

void initialisePublicVariables(double *states, double *rates, double *publicConstants)
{
    states[0] = 0.0;
    states[1] = 0.6;
    states[2] = 0.05;
    states[3] = 0.325;
    publicConstants[0] = 1.0;
    publicConstants[1] = 120.0;
}

void initialisePrivateVariables(double *privateConstants)
{
    privateConstants[0] = 0.0;
    privateConstants[1] = 0.3;
    privateConstants[2] = 36.0;
}

void computeComputedConstants(double *publicConstants, double *privateConstants, double *computedConstants)
{
    /* Only computed constants that are used in other variables or are public are listed???? */

    computedConstants[0] = privateConstants[0]-10.613;
    computedConstants[1] = privateConstants[0]-115.0;
    computedConstants[2] = privateConstants[0]+12.0;
}

/* Can have public/private computed constants and algebraic??? */
/* Only public computed constants and algebraic are listed unless they affect another variable??? */

void computeRates(double voi, double *states, double *rates, double *publicConstants, double *privateConstants, double *computedConstants, double *algebraic)
{
    algebraic[0] = ((voi >= 10.0) && (voi <= 10.5))?-20.0:0.0;
    algebraic[1] = privateConstants[1]*(states[0]-computedConstants[0]);
    algebraic[2] = privateConstants[2]*pow(states[3], 4.0)*(states[0]-computedConstants[2]);
    algebraic[3] = publicConstants[0]*pow(states[2], 3.0)*states[1]*(states[0]-computedConstants[1]);
    rates[0] = -(-algebraic[0]+algebraic[3]+algebraic[2]+algebraic[1])/publicConstants[0];
    algebraic[4] = 0.1*(states[0]+25.0)/(exp((states[0]+25.0)/10.0)-1.0);
    algebraic[5] = 4.0*exp(states[0]/18.0);
    rates[2] = algebraic[4]*(1.0-states[2])-algebraic[5]*states[2];
    algebraic[6] = 0.07*exp(states[0]/20.0);
    algebraic[7] = 1.0/(exp((states[0]+30.0)/10.0)+1.0);
    rates[1] = algebraic[5]*(1.0-states[1])-algebraic[7]*states[1];
    algebraic[8] = 0.01*(states[0]+10.0)/(exp((states[0]+10.0)/10.0)-1.0);
    algebraic[9] = 0.125*exp(states[0]/80.0);
    rates[3] = algebraic[8]*(1.0-states[3])-algebraic[9]*states[3];
}

/* Only have the public algebraic etc. ??? */

void computeVariables(double voi, double *states, double *rates, double *publicConstants, double *privateConstants,  double *computedConstants, double *algebraic)
{
    algebraic[1] = privateConstants[1]*(states[0]-computedConstants[0]);
    algebraic[3] = publicConstants[1]*pow(states[2], 3.0)*states[1]*(states[0]-computedConstants[1]);
    algebraic[4] = 0.1*(states[0]+25.0)/(exp((states[0]+25.0)/10.0)-1.0);
    algebraic[5] = 4.0*exp(states[0]/18.0);
    algebraic[5] = 0.07*exp(states[0]/20.0);
    algebraic[7] = 1.0/(exp((states[0]+30.0)/10.0)+1.0);
    algebraic[2] = privateConstants[2]*pow(states[3], 4.0)*(states[0]-computedConstants[2]);
    algebraic[8] = 0.01*(states[0]+10.0)/(exp((states[0]+10.0)/10.0)-1.0);
    algebraic[9] = 0.125*exp(states[0]/80.0);
}

void wrappedComputeRatesAndVariables(double voi, double *states, double *rates, double *publicConstants, double *publicComputedConstants, double *publicAlgebraic)
{
double *privateConstants; /* from somewhere? */
/* or something like 
static double privateConstants[PRIVATE_CONSTANT_COUNT];
privateConstants[0] = 0.0;
privateConstants[1] = 0.3;
privateConstants[2] = 36.0;
*/

computeComputedConstants(publicConstants, privateConstants, computedConstants)
computeRates(voi, states, rates, publicConstants, privateConstants, publicComputedConstants, publicAlgebraic)
computeVariables(voi, states, rates, publicConstants, privateConstants,  publicComputedConstants, publicAlgebraic)
}

Is this something that the CellML API allows you to do or is this something that you guys currently filter out from the code generated by the CellML API?

Either way, this is not something that is currently supported by libCellML's code generator, but in the same way that we can specifcy external variables (i.e. variables that are to be computed outside of the model rather than by the model itself; e.g., to do an action potential clamp), we could have a method to let the generator that we don't care about a given variable and that its computation can be ignored (unless it's needed to compute another part of the model). This is a different issue though, so you may want to create a specific GitHub issue for it.

Speaking to Andre, I'm not sure external is what is required??? Is it a different issue? Not sure if this the wrapped routine is a different generator profile? or a different generator mode? Mode I only generates the standard routines with public/private, mode II generates both the wrapped and unwrapped????

chrispbradley commented 4 months ago

Forgot to say, that variables could all be defaulted to PUBLIC unless flagged as PRIVATE and then you end up with sort of the same thing as now???

agarny commented 4 months ago

I think we will need to discuss it among the libCellML team. The original plan was to make things as simple as possible when it comes to code generation, but here I feel like this is becoming rather complicated.

Maybe OpenCMISS should generate its own code using libCellML? I mean, as part of issue #1218, I had to rework the code generator so that it could not only be used to generate text-based code, but also to generate some kind of in-memory code (that can then be interpreted).

So, maybe that reworked code generator should expose some of its internals in the API so that a library like OpenCMISS could make use of it to generate some bespoke code?

chrispbradley commented 4 months ago

Please, by all means, discuss it with the libCellML team. However, I'm not so sure it is that complicated?

Clearly, CellML variables have attributes. There are attributes like units, and you already have the type in which you work out the relationship between variables and equations. The current set of attributes, however, are more for the concept of a CellML model evaluation with a single set of variable values. As soon as you bring in the idea of the evaluation of a CellML model with two (or more, or many-many more) sets of variable values you will find that only these attributes are not so useful. This is not just an OpenCMISS issue but rather one applies to all applications of similar ilk and a use case I would have thought is important?

What I'm saying is that it would be useful to have another variable attribute. I appreciate that PUBLIC/PRIVATE or INTERNAL/EXTERNAL might be a bit overused and have multiple meanings. Maybe EXPOSED? All variables are EXPOSED (or maybe it makes more sense for the attribute to be unexposed which is False by default - up to you) unless explicitly flagged otherwise? Then when the code is generated only EXPOSED variables are in the routine interfaces or have their equation generated (unless they are required by other variables of course - maybe any error to flag as not exposed such variables?). The unexposed constant variables can be handled in the same way that you have for your variable COUNT variables i.e., const variable = value (or a #define etc.) at the top of the generated file (and thus let the compiler or interpreter deal with them in the best way it can think of). This way your routine signatures and setup are exactly as they are now (although the COUNTS will need to be for just the EXPOSED variables) as the default is EXPOSED. The important thing is that there is way that just the EXPOSED constants can be passed in (up to you about whether the UNEXPOSED ones are passed in as well). If not one is forced to store all sets of values for the multiple CellML models which is not very ideal as we all agree.

It could be that one way to do is to expose the generator or have the application re-write the code string. However, this is quite a complicated solution. As I mentioned above this isn't just an OpenCMISS use case but rather a more generic one for all multi-cell model applications. If the solution is as complicated as diving into the generation internals then there will be a real risk that the audience for libCellML will be somewhat limited?

agarny commented 4 months ago

Nothing is complicated per se. I just want to make sure that I understand everything. What would help is if you were to provide, for a given CellML file, the code that gets currently generated and the code that you would ideally like to see generated based what you want see exposed / hidden / etc.

In the meantime, I am going to modify the code generator so that it uses different arrays for constants, computed constants, and algebraic variables, i.e. generate the kind of code I mentioned above (see issue #1243). From there, we can refine things even further, no problems.

chrispbradley commented 4 months ago

Nothing is complicated per se. I just want to make sure that I understand everything. What would help is if you were to provide, for a given CellML file, the code that gets currently generated and the code that you would ideally like to see generated based what you want see exposed / hidden / etc.

OK, I'll create a mock up.

In the meantime, I am going to modify the code generator so that it uses different arrays for constants, computed constants, and algebraic variables, i.e. generate the kind of code I mentioned above (see issue #1243). From there, we can refine things even further, no problems.

Is there another requirement for computed_constants being separated? Having them separate will, I think, not help for my case. I guess they are cleaner in terms of intent in and out being separate?

agarny commented 4 months ago

Is there another requirement for computed_constants being separated? Having them separate will, I think, not help for my case. I guess they are cleaner in terms of intent in and out being separate?

Yes, it's to make things cleaner, more consistent.

chrispbradley commented 4 months ago

OK, here is my mockup. For this I would run a multi-cellular monodomain with the above HH model. In this case I would want to investigate Na channel conductance so I would

To achieve this I would make the following type calls to the libCellML API

... calls to load the model etc.

... set variable exposure with something like (note: everything is exposed by default)

ERVariableObj.SetVariableExposure(NOT_EXPOSED)

or obj.SetVariableExposure('E_R',NOT_EXPOSED)

etc. for everything except C_m, g_Na, I_Na.

... call to generate code. It would generate something like

HH.c.txt

and

model.h.txt

Any thoughts or comments?

agarny commented 4 months ago

Thanks @chrispbradley, I am currently working on issue #1243 (which, although simple on the surface, requires quite a few changes here and there) when I have a bit of time (I am leaving for the ISAN conference early next week, so this is my priority at this stage) and will then focus on refining the code generation to accommodate your needs.

Stil, a few preliminary comments:

As I have said, those are just some preliminary thoughts. First, I want and need to finish issue #1243. Then, we can refine things to suit your (justified) needs.

chrispbradley commented 4 months ago

Thanks @chrispbradley, I am currently working on issue #1243 (which, although simple on the surface, requires quite a few changes here and there) when I have a bit of time (I am leaving for the ISAN conference early next week, so this is my priority at this stage) and will then focus on refining the code generation to accommodate your needs.

Understand completely. I'll comment below in case whatever I say changes things in the interim and whilst I remember.

Stil, a few preliminary comments:

* In the `XXX_INFO` arrays, I wouldn't mention `EXPOSED`/`NOT_EXPOSED`. Only the variables exposed should be listed. Indeed, the index of the variables in those arrays should match that of the arrays we pass to our various methods.

Fine by me.

This is not the case for CONSTANT_INFO and constants.

Perhaps. The requirements for the data will be on whether or not those variables are required outside of the file. Unexposed/untracked variables may only be required to be seen if they need to be passed into any routine. If the routine signatures do not require them then they will not be needed?

* Rather than "exposed" / "not exposed", I would probably have methods to `trackAllVariables()`, `untrackAllVariables()`, `trackVariable(xxx)`, and `untrackVariable(xxx)`, so that the user has full control over what to track or not.

Fine by me. Exposed was just a suggestion. Tracked is fine.

* You can't have `i_Stim` as a constant since most of the time it's equal to 0 uA/cm2 and to -20 uA/cm2 during a stimulus.

Wouldn't this be determined by the actual CellML code? If the CellML code just referenced I_Stim in a I_Stim = xyz fashion then wouldn't it be flagged as a constant? For bioelectric simulations like bi/mono-domain the stimulus current also appears in the PDE and so its value is determined outside of CellML. Taking out the stim on/off conditional expression is pretty much the first thing I have to do to make CellML models useful for bioelectric sims (that and not normalising everything by Am, Cm internally). For my cases I_stim would be a constant.

* The `createXxxArray()` and `deleteArray()` methods will definitely be kept. If you don't want/need them, you should modify the default `C` profile so that they don't get generated. The same holds true for anything that is generated by the generator.

Sure, no problem them being there. I just wouldn't call them as I need to manage the memory.

* There will not be a combined compute rates and variables routine. `computeRates()` is called by your ODE solver and it can be called many times for a given time integration while `computeVariables()` is only to be called once a given time integration is done (so we can compute variables that depend on the new value of a state and/or rate).

My poor choice of name. The issue is that of the computed Constants. The algebraic variables are already computed inside the computeRates routine. The computeRatesAndVariables was mean to compute everything required. It could be called computeRatesAndComputedConstants. The reason why I wish for a single routine is because I don't want to call the computeComputedConstants routine. Actually, calling it is not necessarily the problem. The problem is where the computed constants are allocated. For the computed constants that are not tracked it is important that they are not on the heap. If the untracked computed constants involve a constant that is tracked then its value will (potentially) change when the value of the dependent constant changes. Thus the expression calculating the computed constants needs to evaluated before the rates. If the untracked computed constant is then passed into the rates routine then I would have to store its value somewhere (even though I have untracked it because I don't care about its value) in order to allow for the rates routine to be called in parallel inside something like an OpenMP parallel loop. If I don't store it and it is allocated on the heap then each independent parallel calculate rates (or calculate computed constants) call would conflict with each other as they would all be trying to independently write to the same memory location. Having the computed constants as local variables inside the compute rates routine solves this problem as they will be allocated on the stack and thus each independent compute rates routine can write to its own independent bit of memory in parallel. I appreciate the desire to separate out the expressions so that the computed constant expressions are only called when the values change. However, if the computed constants expressions are listed locally then most optimising compilers will run a dependency analysis and work out that the computed constants expressions that involve untracked constants do not in fact change and optimise the expression away so that it is not actually computed every time the rates routine is called. Having a computeRatesAndComputedConstants routine as well as a computeRates routine allows for this to be avoided by the user if they do not want to take the small hit?

* Regarding passing in/out arrays of untracked (i.e. non-exposed) variables, we may indeed end up having to do that, but that's something that could be done only if the user was to decide not to track certain variables.  If all variables were to be tracked (default) then the signature of the various methods would remain the same as it is now.

Yes, although with a computeRatesAndComputedConstants routine as well this does not change as it only exposes the tracked variables and can avoid the parallel heap memory problem. If the computed constants are on the heap then the untracked ones will need to be passed into the rates routine and the signature of the that routine changes depending on the tracked/untracked status of the variables.

As I have said, those are just some preliminary thoughts. First, I want and need to finish issue #1243. Then, we can refine things to suit your (justified) needs.

chrispbradley commented 3 months ago

So further thoughts:

agarny commented 3 months ago

Hmm... for me (and the analyser, as it currently stands) the VOI is not a constant. So, if t in X(t) = A.sin(omega.t) is the VOI then X(t) won't be considered as a computed constant, but as an algebraic variable since t can take a range of values during a simulation.

Regarding untracked constants to be local variables, I am open to suggestions. I will see what I can come up with and, from there, you and others will be more than welcome to comment on issue #1244's PR. We have tons of tests, so those tests will make it very clear what kind of code you will be able to expect.

chrispbradley commented 3 months ago

Hi Alan, but what if the model is a purely algebraic example? There are no states, no integration and therefore no voi? This is a case of some output variables being functions of input variables. Yes, t can take a range of values but so can A and omega? How is t different to the other variables in this case? For X(A) = 2.A + 1 wouldn't A be a constant and therefore X is just a function of constants and so it is a computed constant? If that is the case how is it different from X(t) = 2.t + 1?

agarny commented 3 months ago

If there are no ODEs then there will indeed be no VOI. So, going back to your original equation (i.e. X(t) = A.sin(omega.t)) and based on what you initially said about A and omega, I would expect the analyser to conclude that:

And, because there is only one equation, then the "system" would be considered to be underconstrained. However, you can still solve X(t) for a given t. For this, you will need to flag t as an external variable. This means that rather than considering t as being unknown, the analyser would consider it as being an external variable (which value is controlled by you). Then, when it comes to code generation, it would generate something like:

variables[3] = externalVariable(variables, 3);
variables[2] = variables[0]*sin(variables[1]*variables[3]);

with:

A = variables[0]
omega = variables[1]
X(t) = variables[2]
t = variables[3]

externalVariable() is a callback function that you need to provide to "compute" t.

When it comes to X(A) = 2.A + 1, it all depends on how A was declared in your CellML model. If it has an initial_value then it will be considered as a constant (as are A and omega in the previous example) otherwise its type will be considered to be unknown (same for X(A)) since there is no equation to compute it.

chrispbradley commented 3 months ago

OK, we now concur, and thus the problem. In the case above, X(t) = A.sin(omega.t), then time would be a variable and the system equation would be computed in the computeComputedConstants routine which doesn't have a VOI passed in. The "solutions" are to either pass in time via the constants array or have time "evaluated" via an externalVariable routine. Neither or these options scale particularly well for a large number of Cell "instances". I could give the time variable an initial value but then flag it as tracked. This would mean that I can control the value of time from my external program and pass it in but then I would have to store it which would mean that I have to take a large amount of memory to store time for each cell (which is just one number). If I don't store it then I would need to use the externalVariable routine to make a call back to my program to provide the value of time which is not particularly efficient in itself.

From my point of view, I see the CellML model as a black box. Inside that black box are a number of variables that can be related through equations. I need to expose a number of those variables to outside the black box so that I can either control the variables value before evaluation of the black box or to receive the value of those variables back from the black box post evaluation. I thus need the ability to say what variables are exposed (via the tracked mechanism). Now, separate issues pop up as to whether or not those variables are state (and thus have rates) - which is handled well - and to what extent the variables are input variables or output variables. This part can be handled via CellML (e.g., give them an initial value so that they are input) and the nature of the equations. It also seems that some variables are, indeed, a bit special - like time - in that to handle them in a generic way is not very efficient.

The question then is is it possible to wrangle the API to generate code that can handle the use case of a OpenCOR type application and an OpenCMISS type application? Is some customisation required (maybe via generator profile options), e.g., flagging something like a variable like time so that it is passed in via an efficient mechanism like a routine parameter?

agarny commented 3 months ago

OK, we now concur, and thus the problem. In the case above, X(t) = A.sin(omega.t), then time would be a variable and the system equation would be computed in the computeComputedConstants routine which doesn't have a VOI passed in.

For X(t) = A.sin(omega.t) to be computed in computeComputedConstants(), you need t to be a constant, which it is not. If it was marked as an external variable (so that the equation is not considered under constrained) then it should be computed in computeVariables().

The "solutions" are to either pass in time via the constants array or have time "evaluated" via an externalVariable routine. Neither or these options scale particularly well for a large number of Cell "instances". I could give the time variable an initial value but then flag it as tracked. This would mean that I can control the value of time from my external program and pass it in but then I would have to store it which would mean that I have to take a large amount of memory to store time for each cell (which is just one number). If I don't store it then I would need to use the externalVariable routine to make a call back to my program to provide the value of time which is not particularly efficient in itself.

From my point of view, I see the CellML model as a black box. Inside that black box are a number of variables that can be related through equations. I need to expose a number of those variables to outside the black box so that I can either control the variables value before evaluation of the black box or to receive the value of those variables back from the black box post evaluation. I thus need the ability to say what variables are exposed (via the tracked mechanism). Now, separate issues pop up as to whether or not those variables are state (and thus have rates) - which is handled well - and to what extent the variables are input variables or output variables. This part can be handled via CellML (e.g., give them an initial value so that they are input) and the nature of the equations. It also seems that some variables are, indeed, a bit special - like time - in that to handle them in a generic way is not very efficient.

The question then is is it possible to wrangle the API to generate code that can handle the use case of a OpenCOR type application and an OpenCMISS type application? Is some customisation required (maybe via generator profile options), e.g., flagging something like a variable like time so that it is passed in via an efficient mechanism like a routine parameter?

This is really starting to get very complicated. I am still working on issue https://github.com/cellml/libcellml/issues/1243 and I have been working on it for the past couple of weeks (it's simple on paper, but a pain in practice) and then I need to work on issue https://github.com/cellml/libcellml/issues/1244 (as well as on some other issues with OpenCOR).

So, it would be nice to have some CellML code, as well as the kind of generated code that you would like to see, so that we can better appreciate what can be done, if anything.

Also, were you able to do all you have been discussing using the legacy CellML API before? If so, you should have all those CellML files and the generated code at hand?