PEtab-dev / PEtab

PEtab - an SBML and TSV based data format for parameter estimation problems in systems biology
https://petab.readthedocs.io
MIT License

Long formats for conditions and experiments (timecourses) #586

Open dilpath opened 3 months ago

dilpath commented 3 months ago

Follow-up to https://github.com/PEtab-dev/PEtab/issues/585 (no need to read that).

Here are specs for long formats of the conditions and experiments (timecourses) tables. Additional feedback is very welcome!

Conditions table

| conditionId | inputId | inputType | inputValue |
|---|---|---|---|
| PETAB_ID | NON_ESTIMATED_ENTITY_ID | constant OR initial OR ... | PETAB_MATH |
| e.g. | | | |
| cond1 | rate1 | constant | 1 |
| cond2 | species1 | initial | species1 + 5 |

Row and column ordering are arbitrary, although using the above column ordering may improve human readability.

Additional columns are allowed, for example, to specify a human-friendly name for the condition.

Other optional columns we could officially support include conditionName, but this might mean duplicating the same condition name across all rows with that condition ID...
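As a rough illustration of how a tool might read this long format, the sketch below parses the two example rows with Python's standard csv module and groups them by conditionId. The helper name `load_conditions` and the nested-dict layout are assumptions made for illustration only; they are not part of PEtab or its Python library.

```python
import csv
import io

# The two example rows from the conditions table, as tab-separated text.
CONDITIONS_TSV = (
    "conditionId\tinputId\tinputType\tinputValue\n"
    "cond1\trate1\tconstant\t1\n"
    "cond2\tspecies1\tinitial\tspecies1 + 5\n"
)

REQUIRED_COLUMNS = {"conditionId", "inputId", "inputType", "inputValue"}


def load_conditions(tsv_text):
    """Group long-format rows as {conditionId: {inputId: (inputType, inputValue)}}.

    Hypothetical helper: inputValue is kept as an unevaluated string,
    because it may be a PEtab math expression.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    conditions = {}
    for row in reader:
        inputs = conditions.setdefault(row["conditionId"], {})
        # Column order is arbitrary, so cells are accessed by column name only.
        inputs[row["inputId"]] = (row["inputType"], row["inputValue"])
    return conditions


conditions = load_conditions(CONDITIONS_TSV)
print(conditions["cond2"]["species1"])
```

Because rows are looked up by column name, additional columns such as a human-friendly name are simply ignored by this sketch.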

Detailed field description

Experiments table

| experimentId | time | conditionId |
|---|---|---|
| PETAB_ID | NUMERIC OR -inf | conditionId |
| e.g. | | |
| timecourse1 | -inf | cond1 |
| timecourse1 | 0 | cond2 |

Row and column ordering are arbitrary, although using the above column ordering may improve human readability.

Additional columns are allowed, for example, to specify a human-friendly name for the experiment.

Detailed field description

Measurements table

Only the required or changed columns are included here (other optional columns, e.g. noiseFormula, are still supported but irrelevant to this discussion).

| observableId | [experimentId] | time | measurement |
|---|---|---|---|
| observableId | [experimentId] | NUMERIC OR inf | NUMERIC |
| e.g. | | | |
| obs1 | experiment1 | 5 | 2 |

Detailed field description

observableId and measurement are unchanged.

Example

Conditions table

| conditionId | inputId | inputValue | inputType | units |
|---|---|---|---|---|
| cond1 | rate1 | 0 | constant | mg/s |
| cond1 | rate2 | 1 | constant | m/s |
| cond2 | species1 | 0 | initial | mol |
| preeq_cond1 | rate1 | 1 | constant | g/s |
| switch_on | switch | 1 | constant | dimensionless |
| switch_off | switch | 0 | constant | dimensionless |

Experiments table

| experimentId | time | conditionId |
|---|---|---|
| timecourse1 | -inf | preeq_cond1 |
| timecourse1 | 0 | cond1 |
| timecourse1 | 10 | cond2 |
| experiment1 | -5 | cond1 |
| experiment1 | -5 | cond2 |
| switch_sequence | 0 | switch_on |
| switch_sequence | 1 | switch_off |
| switch_sequence | 2 | switch_on |
| switch_sequence | 3 | switch_off |
| switch_sequence | 4 | switch_on |
| switch_sequence | 5 | switch_off |

timecourse1 has a PEtab v1 preequilibrationConditionId (preeq_cond1), a PEtab v1 simulationConditionId (cond1), and then a 3rd timecourse period at t=10 with condition cond2.

experiment1 is not a timecourse, rather a single-condition simulation starting at t=-5 where two conditions are applied simultaneously.

switch_sequence is a repeating timecourse, equivalent to a nested timecourse (see https://github.com/PEtab-dev/PEtab/issues/585).
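To make the three readings above concrete, here is a minimal sketch that groups the rows of the example experiments table into time-ordered periods per experiment. The function `build_periods` and its output layout are hypothetical; the only behavior relied on is that Python's `float()` accepts "-inf", so a pre-equilibration period sorts before all finite times.

```python
from collections import defaultdict

# Rows of the example experiments table: (experimentId, time, conditionId).
EXPERIMENT_ROWS = [
    ("timecourse1", "-inf", "preeq_cond1"),
    ("timecourse1", "0", "cond1"),
    ("timecourse1", "10", "cond2"),
    ("experiment1", "-5", "cond1"),
    ("experiment1", "-5", "cond2"),
    ("switch_sequence", "0", "switch_on"),
    ("switch_sequence", "1", "switch_off"),
    ("switch_sequence", "2", "switch_on"),
    ("switch_sequence", "3", "switch_off"),
    ("switch_sequence", "4", "switch_on"),
    ("switch_sequence", "5", "switch_off"),
]


def build_periods(rows):
    """Return {experimentId: [(start_time, [conditionId, ...]), ...]}, sorted by time."""
    by_time = defaultdict(lambda: defaultdict(list))
    for experiment_id, time, condition_id in rows:
        # Rows sharing an experiment and a start time are simultaneous conditions.
        by_time[experiment_id][float(time)].append(condition_id)
    return {
        experiment_id: sorted(times.items())
        for experiment_id, times in by_time.items()
    }


periods = build_periods(EXPERIMENT_ROWS)
for experiment_id, experiment_periods in periods.items():
    print(experiment_id, experiment_periods)
```

In this layout, timecourse1 has three periods, experiment1 has a single period with two simultaneous conditions, and switch_sequence has six alternating periods.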

Open points

  1. There is currently some undefined behavior in the conditions table -- do we clarify that now or in a future PEtab v2.1 when the use cases are clearer? For example, what happens when a user specifies a parameter in the conditions table with inputType=constant, but then an SBML event affects the same parameter? We could simply disallow this for now.
  2. How are simultaneous conditions handled (e.g. experiment1 in the example)? We could decide that they are only allowed if they change different entities. Otherwise we would need to care about some ordering.
  3. I'm happy to change naming, e.g. inputId->targetId, or experimentId->timecourseId. Let me know what you prefer. experimentId was chosen because most users won't care about timecourses, but then would still need to use that table for their single-condition "timecourses".
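For open point 2, one option mentioned is to allow simultaneous conditions only if they change different entities. A minimal sketch of such a check follows, using the inputId values from the example conditions table. The function name `find_conflicts` and the data layout are assumptions for illustration, not an agreed validation rule.

```python
from itertools import combinations

# Entities changed by each condition, taken from the example conditions table.
CONDITION_TARGETS = {
    "cond1": {"rate1", "rate2"},
    "cond2": {"species1"},
    "switch_on": {"switch"},
    "switch_off": {"switch"},
}


def find_conflicts(simultaneous_condition_ids, condition_targets):
    """Return (conditionA, conditionB, shared_entities) for each overlapping pair."""
    conflicts = []
    for a, b in combinations(sorted(simultaneous_condition_ids), 2):
        shared = condition_targets[a] & condition_targets[b]
        if shared:
            conflicts.append((a, b, sorted(shared)))
    return conflicts


# experiment1 applies cond1 and cond2 at t=-5; they change different entities.
experiment1_conflicts = find_conflicts(["cond1", "cond2"], CONDITION_TARGETS)

# Hypothetical clash: both conditions change the same entity at the same time.
switch_conflicts = find_conflicts(["switch_on", "switch_off"], CONDITION_TARGETS)

print(experiment1_conflicts)
print(switch_conflicts)
```

Under this rule, experiment1 from the example would be accepted, while applying switch_on and switch_off at the same time point would be rejected unless an ordering is defined.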
paulflang commented 3 months ago

Thanks a lot Dilan for accommodating the suggestions from #585. 🙏 Looks good to me for the most part, but I have two questions:

rate/assignment/relativeRate/relativeAssignment These are currently not supported, until a tool implements them. However, they are reserved to mean changes equivalent to setting a new SBML rateRule or assignmentRule for the entity. relative indicates relative changes to pre-existing rates/reactions or assignments.

Can you give examples for the use of these? Cause to me, only the rate makes intuitive sense. I.e. rate means that the model is really perturbed. assignment would just mean that the current way of calculating the value of an SBML species/parameter/compartment is overridden with an assignment rule, right? But that would violate the SBML specs, which state "an assignment rule cannot be defined for a species that is created or destroyed in a reaction unless that species is defined as a boundary condition in the model." Not that PEtab has to stick to the SBML specs here, but still.

inputValue [PETAB_MATH, REQUIRED] The value that will be used to change the entity inputId. If a PEtab math expression involves time-dependent entities, then they represent their values at the simulation time when the condition is active, as defined in the experiments table.

Do you mean "is activated" instead of "is active"?

dilpath commented 3 months ago

Thanks a lot Dilan for accommodating the suggestions from #585. 🙏 Looks good to me for the most part, but I have two questions:

Sure! Thanks for the feedback.

rate/assignment/relativeRate/relativeAssignment These are currently not supported, until a tool implements them. However, they are reserved to mean changes equivalent to setting a new SBML rateRule or assignmentRule for the entity. relative indicates relative changes to pre-existing rates/reactions or assignments.

Can you give examples for the use of these? Cause to me, only the rate makes intuitive sense. I.e. rate means that the model is really perturbed.

Given an inputId=species1 and inputValue=5 and original rate rule for species1: d(species1)/dt = 2*species1

Given an original assignment rule for species1: species1(t) = 2*t + k1
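One possible reading of the two starting points above, with inputValue=5, can be sketched as follows. It assumes that the non-relative types replace the right-hand side of the rule and that the relative types add inputValue to it. The additive reading matches the remark elsewhere in this thread that relativeRate can be interpreted as a new reaction, but for relativeAssignment it is purely an assumption. The function `apply_input` and the numeric values chosen for species1, t and k1 are hypothetical.

```python
def apply_input(original_value, input_value, input_type):
    """Return the right-hand side of a rule after a condition is applied.

    Assumption: "relative" types add input_value to the original
    right-hand side, while non-relative types replace it.
    """
    if input_type in ("rate", "assignment"):
        return input_value
    if input_type in ("relativeRate", "relativeAssignment"):
        return original_value + input_value
    raise ValueError(f"unsupported inputType: {input_type}")


species1 = 3.0                    # arbitrary current state, for illustration
original_rate = 2 * species1      # d(species1)/dt = 2*species1

t, k1 = 1.0, 0.5                  # arbitrary time and parameter value
original_assignment = 2 * t + k1  # species1(t) = 2*t + k1

new_rate = apply_input(original_rate, 5, "rate")
new_relative_rate = apply_input(original_rate, 5, "relativeRate")
new_assignment = apply_input(original_assignment, 5, "assignment")
new_relative_assignment = apply_input(original_assignment, 5, "relativeAssignment")

print(new_rate, new_relative_rate)              # 5 and 2*species1 + 5
print(new_assignment, new_relative_assignment)  # 5 and 2*t + k1 + 5
```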

I was trying to capture all possibilities, including the "relative" and "isDelta" changes discussed in https://github.com/PEtab-dev/PEtab/issues/564 and https://github.com/PEtab-dev/PEtab/issues/585, and the "bolus" vs. "infusion" that you implemented in PumasQSP [1] via the duration column in that dosing table.

assignment would just mean that the current way of calculating the value of an SBML species/parameter/compartment is overridden with an assignment rule, right? But that would violate the SBML specs, which state "an assignment rule cannot be defined for a species that is created or destroyed in a reaction unless that species is defined as a boundary condition in the model." Not that PEtab has to stick to the SBML specs here, but still.

I agree, I'm not sure how to resolve this best. This is one reason why I limited constant to things that are already "constant" in the model, like parameters, and initial to things that are specified by time-derivative information. Similarly, I would limit rate/(relative)Assignment to things that are already defined by rate/assignment rules in the model. However, I think relativeRate can be interpreted as a new reaction for a species, so could apply to species defined by either reactions or rate rules. I clarified this with a bold edit in the first message now.

inputValue [PETAB_MATH, REQUIRED] The value that will be used to change the entity inputId. If a PEtab math expression involves time-dependent entities, then they represent their values at the simulation time when the condition is active, as defined in the experiments table.

Do you mean "is activated" instead of "is active"?

For the inputTypes constant and initial, this is equivalent to "is activated". But for e.g. rate, "is active" is more accurate, since it will be evaluated "continuously" during the timecourse period with this condition. I can see this is confusing though... I clarified it with a bold edit in the first message now.
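The difference between "is activated" and "is active" can be illustrated with a toy forward-Euler loop. This is only a sketch under stated assumptions: the model has no dynamics of its own, a Python function stands in for the PEtab math expression species1 + 5, and `simulate_period` is a hypothetical helper. The rate type is reserved and not currently supported, so its behavior here is one possible interpretation.

```python
def zero_rate(state):
    """Toy model without dynamics of its own."""
    return 0.0


def species_plus_five(state):
    """Stand-in for the PEtab math expression 'species1 + 5'."""
    return state["species1"] + 5


def simulate_period(input_type, input_value, species1_start, dt=0.5, steps=4):
    """Return the species1 trajectory over one period of the timecourse."""
    state = {"species1": species1_start}
    if input_type == "initial":
        # Evaluated once, at the time the condition is activated.
        state["species1"] = input_value(state)
        rate = zero_rate
    elif input_type == "rate":
        # Re-evaluated at every step while the condition is active.
        rate = input_value
    else:
        raise ValueError(f"unsupported inputType: {input_type}")
    trajectory = [state["species1"]]
    for _ in range(steps):
        state["species1"] += dt * rate(state)
        trajectory.append(state["species1"])
    return trajectory


initial_trajectory = simulate_period("initial", species_plus_five, 1.0)
rate_trajectory = simulate_period("rate", species_plus_five, 1.0)
print(initial_trajectory)
print(rate_trajectory)
```

With initial, the expression is used once and species1 then stays at that value; with rate, the expression keeps tracking the changing value of species1 throughout the period.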

[1] https://help.juliahub.com/pumasqsp/stable/tutorials/petabimport_tutorial/#Detailed-field-description

dweindl commented 2 months ago
  • inputType [constant OR initial OR ..., REQUIRED] How the value inputValue changes the entity inputId.

    • constant The entity inputId is fixed to the value inputValue. The entity must be static in time while the condition is active, e.g. a model parameter.
    • initial The entity inputId is initialized to the value inputValue. The entity must be dynamic and defined in terms of time-derivative information, e.g. a model species involved in some reaction or specified by an ordinary differential equation.

If constant is only allowed for entities that are already constant, this could as well be replaced by initial, right? This would be coherent with initialAssignments in SBML.

  • rate/assignment/relativeRate/relativeAssignment These are currently not supported

Then I'd leave them out for now.

  • inputValue [PETAB_MATH, REQUIRED] The value that will be used to change the entity inputId. If a PEtab math expression involves time-dependent entities, then they represent their values at the simulation time when the condition is activated (edit: or active, for time-varying inputTypes like rate), as defined in the experiments table.

Related to previous discussions, we could introduce a priority column or something, which would potentially make simultaneous compartment size and concentration changes more intuitive. The interpretation of the entity symbol would be different then.

  • There is currently some undefined behavior in the conditions table -- do we clarify that now or in a future PEtab v2.1 when the use cases are clearer? For example, what happens when a user specifies a parameter in the conditions table with inputType=constant, but then an SBML event affects the same parameter? We could simply disallow this for now.

This could be clarified by replacing constant by initial, unless one really wants to disable events affecting a certain entity. Do we need that?

  • How are simultaneous conditions handled (e.g. experiment1 in the example). We could decide that they are only allowed if they change different entities. Otherwise we would need to care about some ordering.

I'd go for specifying some priority as suggested for the conditions table.
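A minimal sketch of what priority-based handling could look like is given below. Everything in it is an assumption: the column name `priority`, the convention that lower numbers are applied first, and the rule that later changes see the results of earlier ones. Python functions stand in for PEtab math expressions.

```python
def apply_in_priority_order(state, changes):
    """Apply simultaneous changes in ascending priority.

    changes: list of (priority, target_id, value_function) tuples.
    Each value_function sees the state left by all earlier changes.
    """
    new_state = dict(state)
    for _, target_id, value_function in sorted(changes, key=lambda c: c[0]):
        new_state[target_id] = value_function(new_state)
    return new_state


def set_k1(state):
    """Stand-in for the expression '10'."""
    return 10.0


def k1_plus_five(state):
    """Stand-in for the expression 'k1 + 5'."""
    return state["k1"] + 5


state = {"k1": 1.0, "species1": 0.0}

# k1 is changed first, so species1 is computed from the new k1.
k1_first = apply_in_priority_order(
    state, [(2, "species1", k1_plus_five), (1, "k1", set_k1)]
)

# species1 is changed first, so it is computed from the old k1.
species1_first = apply_in_priority_order(
    state, [(1, "species1", k1_plus_five), (2, "k1", set_k1)]
)

print(k1_first)
print(species1_first)
```

The two results differ only in species1, which shows why some ordering rule is needed whenever simultaneous changes depend on each other.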

  • I'm happy to change naming, e.g. inputId->targetId, or experimentId->timecourseId. Let me know what you prefer. experimentId was chosen because most users won't care about timecourses, but then would still need to use that table for their single-condition "timecourses".

experimentId is good. Slight preference for targetId, targetValue over input*.