PySCeS / pysces

The official PySCeS project source code repository.
https://pysces.github.io
Other
34 stars 10 forks source link

fix parsing of ln vs log vs log10 with libsbml #45

Closed jmrohwer closed 3 years ago

jmrohwer commented 3 years ago

Translation with the infix parser of libsbml wrongly caused ln in SBML MathML to be converted to math.log10. This updates the MathML to numpy and numpy to MathML conversion dictionaries to deal with this correctly. See #44

jmrohwer commented 3 years ago

I agree completely with what you are saying. However, the issue is inconsistencies between libsbml and MathML. And what we are seeing/interacting with in PySCeS is the output from libsbml, and when we are writing SBML files, we are not writing the MathML directly but doing this via libsbml. So our translation tables should be commensurate with what libsbml uses.

Refer to this post https://github.com/PySCeS/pysces/issues/44#issuecomment-896765686 that explains it in detail. Basically MathML ln is translated to log by libsbml. So the internal representations of libsbml (a.getFormula()) and the MathML representation are not equivalent.

The MathML written out by libsbml is entirely consistent with the spec. I.e. <ln/> for natural logarithm and

<log/>
              <logbase>
                <cn type="integer"> 10 </cn>
              </logbase>

for base 10. While it is not strictly necessary to include the logbase for base 10 logarithms (it is assumed when it is absent), it is not wrong to be explicit and this seems to be what libsbml is doing when it is writing the MathML.

bgoli commented 3 years ago

The question is do we get libSBML to do it correctly or introduce a non-standard workaround.

jmrohwer commented 3 years ago

I am out of my depth here, have never delved in to the details of libsbml, I guess you would have more info there. It would be interesting to see where this inconsistency comes from. Under the hood libsbml must be using some libraries to parse the MathML, so are there errors in those libraries? Or is this all coded from first principles?

jmrohwer commented 3 years ago

I've submitted an issue in python-libsbml.

bgoli commented 3 years ago

The problem is that those tables are not only used for translation but to interpret our PSC files, so we will be changing our PSC definition of "log" from "log10" to "ln" which would affect all PSC files - not only those translated from SBML.

I agree we should deal with libSBML's behaviour, I'm just not sure changing or own specification (which is MathML compliant) is the way to go (perhaps we can do a custom hack in the translator code). I've attached a test file that I used for this stuff.

functest.psc.txt

R2:
    A = P
    pi*A - k2r*P*ln(S**log(2.0) - arcsinh(exp(A)/P))

Is internally represented as:

'self.rate=numpy.pi*self.mod.A-self.mod.k2r*self.mod.P*math.log(pow(self.mod.S,math.log10(2.0))-numpy.arcsinh(math.exp(self.mod.A)/self.mod.P))'

and this would change even though it never existed in SBML.

jmrohwer commented 3 years ago

Okay, that changes things quite significantly. I wasn't aware that these tables are also used elsewhere. In that case I'm not in favour of changing our PSC definitions because they are themselves MathML compliant. So we have to deal with libSBML in one way or another. Let's hear what the libSBML reponse is to this.

bgoli commented 3 years ago

It's obscure but it happens in the function token parsing func(arg1, arg2, ....) code in Pyscesparser.py (1232) I will look at an alternative workaround I'm sure we can do it in SBML2Core with a strategic find and replace :-)

jmrohwer commented 3 years ago

I have reverted 5675315 as per https://github.com/PySCeS/pysces/pull/45#issuecomment-897521045. Following https://github.com/sbmlteam/python-libsbml/issues/25#issuecomment-897533643 I tried replacing all instances of getFormula() with SBML.formulaToL3String(a.getMath()) and also replacing SBML.formulaToString(a.getMath()) with SBML.formulaToL3String(a.getMath()). This seems to work in general, and specifically for the logarithms.

However, there is a new issue, relating to exponents/powers. For the same repressilator model from #44:

r = m.getListOfReactions()[11]
l = r.getKineticLaw()
l.getFormula()
    'a0_tr + a_tr * pow(KM, n) / (pow(KM, n) + pow(PY, n))'
SBML.formulaToString(l.getMath())
    'a0_tr + a_tr * pow(KM, n) / (pow(KM, n) + pow(PY, n))'
SBML.formulaToL3String(l.getMath())
    'a0_tr + a_tr * KM^n / (KM^n + PY^n)'

While getFormula() and formulaToString() use Python compatible notation, formulaToL3String() uses the caret ^ which is not understood by the parser or by Python. One option would be to simply do SBML.formulaToL3String(l.getMath()).replace('^', '**') (tested, seems to work). If you think this is worth pursuing further I can commit these changes and we see from there.

bgoli commented 3 years ago

The L3parser was going to be my next suggestion and we need to switch to L3V1 anyway so your idea of that in combination with a symbol replace looks good to me. Today I implemented some L3 friendly changes in the sbml_log_translate branch, as well as, adding support for exporting piecewise functions in assignment rules and events (still broken) ... so ... just implemented your L3parser suggestions as well (b2949abacd17bfa3213c0dabab0d4a6c09badd51) That branch should now handle the SBML-->PSC ln/log issues correctly, although I haven't tested it yet so I'm not sure if there are any unintended features.

jmrohwer commented 3 years ago

I added some comments to b2949ab - double check please. IMO we don't need infix everywhere but do actually work on the MathML itself in some cases.

In your opinion, what is missing from releasing version 1.0.0? I would really like it if we could do that in the next week or 2. What is missing from switching to L3V1? I have some time on my hands so we could divide out the tasks, let me know how to help.

jmrohwer commented 3 years ago

Closing, superseded by #53