allofphysicsgraph / latex-in-arxiv

extract math latex from content in arxiv

replace Latex string variables with equivalent numeric identifiers from Physics Derivation Graph #5

Open bhpayne opened 2 years ago

bhpayne commented 2 years ago

Each variable or constant in a math expression needs to be replaced by a unique numeric identifier. (The numeric ID can then be replaced by whatever notation the user desires when rendering the derivation as Latex.)

First, convert the input LaTeX to a SymPy representation (see issue https://github.com/allofphysicsgraph/latex-in-arxiv/issues/4):

>>> from sympy import srepr
>>> from sympy.parsing.latex import parse_latex
>>> expr1 = parse_latex('T = 1/f')

Then convert the SymPy string-based variable names to numeric identifiers:

>>> expr1
Eq(T, 1*1/f)
>>> srepr(expr1).replace('T','pdg9491').replace('f','pdg4201')
"Equality(Symbol('pdg9491'), Mul(Integer(1), Pow(Symbol('pdg4201'), Integer(-1))))"
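The chained `str.replace` calls above work for this expression only because none of the class names in the `srepr` output (`Equality`, `Symbol`, `Mul`, `Integer`, `Pow`) contain the characters `T` or `f`. A symbol named `t` would also rewrite `Integer`. Below is a minimal sketch of a narrower substitution that only touches the name inside each `Symbol('...')` token. The helper `replace_symbols` and the dictionary `SYMBOL_TO_PDG` are hypothetical names introduced for illustration, and the sketch uses only the standard library, so the `srepr` string is pasted in as a literal.

```python
import re

# Hypothetical mapping from variable names to PDG identifiers (IDs taken from the example above).
SYMBOL_TO_PDG = {"T": "pdg9491", "f": "pdg4201"}

# Matches the opening of a Symbol token in srepr() output, capturing the name.
# Stopping at the closing quote also covers tokens such as Symbol('x', real=True).
SYMBOL_PATTERN = re.compile(r"Symbol\('([^']+)'")


def replace_symbols(srepr_text, mapping):
    """Rewrite only the names inside Symbol('...'); leave all other text alone."""
    def substitute(match):
        name = match.group(1)
        # Names missing from the mapping are kept unchanged so they are easy to spot.
        return "Symbol('{}'".format(mapping.get(name, name))
    return SYMBOL_PATTERN.sub(substitute, srepr_text)


original = "Equality(Symbol('T'), Mul(Integer(1), Pow(Symbol('f'), Integer(-1))))"
converted = replace_symbols(original, SYMBOL_TO_PDG)
print(converted)

# A case where plain str.replace would corrupt the class name "Integer".
tricky = "Mul(Integer(2), Symbol('t'))"
print(replace_symbols(tricky, {"t": "pdg1"}))
```

Working on the string keeps the approach close to the one in this comment. Substituting on the SymPy expression itself would be another option, not shown here.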

If a different derivation uses q for frequency, then q would be replaced with pdg4201. This substitution process requires knowing what concept each variable represents.
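One way to record what each variable represents is a per-derivation annotation table that is resolved against a shared concept registry. The sketch below is only an illustration under stated assumptions: the names `CONCEPTS`, `DERIVATION_SYMBOLS`, and `build_symbol_map` are hypothetical, the derivation labels are made up, and reading `T` as "period" is an assumption based on the expression T = 1/f.

```python
# Hypothetical concept registry: one numeric identifier per physical concept.
CONCEPTS = {
    "pdg9491": "period",     # assumption: T in T = 1/f denotes the period
    "pdg4201": "frequency",
}

# Hypothetical annotations: which concept each local symbol denotes in a derivation.
DERIVATION_SYMBOLS = {
    "derivation_a": {"T": "period", "f": "frequency"},
    "derivation_b": {"T": "period", "q": "frequency"},  # q used for frequency
}

# Reverse lookup from concept name to identifier.
NAME_TO_ID = {name: pdg_id for pdg_id, name in CONCEPTS.items()}


def build_symbol_map(derivation):
    """Return {local symbol: PDG identifier} for one derivation.

    Raises KeyError if the derivation or one of its concepts is unknown,
    which flags a symbol that has not been annotated yet.
    """
    annotations = DERIVATION_SYMBOLS[derivation]
    return {symbol: NAME_TO_ID[concept] for symbol, concept in annotations.items()}


print(build_symbol_map("derivation_a"))
print(build_symbol_map("derivation_b"))
```

Both derivations resolve their frequency symbol to the same identifier, even though one writes `f` and the other writes `q`.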