allofphysicsgraph / latex-in-arxiv

extract math latex from content in arxiv

for a list of math symbols in all the papers, what are the various definitions of each symbol? #3

Open bhpayne opened 3 years ago

bhpayne commented 3 years ago

Suppose we can pick out math symbols from all the papers.

bhpayne commented 3 years ago

Symbol definitions, if not in the paper itself, might be in cited papers, which could be found by tracing the bibliographic citations.
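As a rough illustration of the first step of citation tracing, the sketch below collects the citation keys used in a LaTeX source so that the cited papers could be looked up afterwards. It is not code from this repository: the function name cited_keys, the regular expression, and the placeholder keys refA and refB are all assumptions made for the example, and it only handles \cite-style commands with optional bracketed arguments.

```python
import re

# Assumed pattern: \cite, \citep, \citet, ... with optional [..] arguments,
# followed by a brace group holding one or more comma-separated keys.
CITE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")


def cited_keys(tex):
    """Return the citation keys found in tex, in order of first appearance."""
    keys = []
    for group in CITE.findall(tex):
        for key in group.split(","):
            key = key.strip()
            if key and key not in keys:
                keys.append(key)
    return keys


# Placeholder keys, not real references.
sample = r"The bound is derived in \cite{refA, refB} and revisited in \citep[Sec.~2]{refA}."
keys = cited_keys(sample)
```

Resolving each key to an actual paper (through the .bib or .bbl file) and then searching that paper for the symbol is left out of this sketch.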

msgoff commented 1 year ago

xz -d HEP_TEX.model.xz
pip install -r requirements.txt
python resolve_symbol_definitions.py tex_file

The script tries to map each variable name to its definition(s) and/or properties in the file.
It currently maps 10-30% of definitions; for the rest it creates a concordance dictionary, a dictionary of lists in which each variable maps to every sentence that uses it.
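A minimal sketch of what such a concordance dictionary could look like is given below. It is not the code in resolve_symbol_definitions.py: the helper name build_concordance, the naive sentence splitter, and the restriction to inline math of the form $...$ are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: only inline math delimited by single dollar signs is considered.
INLINE_MATH = re.compile(r"\$([^$]+)\$")


def build_concordance(text):
    """Map each inline-math symbol to the list of sentences that use it."""
    concordance = defaultdict(list)
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for sentence in sentences:
        # set() avoids listing a sentence twice when a symbol repeats in it.
        for symbol in set(INLINE_MATH.findall(sentence)):
            concordance[symbol.strip()].append(sentence)
    return dict(concordance)


sample = (
    r"The fine structure constant $\alpha$ sets the coupling. "
    r"We denote the speed of sound by $v_u$. "
    r"Both $\alpha$ and $v_u$ appear in the bound."
)
concordance = build_concordance(sample)
```

Here concordance maps the key \alpha to the first and third sentences and v_u to the second and third. A real implementation would need a tokenizer that copes with abbreviations, display math, and macros.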

I have found there are roughly 50k variables used in HEP, many of which do not have a single consistent definition.

bhpayne commented 7 months ago

Is HEP_TEX.model.xz in the git repo?

msgoff commented 7 months ago

The file HEP_TEX.model.xz was removed.
I will update the Python files to use the results from scanner.out for word tokenization.

The first pass at resolving symbol definitions can be found in the utils directory.
Run make variable_definitions in that directory, then run:

./variable_definitions.out ../sound1.tex | grep -oP '<:.*?:>'

The results look like:
<:, the fine structure constant $\alpha$:>
<:and the proton-to-electron mass ratio $\frac{m_p}{m_e}$:>
<:the upper bound for the speed of sound in condensed phases, $v_u$:>
<:We find that $\frac{v_u}{c}=\alpha\left(\frac{m_e}{2m_p}\right)^{\frac{1}{2}}$:>
...
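For post-processing in Python rather than with grep, the delimited spans could be pulled out along the following lines. This is a sketch under the assumption that the tool's output has been captured as a string; the function name extract_definitions is hypothetical, and unlike grep -o it drops the <: and :> delimiters.

```python
import re

# Non-greedy match of the text between "<:" and ":>", mirroring the grep pattern.
# "." does not match newlines here, so each span stays on one line, as with grep.
DEFINITION_SPAN = re.compile(r"<:(.*?):>")


def extract_definitions(output):
    """Return the text of every <:...:> span in output, without the delimiters."""
    return [span.strip() for span in DEFINITION_SPAN.findall(output)]


# Sample input built from two of the result lines shown above, with filler text added.
sample_output = (
    "filler <:, the fine structure constant $\\alpha$:> filler\n"
    "<:the upper bound for the speed of sound in condensed phases, $v_u$:>\n"
)
definitions = extract_definitions(sample_output)
```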

The results from the python version can be found in https://github.com/allofphysicsgraph/latex-in-arxiv/blob/master/symbol_definitions