allofphysicsgraph / latex-in-arxiv

extract math latex from content in arxiv

associate symbols used in expressions with text description #20

Open bhpayne opened 10 months ago

bhpayne commented 10 months ago

As an example, suppose the following is in a paper:

\begin{equation}
a = b + c
\end{equation}
where $c$ is the number of cows and $b$ is the number of bats.  

For this paper, $c$ means "number of cows" and $b$ means "number of bats".

Extracting these variable definitions makes it possible to find other papers that use the same variable, even when the symbol is different (for example, another paper where $w$ is the number of cows).

Success here means:

  1. identify the meaning of a given symbol
  2. identify uses of the same meaning in different papers
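To make the two goals concrete, here is a minimal Python sketch. It is only an illustration, not the method used in this repository: the regex `DEF_PATTERN` and the helpers `extract_definitions` and `group_by_meaning` are hypothetical names, and the pattern only handles simple phrases of the form "$x$ is the ...". Matching meanings by exact description text is also an assumption; real papers would need fuzzier matching.

```python
import re
from collections import defaultdict

# Hypothetical pattern: matches "$c$ is the number of cows".
# The description ends at " and $", at a period/comma/semicolon, or at end of text.
DEF_PATTERN = re.compile(
    r"\$([^$]+)\$\s+(?:is|denotes)\s+(?:the\s+)?(.+?)(?=\s+and\s+\$|[.,;]|$)"
)

def extract_definitions(text):
    """Goal 1: return {symbol: description} for simple '$x$ is ...' phrases."""
    return {
        symbol.strip(): description.strip().lower()
        for symbol, description in DEF_PATTERN.findall(text)
    }

def group_by_meaning(papers):
    """Goal 2: map each description to the (paper_id, symbol) pairs that use it."""
    index = defaultdict(list)
    for paper_id, text in papers.items():
        for symbol, description in extract_definitions(text).items():
            index[description].append((paper_id, symbol))
    return dict(index)

# Toy input based on the example above; paper ids are made up.
papers = {
    "paper_a": "where $c$ is the number of cows and $b$ is the number of bats.",
    "paper_b": "Here $w$ is the number of cows.",
}
meanings = group_by_meaning(papers)
print(meanings["number of cows"])  # [('paper_a', 'c'), ('paper_b', 'w')]
```

In this toy case the symbols $c$ and $w$ end up under the same meaning even though the papers use different letters.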

msgoff commented 10 months ago

In the src directory I added the files needed to run the following:
pip install -r requirements.txt
python decompress_model.py
python nltk_downloads.py
python get_symbol_defs.py
I am confident that, on average, get_symbol_defs.py should recover at least 10% of the variable definitions.

The concordance dict can be used for additional processing, since it holds every sentence in which a variable is used.
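
For readers who have not run the script, the idea of such a concordance can be sketched as follows. This is an assumption about the general shape (symbol mapped to a list of sentences), not the actual structure produced by get_symbol_defs.py. The function name `build_concordance` is hypothetical, and the sentence splitting is a naive regex rather than a proper tokenizer.

```python
import re
from collections import defaultdict

def build_concordance(text):
    """Map each inline-math symbol ($...$) to the sentences that mention it."""
    # Naive split after ., ! or ? followed by whitespace. This is an assumption
    # for the sketch; a real tokenizer handles abbreviations and math better.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    concordance = defaultdict(list)
    for sentence in sentences:
        # dict.fromkeys removes repeated symbols but keeps first-seen order.
        for symbol in dict.fromkeys(re.findall(r"\$([^$]+)\$", sentence)):
            concordance[symbol.strip()].append(sentence)
    return dict(concordance)

# Toy text extending the cows-and-bats example.
text = (
    "Here $c$ is the number of cows and $b$ is the number of bats. "
    "The total is $a$. "
    "We count $c$ once per day."
)
concordance = build_concordance(text)
print(concordance["c"])
```

Each entry can then be passed to a later step, for example one that searches the collected sentences for definition phrases.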