allofphysicsgraph / latex-in-arxiv

Extract math LaTeX from content in arXiv

find papers that contain derivations #17

Open bhpayne opened 7 months ago

bhpayne commented 7 months ago

I don't know many effective search techniques, so I'll brainstorm a few.

I expect the following phrases to be positively correlated with a paper containing a derivation: "This derivation", "the derivation here", and "our derivation".

I expect the following features to be negatively correlated: papers with no math expressions, or with only a single expression. (If such a paper actually were a derivation, having so few expressions would make the derivation hard to follow.)

More important than a list of candidate papers that might contain derivations is the reproducible search filter, or sequence of filters, that produced that list.
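One way to make the two heuristics above reproducible is to apply them as an ordered sequence of filters and log which filter each paper passed or failed. This is a minimal Python sketch under stated assumptions: the phrase list is the one brainstormed above, a "math expression" is approximated by inline `$...$` spans plus the start of `equation`/`align`/`eqnarray` environments, and the threshold `min_expressions=2` and the function names are hypothetical.

```python
import re

# Phrases brainstormed above as positive signals (matched case-insensitively).
DERIVATION_PHRASES = re.compile(
    r"this derivation|the derivation here|our derivation", re.IGNORECASE
)

# Crude proxy for "math expressions": inline $...$ spans plus the start of a few
# display environments. Real LaTeX has many more math delimiters.
MATH_EXPR = re.compile(r"\$[^$]+\$|\\begin\{(?:equation|align|eqnarray)\*?\}")


def count_math_expressions(tex: str) -> int:
    return len(MATH_EXPR.findall(tex))


def passes_filters(tex: str, min_expressions: int = 2) -> tuple[bool, list[str]]:
    """Apply the filters in a fixed order and log each outcome, so the
    candidate list can be regenerated and audited later."""
    log = []
    # Filter 1: at least one positively correlated phrase.
    if not DERIVATION_PHRASES.search(tex):
        log.append("phrase: fail")
        return False, log
    log.append("phrase: pass")

    # Filter 2: reject papers with zero or one math expression.
    n = count_math_expressions(tex)
    status = "pass" if n >= min_expressions else "fail"
    log.append(f"math>={min_expressions}: {status} ({n})")
    return n >= min_expressions, log
```

Saving each paper's log next to the candidate list records exactly which filter sequence (and which thresholds) produced it.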

msgoff commented 6 months ago

It looks like \section and \subsection headings are likely candidates if they contain the term "derivation".
grep 'section{.*?}' -iPr | grep -i derivation
0303001.tex:\subsection{An Alternative Derivation}
0304113.tex:\subsection{Derivation by Orbits}
0304113.tex:\subsection{Alternative derivation by orbifolds}
0301174_cleaned.tex:\section{\bf{Seiberg-Witten transformation and integrated anomalies: an alternative derivation; Elliott formula}}
0304005.tex:\section{Derivation of regularized matrix formula \label{sec:are}}
0303008.tex:\section{Derivation of the planar constraints for the $A_2$ model}
0304007.tex:\section{Derivation of the Solution}
0302209.tex:\section{Derivation of the Superpotentials}
0302223.tex:%\section{Derivation of the covariant curvature formalism}
0301078.tex:\subsection{Chern-Simons derivation}
0303095.tex:\subsubsection{Another derivation of Einstein's equation}
0301090.tex:\section{Derivation of the 3D massive super Yang-Mills action\label{derivation}}
0301090.tex:\subsection{Matrix model derivation}
0301090.tex:\subsection{Derivation from the supermembrane action}
0301090.tex:\subsection{Derivation from the D2 Dirac-Born-Infeld action}
0303028.tex:\section{The propagator derivation}\label{prop}
0303019.tex:\section{Derivation of the basic equations }
0302178.tex:\section{The dynamical derivation of the scaling law}
0303251.tex:\section{Derivation of superconvergence relation}
0303150.tex:\subsection{Fusion - the derivation}
0302150.tex:%\subsection{Derivation of CSW 2.38 for SO(2N) case}
0302150.tex:%\subsubsection{Derivation of CSW 2.38 for Sp(2N) case}
0301186.tex:\appendix{\section{Alternative derivation of eq.(4.14)}}
0303256.tex:\subsection{A Derivation of ADHM/Nahm construction from Nahm Transformation}
0303200.tex:\section{Derivation of the fermion mode}
0302069.tex:\subsection{General derivation}
0302216.tex:\section{Derivation of the Effective Hamiltonian}
0303126.tex:\subsection{Alternative derivation of doublet boundary terms}
0304195.tex:\subsection{Derivation of 3D effective theory}
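The grep pipeline above keeps any line that contains `section{` and also contains "derivation", so it reports commented-out headings too (the `%` lines from 0302223.tex and 0302150.tex). A minimal Python sketch of a stricter version, assuming a directory of `.tex` files; `derivation_headings` and `scan_directory` are hypothetical names, and the regex is not a LaTeX parser (it handles only one level of nested braces, such as `\label{...}` or `\bf{...}` inside a title).

```python
import re
from pathlib import Path

# \section{...}, \subsection{...}, \subsubsection{...}, optionally starred.
# Group 1 is the title, allowing one level of nested {...} inside it.
HEADING = re.compile(
    r"\\(?:sub){0,2}section\*?\{([^{}]*(?:\{[^{}]*\}[^{}]*)*)\}"
)


def derivation_headings(tex: str, skip_comments: bool = True) -> list[str]:
    """Return headings whose title mentions 'derivation' (case-insensitive)."""
    hits = []
    for line in tex.splitlines():
        # Lines starting with % are LaTeX comments; grep would still report them.
        if skip_comments and line.lstrip().startswith("%"):
            continue
        for m in HEADING.finditer(line):
            if "derivation" in m.group(1).lower():
                hits.append(m.group(0))
    return hits


def scan_directory(root: str) -> dict[str, list[str]]:
    """Map each .tex file under root to its derivation headings (if any)."""
    results = {}
    for path in sorted(Path(root).rglob("*.tex")):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = derivation_headings(text)
        if hits:
            results[str(path)] = hits
    return results
```

Keeping `skip_comments` as a flag makes it easy to compare against the grep output, which corresponds to `skip_comments=False`.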

msgoff commented 6 months ago

Options for the lhs and rhs (left and right) context lengths have been added to the scanner.
When the tokens from context.rl are included, it matches the token alone and the token plus its context as two separate matches.
This can be used later in the ranking functions for assigning a likelihood score as to whether or not the paper contains a derivation.
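The scanner itself is written in Ragel (context.rl), and its interface is not shown in this thread. Purely to illustrate the idea, here is a hypothetical Python stand-in that, for every token occurrence, emits the bare token and the token widened by up to `lhs`/`rhs` characters as two separate matches. Measuring context in characters rather than tokens, and the `scan`/`Match` names, are assumptions.

```python
import re
from typing import NamedTuple


class Match(NamedTuple):
    kind: str   # "token" for the bare token, "context" for token plus context
    start: int  # start offset in the text (inclusive)
    end: int    # end offset in the text (exclusive)
    text: str


def scan(text: str, tokens: list[str], lhs: int = 20, rhs: int = 20) -> list[Match]:
    """Hypothetical stand-in for the scanner: each token occurrence yields two
    matches, the token itself and the token with up to `lhs` characters of
    left context and `rhs` characters of right context."""
    if not tokens:
        return []
    pattern = re.compile("|".join(re.escape(t) for t in tokens), re.IGNORECASE)
    out = []
    for m in pattern.finditer(text):
        out.append(Match("token", m.start(), m.end(), m.group(0)))
        lo = max(0, m.start() - lhs)          # clamp at start of text
        hi = min(len(text), m.end() + rhs)    # clamp at end of text
        out.append(Match("context", lo, hi, text[lo:hi]))
    return out
```

A ranking function could then score the "context" matches (for example, by which words surround the token) while the "token" matches simply count occurrences.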

msgoff commented 6 months ago

Here are some basic patterns that reliably return papers with derivations. I saved them in a file called derivation_grep_file and ran the following command to show the matches:
grep -f derivation_grep_file -Eir --color

To show only the file names:
grep -f derivation_grep_file -Elir

Contents of derivation_grep_file (one pattern per line):
.{1,20}full derivation.{1,20}
derivation.{1,50}cite
derivation.{1,50}ref
In the derivation below we use
our derivation
section.{1,50}derivation
the derivation presented above
the derivation presented below
the full derivation below
We present the derivation and
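For experimenting outside the shell, the same patterns can be applied from Python. This is a sketch of what `grep -f derivation_grep_file -Eil` does, assuming only `.tex` files are scanned; these patterns use only literals, `.` and `{m,n}` intervals, which behave the same in ERE and in Python's `re`. `matching_patterns` and `files_with_matches` are hypothetical helpers. Returning which patterns matched, not just whether any did, gives per-pattern features for a later ranking step.

```python
import re
from pathlib import Path

# The patterns from derivation_grep_file.
PATTERNS = [
    r".{1,20}full derivation.{1,20}",
    r"derivation.{1,50}cite",
    r"derivation.{1,50}ref",
    r"In the derivation below we use",
    r"our derivation",
    r"section.{1,50}derivation",
    r"the derivation presented above",
    r"the derivation presented below",
    r"the full derivation below",
    r"We present the derivation and",
]
# grep -i corresponds to re.IGNORECASE.
COMPILED = [re.compile(p, re.IGNORECASE) for p in PATTERNS]


def matching_patterns(text: str) -> list[str]:
    """Return the patterns that match somewhere in `text`. Without re.DOTALL,
    `.` does not cross newlines, mirroring grep's line-at-a-time matching."""
    return [p.pattern for p in COMPILED if p.search(text)]


def files_with_matches(root: str) -> list[str]:
    """Rough equivalent of `grep -f derivation_grep_file -Elir ROOT`,
    restricted (by assumption) to .tex files."""
    hits = []
    for path in sorted(Path(root).rglob("*.tex")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if matching_patterns(text):
            hits.append(str(path))
    return hits
```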

The results seem decent. I would give a lower score to matches where the term "basic" or "overview" appears close to "derivation". Running grep with a pattern file is much slower than adding the patterns to the Ragel scanner, but it is easier to get started with. Next steps might involve adding more patterns, adding constraints, or building a classifier.
I am open to any of the above.
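A minimal sketch of the proposed down-weighting, with made-up parameters: each occurrence of "derivation" adds `base` to the score, and an occurrence with "basic" or "overview" within `window` characters on either side is penalized by `penalty`. The window size, the weights, and the `score` function are hypothetical placeholders to be tuned, not values from this thread.

```python
import re

DERIVATION = re.compile(r"derivation", re.IGNORECASE)
# Whole-word matches only, so e.g. "basics" is not penalized.
PENALTY_TERMS = re.compile(r"\b(?:basic|overview)\b", re.IGNORECASE)


def score(text: str, window: int = 60, base: float = 1.0, penalty: float = 0.5) -> float:
    """Hypothetical ranking score: +base per 'derivation' occurrence, minus
    `penalty` when 'basic' or 'overview' appears within `window` characters."""
    total = 0.0
    for m in DERIVATION.finditer(text):
        # Slice of text surrounding this occurrence, clamped at the start.
        nearby = text[max(0, m.start() - window): m.end() + window]
        total += base
        if PENALTY_TERMS.search(nearby):
            total -= penalty
    return total
```

Sorting candidates with `sorted(papers, key=score, reverse=True)` then places papers whose derivation mentions sit next to "basic" or "overview" lower in the list.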

Constraint satisfaction problems (https://en.wikipedia.org/wiki/Constraint_satisfaction_problem) look interesting, specifically in relation to type inference of the variables and expressions (https://en.wikipedia.org/wiki/Type_inference).