allofphysicsgraph / latex-in-arxiv

extract math latex from content in arxiv

what categories is each paper associated with? #2

Open bhpayne opened 3 years ago

bhpayne commented 3 years ago

For a given paper, identify which topics that paper covers. The purpose is to narrow the scope of what a variable refers to. For example, c in relativity usually means something different from c in algebra. Categorizing the topic(s) of a paper could provide context for how to interpret the variable.

bhpayne commented 3 years ago

Some mixture of TF-IDF plus citation tracing?
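
A minimal sketch of the TF-IDF half of that idea, using only the Python standard library. The function names (`tokenize`, `tfidf_vectors`, `cosine`) are hypothetical, and the tokenizer is an assumption: it keeps lowercase alphabetic words and does not strip LaTeX markup, which a real pipeline would need to do first.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic words are a good enough proxy for jargon.
    # LaTeX markup is not stripped here.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    vectors = []
    for c in counts:
        total = sum(c.values()) or 1
        vectors.append({
            # Term frequency times inverse document frequency.
            # A term present in every document gets weight 0.
            term: (freq / total) * math.log(n_docs / doc_freq[term])
            for term, freq in c.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Papers whose vectors have a high cosine similarity share distinctive vocabulary, which is one possible signal that they belong to the same topic.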

bhpayne commented 1 year ago

A more concrete framing is the following:

  1. suppose you have 1000 .tex files
  2. of the 1000 files, 50 files reference the variable c
  3. of the 50 papers referencing c,
    • do subsets of the 50 papers have overlapping citations? If yes, those references to c might be in the same domain.
    • do subsets of the 50 papers use similar jargon? If yes, those references to c might be in the same domain.
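
The citation-overlap check in the steps above could be sketched roughly as follows. Everything here is an assumption for illustration: the helper names are hypothetical, the variable check only looks at single-letter symbols inside inline `$...$` math, and citation keys are compared as raw strings. In practice the `\cite` keys are local to each paper's bibliography, so they would first need to be mapped to a shared identifier (for example a DOI or arXiv ID) before overlap means anything.

```python
import re
from itertools import combinations

# Matches \cite, \citep, \citet*, ... with optional [..] arguments.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
MATH_RE = re.compile(r"\$([^$]+)\$")
COMMAND_RE = re.compile(r"\\[A-Za-z]+")


def citation_keys(tex):
    """Collect the set of keys used in \\cite-style commands."""
    keys = set()
    for group in CITE_RE.findall(tex):
        keys.update(k.strip() for k in group.split(",") if k.strip())
    return keys


def mentions_variable(tex, name="c"):
    # Crude check: drop LaTeX commands such as \cos from each inline math
    # span, then look for the letter in what remains.
    for span in MATH_RE.findall(tex):
        if name in COMMAND_RE.sub(" ", span):
            return True
    return False


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def citation_overlap(papers, name="c"):
    """papers: {paper_id: tex_source}.

    Returns [(id_a, id_b, score)] for papers that mention `name`,
    sorted by score, highest first.
    """
    cited = {
        pid: citation_keys(tex)
        for pid, tex in papers.items()
        if mentions_variable(tex, name)
    }
    pairs = [
        (a, b, jaccard(cited[a], cited[b]))
        for a, b in combinations(sorted(cited), 2)
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Pairs with a high score cite many of the same works, so their uses of c are candidates for sharing a domain. The jargon check could reuse the same pairing loop with a text-similarity score in place of `jaccard`.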