gipplab / AnnoMathTeX

>>AnnoMathTeX<< - a LaTeX formula annotation facilitation and recommendation tool for STEM documents
Apache License 2.0

Introduction

Mathematical formulae are a significant part of scientific documents (books, articles, web pages, etc.) in the fields of science, technology, engineering, and mathematics (STEM). Most current information retrieval approaches do not consider mathematical formulae, even though they are very common in STEM texts. Since formulae carry a lot of important information, they should not be ignored when analyzing and comparing documents. Currently, no large labeled dataset of mathematical formulae annotated with their semantics is available for training machine learning models. >>AnnoMathTeX<< offers a first approach to facilitate the annotation of mathematical formulae in STEM documents. It recommends names for formulae and their constituent identifiers (characters/symbols, e.g. constants and variables) to the user who is annotating the document, and thus enables the creation of a labeled dataset.

Definitions

Identifiers

Identifiers in mathematical formulae are the meanings attached to symbols contained within a formula. For example, the identifier E means "energy" in the formula E=mc^2.

Formula Concept

The concept of a formula is the name or meaning (semantics) that can be associated with it. For example, a possible concept name annotation for the formula E=mc^2 would be "mass-energy equivalence".
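
To make the two definitions concrete, here is a minimal sketch of how one annotated formula might be represented. The field names (`latex`, `concept`, `identifiers`) and the helper `describe` are assumptions chosen for illustration, not AnnoMathTeX's actual data model; the names given to m and c are the usual readings of those symbols.

```python
# Hypothetical representation of one annotated formula.
# The keys below are illustrative assumptions, not the tool's real schema.
annotated_formula = {
    "latex": "E=mc^2",
    "concept": "mass-energy equivalence",  # formula concept name
    "identifiers": {                       # symbol -> identifier name
        "E": "energy",
        "m": "mass",
        "c": "speed of light",
    },
}


def describe(formula):
    """Return a one-line, human-readable summary of the annotation."""
    parts = ", ".join(
        f"{symbol} = {name}" for symbol, name in formula["identifiers"].items()
    )
    return f'{formula["latex"]} ({formula["concept"]}): {parts}'


print(describe(annotated_formula))
```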

>>AnnoMathTeX<<

AnnoMathTeX is a standalone web-based LaTeX text and formula annotation recommendation tool for STEM documents, implemented with the Python framework Django. It allows users to annotate identifiers contained in mathematical formulae, as well as entire formulae contained in a document, with concept names selected from a list of recommendations.

The recommendations for the formula and identifier concept names are taken from five different sources:

Components/Modules/Workflow

Getting Started

The system is hosted by Wikimedia at http://annomathtex.wmflabs.org/.

If you want to run the system locally, these instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Python version >=3.6 is recommended.

Installing

Clone or download the repository. In your shell, navigate to the folder AnnoMathTeX, then create and activate a new virtual environment. Then run the command

pip install -r requirements.txt

Usage

Start The Server

In a terminal, navigate to the folder where the manage.py file is located (AnnoMathTeX/annomathtex) and run the command

python manage.py runserver

Select a File

Open a browser window and navigate to localhost:8000.

Select the file that you would like to annotate with the file browser.

After selecting and uploading the file you will see the processed and rendered document in your browser window. You can now start annotating. Mathematical environments are enclosed with highlighted dollar signs, and the identifiers are highlighted. All other characters that are not to be annotated in the mathematical environment are coloured in grey.
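
As a rough illustration of the highlighting described above, the following sketch finds inline math environments and lists the letters that could be offered as identifiers. It is not the parser AnnoMathTeX uses. It assumes math is delimited by single, unescaped dollar signs and that every Latin letter outside a LaTeX command is an identifier candidate; the names `MATH_RE`, `COMMAND_RE`, and `find_identifier_candidates` are hypothetical.

```python
import re
import string

# Assumption: inline math is enclosed in single, unescaped dollar signs.
MATH_RE = re.compile(r"(?<!\\)\$(.+?)(?<!\\)\$")
# LaTeX commands such as \frac or \alpha are removed before picking letters.
COMMAND_RE = re.compile(r"\\[A-Za-z]+")


def find_identifier_candidates(text):
    """Return a list of (formula, candidate_letters) for each math span."""
    results = []
    for match in MATH_RE.finditer(text):
        formula = match.group(1)
        stripped = COMMAND_RE.sub(" ", formula)
        letters = [ch for ch in stripped if ch in string.ascii_letters]
        results.append((formula, letters))
    return results


sample = r"Einstein showed that $E=mc^2$, where $c$ is a constant."
for formula, letters in find_identifier_candidates(sample):
    print(formula, letters)
```

This heuristic treats every letter as a separate symbol, so multi-letter names and Greek letters written as commands are not handled.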

Annotating an Identifier

To annotate an identifier, click on the highlighted character (e.g. "E") in the document and you will see a pop-up with a table of recommendations. To select one of the recommendations, click on the matching cell, and it will be highlighted (along with all other matching cells from different sources). The annotated identifier will be highlighted in green, and a table holding all the annotations that have been made is constructed at the top of the document. Annotations can also be unselected (cancelled). If none of the recommendations match, you can manually enter a name.

Types of Annotations

Two types of annotation are possible: global and local.

Global Annotation

By default, the annotation mode is set to global annotation. This means that if you annotate, e.g., the identifier E with "energy", all occurrences of this identifier in the document will automatically receive this annotation.

Local Annotation

To annotate an identifier locally (meaning that only this occurrence of the identifier will be annotated), select the "local" option at the top of the table.
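
The difference between the two modes can be sketched as follows. This is an illustrative model only: the `occurrences` list, its keys, and the `annotate` function are assumptions, not AnnoMathTeX's internal code.

```python
def annotate(occurrences, index, name, mode="global"):
    """Attach `name` to the clicked occurrence.

    In "global" mode every occurrence of the same symbol receives the
    annotation; in "local" mode only the clicked occurrence does.
    """
    if mode not in ("global", "local"):
        raise ValueError("mode must be 'global' or 'local'")
    symbol = occurrences[index]["symbol"]
    for i, occurrence in enumerate(occurrences):
        same_symbol = occurrence["symbol"] == symbol
        if i == index or (mode == "global" and same_symbol):
            occurrence["name"] = name
            occurrence["type"] = mode
    return occurrences


# Hypothetical document with three identifier occurrences.
occurrences = [
    {"symbol": "E", "name": None},
    {"symbol": "f", "name": None},
    {"symbol": "E", "name": None},
    {"symbol": "f", "name": None},
]
annotate(occurrences, 0, "energy")                   # global: both E's
annotate(occurrences, 1, "function", mode="local")   # local: first f only
print(occurrences)
```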

Saving the Annotations

To save the annotations, click the "save" button at the top left of the page. This writes the annotations to a JSON file and creates a CSV file containing an evaluation table that compares the performance of the different sources.

If you open the same file again at a later point in time, the annotations you made previously will be reloaded and you can continue right where you left off.
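
The save-and-resume behaviour can be pictured with a small round trip through a JSON file. The file name, the `{"global": ..., "local": ...}` layout, and both helper functions are assumptions for illustration; the format AnnoMathTeX actually writes may differ.

```python
import json
import os
import tempfile


def save_annotations(path, annotations):
    """Write the annotations to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(annotations, handle, indent=2, ensure_ascii=False)


def load_annotations(path):
    """Return previously saved annotations, or an empty structure."""
    if not os.path.exists(path):
        return {"global": {}, "local": {}}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; one annotation file per document is assumed.
    path = os.path.join(tmp, "example_annotations.json")
    annotations = load_annotations(path)   # nothing saved yet
    annotations["global"]["E"] = "energy"
    save_annotations(path, annotations)
    restored = load_annotations(path)      # simulates reopening the file

print(restored)
```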

Evaluation

Results

For each file, an evaluation table of the following format is constructed.

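| Identifier | Name             | arXiv | Wikipedia | Wikidata | WordWindow | Type   |
|------------|------------------|-------|-----------|----------|------------|--------|
| X          | variable         | -     | 6         | -        | 1          | global |
| p          | manual insertion | -     | -         | -        | -          | global |
| f          | function         | 2     | -         | -        | -          | local  |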

The identifier X was annotated globally with "variable", which was found in the recommendations from the Wikipedia list and from the word window (positions 6 and 1 in the respective columns). For the identifier p, no matches were found; it was annotated with a manual insertion. The identifier "f" was annotated locally with "function", which was found in the recommendations from the arXiv list at position 2.
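
A row of this table could be computed roughly as follows: for each source, look up the position of the chosen name in that source's ranked recommendation list. This is a sketch under assumptions. The source names are taken from the table header, while the recommendation lists, the function `evaluation_row`, and the case-insensitive matching are made up for illustration.

```python
import csv
import io

# Column names taken from the evaluation table header.
SOURCES = ["arXiv", "Wikipedia", "Wikidata", "WordWindow"]


def evaluation_row(identifier, name, recommendations, annotation_type):
    """Build one row: each source column holds the 1-based position of
    `name` in that source's recommendations, or "-" if it is absent."""
    row = {"Identifier": identifier, "Name": name, "Type": annotation_type}
    wanted = name.lower()
    for source in SOURCES:
        ranked = [r.lower() for r in recommendations.get(source, [])]
        row[source] = ranked.index(wanted) + 1 if wanted in ranked else "-"
    return row


# Made-up recommendation lists for the identifier f.
recs = {"arXiv": ["map", "function", "field"], "Wikipedia": ["frequency"]}
row = evaluation_row("f", "function", recs, "local")

buffer = io.StringIO()
fieldnames = ["Identifier", "Name"] + SOURCES + ["Type"]
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```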

License

This project is licensed under the Apache License 2.0.

Authors

See also the list of contributors who participated in this project.

Acknowledgments

We thank Wikimedia for hosting our web-based system.