alexeygrigorev / namespacediscovery-pipeline

Mathematical namespace discovery

Pipeline for Mathematical namespace discovery

input:

output:

Running It

git clone https://github.com/alexeygrigorev/namespacediscovery-pipeline.git
cd namespacediscovery-pipeline/src
python pipeline.py

Modify luigi.cfg to set the configuration parameters for your environment.

You need to at least change the following parameters:

Other parameters ([DEFAULT] section):
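As a rough illustration of how a [DEFAULT] section behaves in an INI-style file such as luigi.cfg, the sketch below uses Python's standard configparser module rather than luigi itself. The section names and parameter names (SomeTask, OtherTask, data_dir) are hypothetical and are not taken from this project's luigi.cfg.

```python
import configparser

# Hypothetical config content; these section and parameter names are
# made up for illustration and do not come from the project's luigi.cfg.
SAMPLE_CFG = """
[DEFAULT]
data_dir = /tmp/namespaces/data

[SomeTask]
workers = 2

[OtherTask]
data_dir = /mnt/other/data
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CFG)

# A value defined in [DEFAULT] is visible from every other section ...
inherited = parser.get("SomeTask", "data_dir")

# ... unless the section overrides it with its own value.
overridden = parser.get("OtherTask", "data_dir")

print(inherited)
print(overridden)
```

Values placed in [DEFAULT] therefore act as shared fallbacks, which is why common paths are usually set once there rather than repeated per section.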

Dependencies

For PyData stack libraries such as numpy, scipy, scikit-learn and nltk, it's best to use the Anaconda installer.

Not all dependencies come pre-installed with Anaconda; use pip to install the rest:

pip install python-Levenshtein
pip install fuzzywuzzy
pip install luigi
pip install rdflib
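To confirm that the packages above are visible to the interpreter, a small check along these lines may help. It is only a sketch: the helper name missing_modules is hypothetical, and the module names are the import names of the pip packages (python-Levenshtein is imported as Levenshtein).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names corresponding to the pip packages listed above.
REQUIRED = ["Levenshtein", "fuzzywuzzy", "luigi", "rdflib"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```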

We also need to download some data for nltk: the list of stopwords and the model for tokenization. Run the following in the Python console to install them:

import nltk
nltk.download('stopwords')
nltk.download('punkt')
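To verify that the downloads succeeded, something like the following could be used. It is a minimal sketch: the helper name check_nltk_data is hypothetical, and it relies on nltk.data.find raising LookupError when a resource is absent.

```python
def check_nltk_data(resources=("corpora/stopwords", "tokenizers/punkt")):
    """Map each nltk resource path to True if it is installed, else False."""
    try:
        import nltk
    except ImportError:
        # nltk itself is not installed, so no resource can be present.
        return {resource: False for resource in resources}

    status = {}
    for resource in resources:
        try:
            nltk.data.find(resource)
            status[resource] = True
        except LookupError:
            status[resource] = False
    return status


if __name__ == "__main__":
    for resource, present in check_nltk_data().items():
        print(resource, "OK" if present else "MISSING")
```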

See SETUP.md for an example of how to set up the environment.

Datasets

We use the following datasets as input:

Classification schemes:

The classification schemes datasets are already available in the data directory.