A pipeline for turning Earth & Life science documents (images, videos, academic papers, news articles) into a full-stack, neural-searchable knowledge base
GNU General Public License v3.0
A pipeline for turning Earth & Life science documents into a searchable knowledge base that helps researchers generate new hypotheses.


Installing dependencies

You should use the provided Dockerfiles for development. If you would rather install locally, follow the steps for your platform below.

Windows: poppler is required for mmda. Read this guide on how to install it on Windows: https://stackoverflow.com/questions/18381713/how-to-install-poppler-on-windows

python -m venv wvenv # Create a virtual environment
. .\wvenv\Scripts\activate # Activate it
pip install -r requirements.txt # Install requirements


Linux/macOS:

python -m venv venv # Create a virtual environment
source venv/bin/activate
pip install -r requirements.txt

All Systems:

python # Start a Python REPL from your command prompt
>>> import nltk
>>> nltk.download('wordnet')
>>> nltk.download('omw-1.4')
>>> exit() # Leave the REPL before running the next command
python -m spacy download en_core_web_sm # Install the spaCy language model you want to use
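
If you prefer to script the steps above instead of typing them into a REPL, something like the following may work. It is only a sketch, not part of this repository: the function names and the REQUIRED_MODULES list are made up for illustration, and it assumes that poppler exposes its usual pdftoppm executable on your PATH and that requirements.txt has already been installed.

```python
import importlib.util
import shutil

# Hypothetical list: the import names this README mentions.
# Extend it to match what requirements.txt actually installs.
REQUIRED_MODULES = ["nltk", "spacy"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def poppler_available():
    """Check whether poppler's pdftoppm executable is on PATH (assumed check for the mmda requirement)."""
    return shutil.which("pdftoppm") is not None


def download_language_resources():
    """Fetch the NLTK corpora and the spaCy model named in the steps above."""
    # Imported here so the checks above still work when these packages are absent.
    import nltk
    import spacy.cli

    for resource in ("wordnet", "omw-1.4"):
        nltk.download(resource)
    spacy.cli.download("en_core_web_sm")


def main():
    """Run the checks, then download the language resources if everything is installed."""
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        raise SystemExit("Missing modules: " + ", ".join(missing))
    if not poppler_available():
        print("Warning: pdftoppm not found on PATH; poppler may not be installed.")
    download_language_resources()
```

Calling main() from inside the activated virtual environment runs the checks first and only then downloads the corpora and the model, so a missing dependency is reported before any network access happens.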

Getting the data