This library is no longer maintained.
We moved nala to the text annotation tool tagtog.
nala is a text mining method for extracting sequence variants (of genes or proteins) written in standard (ST) format (e.g. "E6V") or in complex natural language (NL) (e.g. "glutamic acid was substituted by valine at residue 6").
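To make the ST/NL distinction concrete, here is a minimal sketch of a regular expression that picks up simple ST substitutions such as "E6V" or "Asp8Asn". It is not nala's method; it is only an illustration of why ST mentions are comparatively easy to match while NL mentions are not. The names `ST_SUBSTITUTION` and `find_st_substitutions` are hypothetical and not part of nala.

```python
import re

# Standard three-letter and one-letter codes for the 20 common amino acids.
THREE_LETTER = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)
ONE_LETTER = "[ACDEFGHIKLMNPQRSTVWY]"

# Hypothetical pattern (not nala's): <residue><position><residue>,
# written either with three-letter codes or with one-letter codes.
ST_SUBSTITUTION = re.compile(
    rf"\b(?:(?:{THREE_LETTER})\d+(?:{THREE_LETTER})"
    rf"|{ONE_LETTER}\d+{ONE_LETTER})\b"
)


def find_st_substitutions(text):
    """Return simple ST substitution mentions found in text."""
    return ST_SUBSTITUTION.findall(text)


if __name__ == "__main__":
    print(find_st_substitutions("The E6V variant and Asp8Asn were reported."))
    # The NL form of the same kind of mutation is not matched by the pattern.
    print(find_st_substitutions(
        "glutamic acid was substituted by valine at residue 6"
    ))
```

A pattern like this misses every natural language mention, which is the case nala was built to handle.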
Publication: Cejuela et al., nala: text mining natural language mutation mentions, Bioinformatics, 2018
Requires Python 3.6
git clone https://github.com/Rostlab/nala.git
cd nala
poetry shell
poetry install
python3 -m nalaf.download_data
NOTE: if you prefer installing with pip (instead of poetry), you will need pip >= 19.0, and then do:
pip install -r requirements.txt
pip install .
If you want to run the unit tests (excluding the slow ones) do:
nosetests -a '!slow'
The module python-crfsuite (pycrfsuite) may not install on Windows. See the original module.
Simple:
python3 nala_demo.py -p 15878741 12625412 # i.e. list of PMIDs to tag
python3 nala_demo.py -s "Standard (ST) examples: Asp8Asn or delPhe1388. Semi-standard (SST) examples: 3992-9g-->a mutation. Natural language (NL) examples: glycine was substituted by lysine at residue 18 (Gly18Lys)"
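If you want to call the demo script from your own Python code, a small wrapper around the two command lines above could look like the sketch below. It only uses the `-p` and `-s` options shown above. The helper names `build_demo_command` and `run_demo` are hypothetical, and the sketch assumes you run it from the repository root with nala installed.

```python
import subprocess
import sys


def build_demo_command(pmids=None, text=None, script="nala_demo.py"):
    """Build the argument list for the demo script (hypothetical helper)."""
    # sys.executable keeps the call inside the current (poetry/pip) environment.
    command = [sys.executable, script]
    if pmids is not None:
        # -p takes a list of PMIDs to tag.
        command += ["-p"] + [str(pmid) for pmid in pmids]
    if text is not None:
        # -s takes a single string to tag.
        command += ["-s", text]
    return command


def run_demo(pmids=None, text=None):
    """Run the demo script and return whatever it prints to stdout."""
    result = subprocess.run(
        build_demo_command(pmids=pmids, text=text),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # Python 3.6 compatible spelling of text=True
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(build_demo_command(pmids=[15878741, 12625412]))
    print(build_demo_command(text="Asp8Asn or delPhe1388"))
```

Passing the arguments as a list avoids shell quoting problems with long `-s` strings.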
Programmatic access: nala/learning/train.py
API annotation service via tagtog.net: https://www.tagtog.net/-corpora/IDP4+