MedTagger is a suite of programs developed by the Mayo Clinic NLP program in 2013. It includes three major components: MedTagger for dictionary-based indexing, MedTaggerIE for pattern-based information extraction, and MedTaggerML for machine learning-based named entity recognition.
The updated release includes a dictionary based on MedLex, a corpus-driven semantic lexicon, that maps to OMOP Concept identifiers. MedTagger for indexing is built on a fast string-matching algorithm that leverages lexical normalization. The contextual annotator detects the local context of each detected concept entry. For details on the OMOP Concept identifiers, please visit http://athena.ohdsi.org.
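To illustrate the general idea of dictionary matching after lexical normalization, here is a minimal shell sketch. It is not MedTagger's actual algorithm or dictionary format: the `normalize` and `lookup` functions, the tab-separated file layout, and the placeholder identifiers in the usage comment are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; NOT MedTagger's real matching code or dictionary format.

# Lowercase the text, replace runs of punctuation/whitespace with one space,
# and trim a leading or trailing space.
normalize() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[[:punct:][:space:]]\{1,\}/ /g' -e 's/^ //' -e 's/ $//'
}

# Look up a term in a tab-separated file of the assumed form: term<TAB>identifier.
# Prints the identifier and returns 0 on a match; returns 1 if nothing matches.
lookup() {
  local term
  term="$(normalize "$1")"
  awk -F '\t' -v t="$term" \
    '$1 == t { print $2; found = 1 } END { exit !found }' "$2"
}

# Usage (identifiers are placeholders, not real OMOP Concept identifiers):
#   dict="$(mktemp)"
#   printf 'dry cough\tCONCEPT_A\nfever\tCONCEPT_B\n' > "$dict"
#   lookup 'Dry  Cough.' "$dict"
```

Normalizing both the dictionary entries and the input text to the same surface form is what lets an exact-match lookup tolerate differences in case, spacing, and punctuation.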
MedTagger IE pipelines use a custom ruleset format. An example ruleset for coronavirus disease 2019 (COVID-19) related symptoms (e.g. dry cough, fever, fatigue) can be found under the /src/main/resources/medtaggerieresources/covid19 directory. These resources tell MedTagger what to extract, and this directory is expected as input for the RULEDIR parameter.
Download the latest release from https://github.com/OHNLP/MedTagger/releases
Extract the zip file
Modify the INPUTDIR, OUTPUTDIR, and RULEDIR variables in run_medtagger_win.bat or run_medtagger_unix_mac.sh, as appropriate
INPUTDIR: full directory path of the input folder
OUTPUTDIR: full directory path of the output folder
RULEDIR: full directory path of the 'Rule' folder
Example for Mac:
INPUTDIR="$YOUR_INPUT_DIRECTORY"
OUTPUTDIR="$YOUR_OUTPUT_DIRECTORY"
RULEDIR="$YOUR_MEDTAGGER_HOME/medtaggerieresources/covid19"
Example for Windows:
INPUTDIR="C:\$YOUR_INPUT_DIRECTORY\input"
OUTPUTDIR="C:\$YOUR_OUTPUT_DIRECTORY\output"
RULEDIR="C:\YOUR_MEDTAGGER_HOME\medtaggerieresources\covid19"
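Before running on Mac/Linux, it can help to confirm that the configured directories actually exist. The following is a minimal sketch, not part of the MedTagger distribution: the `check_dirs` helper is hypothetical, and whether the output folder must exist beforehand is an assumption rather than documented behavior.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; not shipped with MedTagger.

# Return 0 if every argument is an existing directory, 1 otherwise.
# Each missing path is reported on stderr.
check_dirs() {
  local status=0 dir
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "Missing directory: $dir" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage, with the same values you set in run_medtagger_unix_mac.sh:
#   check_dirs "$INPUTDIR" "$OUTPUTDIR" "$RULEDIR" && ./run_medtagger_unix_mac.sh
```

Checking all paths before returning, rather than stopping at the first failure, reports every misconfigured variable in a single pass.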
Run the batch file
Mac/Linux:
run_medtagger_unix_mac.sh
Windows:
run_medtagger_win.bat
Replace ${env.SECRET_ACTOR} with your GitHub username and ${env.SECRET_TOKEN} with the generated token.
mvn clean install -s settings.xml
ant dist
The resulting MedTagger.zip is in the root directory.
Liu H, Bielinski SJ, Sohn S, Murphy S, Wagholikar KB, Jonnalagadda SR, Ravikumar KE, Wu ST, Kullo IJ, Chute CG. An information extraction framework for cohort identification using electronic health records. AMIA Summits on Translational Science Proceedings. 2013;2013:149.
Wen A, Fu S, Moon S, El Wazir M, Rosenbaum A, Kaggal VC, Liu S, Sohn S, Liu H, Fan J. Desiderata for delivering NLP to accelerate healthcare AI advancement and a Mayo Clinic NLP-as-a-service implementation. npj Digital Medicine. 2019 Dec 17;2(1):1-7.