bst-mug / n2c2

Support code for participation at the 2018 n2c2 Shared-Task Track 1
https://n2c2.dbmi.hms.harvard.edu
Apache License 2.0

Successful strategies from i2b2 shared tasks #5

Closed by michelole 6 years ago

michelole commented 6 years ago

Take a look at what other people did in the past on similar tasks.

See https://dbmi.hms.harvard.edu/programs/healthcare-data-science-program/clinical-nlp-challenges/7-2014-deid-heartdisease

kugami commented 6 years ago

According to the paper "Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2":

General information: the task focused on identifying medical risk factors related to Coronary Artery Disease (CAD) in the longitudinal medical records of diabetic patients. A note about the corpus: the training data consisted of 60% of the total corpus (790 records), and the testing data consisted of the remaining 40% (514 records).

Submissions -- what did other people do? Note: these are listed by their past ranking, so 1st place equals first place in that competition.

1st place: approached it as a mention-level classification task.

The training data was used as follows. Preprocessing identified section headers, negation words, modality words, and output from ConText. Rules were used for locating trigger words, medications, and measurements, and SVM classifiers were used to identify the validity and polarity of each mention. Smoking status was identified using a single 5-way classifier, and a separate rule-based classifier handled family history.
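The rule-based part of that pipeline can be sketched very roughly. The trigger and negation lists below are tiny stand-ins for the curated lexicons and the ConText algorithm the team actually used, and the SVM validity/polarity classifiers are replaced by a crude look-back-for-a-negation-cue heuristic:

```python
import re

# Hypothetical mini-lexicons for illustration only; the real system used
# much larger trigger lists and the ConText algorithm for polarity.
TRIGGERS = {
    "CAD": ["coronary artery disease", "angina"],
    "hypertension": ["hypertension", "high blood pressure"],
}
NEGATION_CUES = {"no", "denies", "without", "not"}

def find_mentions(text, window=4):
    """Locate trigger words and assign a crude polarity by checking for a
    negation cue in the few tokens preceding the mention (same sentence)."""
    mentions = []
    lowered = text.lower()
    for factor, terms in TRIGGERS.items():
        for term in terms:
            for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
                # look back only within the current clause/sentence
                clause = re.split(r"[.;]", lowered[:m.start()])[-1]
                prefix = clause.split()[-window:]
                polarity = "negative" if any(t in NEGATION_CUES for t in prefix) else "positive"
                mentions.append((factor, term, polarity))
    return mentions

note = "Patient denies angina. History of hypertension, treated with lisinopril."
print(find_mentions(note))
```

In the actual system this rule output would feed mention-level SVM classifiers rather than being the final answer.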

2nd place: divided the risk factors into three categories, after preprocessing the texts with MedEx (a medication information extraction system for clinical narratives).
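As a toy illustration of the kind of structured output MedEx produces (drug, dose, frequency), here is a regex sketch. The drug list and patterns are placeholders; MedEx itself relies on comprehensive drug lexicons and a semantic tagger:

```python
import re

# Stand-in drug list; MedEx uses full drug lexicons, not a short alternation.
DRUGS = r"(metformin|lisinopril|aspirin|atorvastatin)"
PATTERN = re.compile(
    DRUGS + r"\s+(\d+(?:\.\d+)?\s*mg)\s*(daily|bid|tid|qd)?",
    re.IGNORECASE,
)

def extract_medications(text):
    """Return (drug, dose, frequency) tuples found in a clinical note."""
    return [(drug.lower(), dose, freq or "")
            for drug, dose, freq in PATTERN.findall(text)]

meds = extract_medications("Started Metformin 500 mg BID; continue aspirin 81 mg daily.")
print(meds)
```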

3rd place: approached it as a multiple text categorization task.
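The multi-label framing amounts to scoring each record independently for every (risk factor, time attribute) label. A minimal sketch, with keyword rules standing in for the per-label classifiers the team actually trained:

```python
# Placeholder keyword rules; the real system trained one classifier per
# (risk factor, time attribute) label over the whole document.
LABEL_KEYWORDS = {
    ("diabetes", "present"): ["diabetes", "dm2", "a1c"],
    ("CAD", "past"): ["prior mi", "stent", "cabg"],
    ("obesity", "present"): ["obese", "bmi"],
}

def categorize(record):
    """Return the set of labels whose binary 'classifier' fires."""
    text = record.lower()
    return {label for label, kws in LABEL_KEYWORDS.items()
            if any(kw in text for kw in kws)}

labels = categorize("Obese patient with DM2, s/p stent placement in 2010.")
print(labels)
```

The key design point is that labels are assigned independently, so one record can carry any subset of them.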

Important takeaways: there were some similarities between the top-performing approaches. All used pre-processing tools to gain syntactic information, and only one (3rd place) added temporal attributes. Nearly all the systems used medical lexicons, such as UMLS, Drugs.com, and Wikipedia; only one team did not mention using a lexicon of medical terms. Hypertension and Family History had the best performance of all risk factors (a result partly due to the collection of files, which mostly indicated no family history at all). The top few teams all showed similar performance in the system tests.

michelole commented 6 years ago

👍 so... no neural nets?