bst-mug / n2c2

Support code for participation at the 2018 n2c2 Shared-Task Track 1
https://n2c2.dbmi.hms.harvard.edu
Apache License 2.0

DRY NN Tokenizers #86

Closed by michelole 5 years ago

michelole commented 5 years ago

BILSTMC3GClassifier and VocabularyDumper use Lucene tokenizers (via DataUtilities), while LSTMClassifier uses dl4j tokenizers.

DRY: all three should share a single tokenizer.

Probably choose the one with the highest coverage rate in BioSentVec, i.e. the tokenizer whose output tokens are most often found in the embedding vocabulary (this has to be checked against the .vec file).
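
A minimal sketch of how such a coverage check could look, written in Python for brevity rather than in the project's own code. It assumes the .vec file uses the fastText text layout (an optional "count dimension" header line, then one word followed by its vector per line). The two tokenizers are hypothetical stand-ins, not the Lucene or dl4j tokenizers, and the sample vocabulary and text are made up for illustration.

```python
import io
import re
from typing import Callable, Iterable, List, Set


def load_vec_vocabulary(lines: Iterable[str]) -> Set[str]:
    """Collect the words of a .vec file (assumed fastText text layout)."""
    vocab: Set[str] = set()
    for i, line in enumerate(lines):
        parts = line.rstrip("\n").split(" ")
        # Skip the optional "<word count> <dimension>" header line.
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        if parts[0]:
            vocab.add(parts[0])
    return vocab


def coverage_rate(tokenize: Callable[[str], List[str]],
                  texts: Iterable[str],
                  vocab: Set[str]) -> float:
    """Fraction of produced tokens that are present in the vocabulary."""
    tokens = [token for text in texts for token in tokenize(text)]
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in vocab) / len(tokens)


# Hypothetical stand-ins for the two tokenizers being compared.
def whitespace_tokenizer(text: str) -> List[str]:
    return text.lower().split()


def word_tokenizer(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


# Made-up sample data; a real check would open the actual .vec file
# and iterate over the task's documents instead.
SAMPLE_VEC = (
    "4 3\n"
    "patient 0.1 0.2 0.3\n"
    "has 0.1 0.2 0.3\n"
    "diabetes 0.1 0.2 0.3\n"
    "mellitus 0.1 0.2 0.3\n"
)
VOCAB = load_vec_vocabulary(io.StringIO(SAMPLE_VEC))
TEXTS = ["Patient has diabetes, mellitus."]

if __name__ == "__main__":
    for tokenizer in (whitespace_tokenizer, word_tokenizer):
        rate = coverage_rate(tokenizer, TEXTS, VOCAB)
        print(f"{tokenizer.__name__}: {rate:.2f}")
```

In this toy data the whitespace tokenizer keeps punctuation attached to words, so fewer of its tokens match the vocabulary. The same comparison, run over the real documents and the real .vec file, would indicate which tokenizer to keep.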