Closed michelole closed 5 years ago
BILSTMC3GClassifier and VocabularyDumper use Lucene tokenizers (via DataUtilities), while LSTMClassifier uses dl4j tokenizers:
StringCleaning.stripPunct(token).toLowerCase();
DRY.
Probably choose the tokenizer with the highest coverage rate in BioSentVec (this has to be checked against the .vec file).
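A rough sketch of how that coverage check could look, using only the Java standard library. Everything here is hypothetical and not part of the project: the class name `CoverageSketch`, the methods `loadVocabulary` and `coverage`, and the tiny in-memory sample data. It also assumes a text-format .vec file whose lines start with the token, optionally preceded by a two-field header line (word count and dimension). The token lists stand in for the output of the Lucene-based and dl4j-based tokenization paths, which would have to be produced separately.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CoverageSketch {

    /** Collects the first whitespace-delimited field of each line as a vocabulary entry. */
    public static Set<String> loadVocabulary(Reader vecFile) throws IOException {
        Set<String> vocabulary = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(vecFile)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.trim().split("\\s+");
                // Assumption: an optional header line holds exactly two fields
                // (word count and vector dimension) and is not a vocabulary entry.
                boolean header = first && fields.length == 2;
                first = false;
                if (!header && !fields[0].isEmpty()) {
                    vocabulary.add(fields[0]);
                }
            }
        }
        return vocabulary;
    }

    /** Returns the fraction of tokens that have an entry in the vocabulary. */
    public static double coverage(Set<String> vocabulary, List<String> tokens) {
        if (tokens.isEmpty()) {
            return 0.0;
        }
        long found = tokens.stream().filter(vocabulary::contains).count();
        return (double) found / tokens.size();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the real .vec file.
        String vec = "3 2\nfever 0.1 0.2\npatient 0.3 0.4\nno 0.5 0.6\n";
        Set<String> vocabulary = loadVocabulary(new StringReader(vec));

        // Hypothetical outputs of two tokenization paths on the same text.
        List<String> tokensA = Arrays.asList("patient", "no", "fever");
        List<String> tokensB = Arrays.asList("Patient", "no", "fever.");

        System.out.println("Path A coverage: " + coverage(vocabulary, tokensA));
        System.out.println("Path B coverage: " + coverage(vocabulary, tokensB));
    }
}
```

Running both tokenization paths over the same corpus and comparing the two ratios would give a concrete basis for choosing one. Counting distinct tokens instead of all occurrences is an equally reasonable variant.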