bst-mug / acres

Acronym expansion module based on word embeddings and filtering rules
Apache License 2.0

Evaluate impact of each filtering rule and drop the ones that are not needed #60

Closed michelole closed 4 years ago

michelole commented 6 years ago
michelole commented 5 years ago

For word2vec, the minimal set of rules that does not affect lenient metrics@10 is:

More rules might be needed when evaluating at a smaller rank cutoff. Empirically, _is_relative_length_valid is also important for removing noise such as AINS: Ains.

Applying not _has_capitals misses RV: rechtsventrikulär, but it reduces overall noise.
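
For anyone repeating this kind of evaluation, a leave-one-out ablation is one way to measure the impact of each rule: compute the metric with all rules, then again with each rule removed, and compare. The sketch below is a minimal illustration and not the acres implementation. The two toy rules, the `hits_at_k` metric (a simple stand-in for the lenient metrics mentioned above), and the placeholder gold expansion for AINS are all assumptions made up for the example.

```python
from typing import Callable, Dict, List

# A rule takes (acronym, candidate) and returns True if the candidate is kept.
Rule = Callable[[str, str], bool]


def toy_no_inner_capitals(acronym: str, candidate: str) -> bool:
    # Hypothetical stand-in rule: reject candidates with an uppercase
    # letter anywhere after the first character.
    return not any(ch.isupper() for ch in candidate[1:])


def toy_relative_length(acronym: str, candidate: str) -> bool:
    # Hypothetical stand-in rule: the candidate must be at least twice
    # as long as the acronym.
    return len(candidate) >= 2 * len(acronym)


def hits_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str],
              rules: List[Rule], k: int) -> float:
    # Fraction of acronyms whose gold expansion is among the first k
    # candidates that survive all rules.
    hits = 0
    for acronym, candidates in ranked.items():
        kept = [c for c in candidates if all(r(acronym, c) for r in rules)]
        if gold[acronym] in kept[:k]:
            hits += 1
    return hits / len(ranked)


def leave_one_out(ranked: Dict[str, List[str]], gold: Dict[str, str],
                  rules: Dict[str, Rule], k: int) -> Dict[str, float]:
    # For each rule, report (metric without that rule) - (metric with all rules).
    # A delta of 0.0 means the rule did not change the metric on this data.
    baseline = hits_at_k(ranked, gold, list(rules.values()), k)
    deltas = {}
    for name in rules:
        remaining = [r for n, r in rules.items() if n != name]
        deltas[name] = hits_at_k(ranked, gold, remaining, k) - baseline
    return deltas


# Toy data: the AINS gold expansion is a placeholder, not a real annotation.
RANKED = {
    "AINS": ["Ains", "placeholder expansion"],
    "RV": ["rechtsventrikulär"],
}
GOLD = {"AINS": "placeholder expansion", "RV": "rechtsventrikulär"}
RULES = {
    "no_inner_capitals": toy_no_inner_capitals,
    "relative_length": toy_relative_length,
}

if __name__ == "__main__":
    for k in (10, 1):
        print(k, leave_one_out(RANKED, GOLD, RULES, k))
```

On this toy data, no rule changes the metric at k=10, while removing the length rule lowers it at k=1 because the noisy candidate Ains takes the top rank. A rule with a zero delta at the target cutoff is a candidate for dropping, but only with respect to the data and cutoff actually evaluated.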