Closed by michelole 4 years ago
For word2vec, the minimal set of rules that does not affect lenient metrics@10 is:
acro not in full
is_possible_expansion
is_acronym_tail_on_last_word
trim_plural (allows the match NTX -> Nierentransplantation, but generates noise for BMS)
More rules might be needed if we evaluate at a smaller rank. Empirically, _is_relative_length_valid is also important for removing noise such as AINS: Ains.
not _has_capitals misses RV: rechtsventrikulär, but reduces overall noise.
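To make the trade-offs above concrete, here is a minimal sketch of how such rules could be chained as keep/drop predicates over candidate expansions. It is not the project's actual implementation: the rule bodies, the `min_ratio` threshold, the `filter_candidates` helper, and the reading of "not _has_capitals" as "drop candidates without any capital letter" are all assumptions made for illustration. Only the rule names and the example pairs come from the notes above.

```python
from typing import Callable, Iterable, List

# A rule takes (acronym, candidate expansion) and returns True to keep the candidate.
Rule = Callable[[str, str], bool]


def acro_not_in_full(acro: str, full: str) -> bool:
    # Assumed reading of "acro not in full": a case-sensitive substring test.
    return acro not in full


def is_relative_length_valid(acro: str, full: str, min_ratio: float = 1.5) -> bool:
    # Hypothetical threshold: the expansion must be clearly longer than the acronym.
    return len(full) >= min_ratio * len(acro)


def has_capitals(acro: str, full: str) -> bool:
    # Assumed reading: keep only candidates that contain at least one uppercase letter.
    return any(ch.isupper() for ch in full)


def filter_candidates(acro: str, candidates: Iterable[str], rules: Iterable[Rule]) -> List[str]:
    # Hypothetical helper: a candidate survives only if every rule accepts it.
    rules = list(rules)
    return [c for c in candidates if all(rule(acro, c) for rule in rules)]


base = [acro_not_in_full]
with_length = base + [is_relative_length_valid]
with_capitals = with_length + [has_capitals]

print(filter_candidates("AINS", ["Ains"], base))                      # noise survives
print(filter_candidates("AINS", ["Ains"], with_length))               # noise removed
print(filter_candidates("RV", ["rechtsventrikulär"], with_length))    # adjective kept
print(filter_candidates("RV", ["rechtsventrikulär"], with_capitals))  # adjective lost
```

Keeping each rule as an independent predicate makes it cheap to toggle one rule at a time and re-run the metrics, which is the kind of ablation described above.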