Living-with-machines / DeezyMatch

A Flexible Deep Learning Approach to Fuzzy String Matching
https://living-with-machines.github.io/DeezyMatch/

Add option to extend the vocabulary when fine-tuning a model #115

Open mcollardanuy opened 2 years ago

mcollardanuy commented 2 years ago

Context: how do we deal with missing vocabulary when fine-tuning a model? This is particularly an issue with n-gram/word tokenization (characters_v001.vocab partly solves it for models using char tokenization).
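To make the problem concrete, here is a minimal, library-independent sketch of what "extending the vocabulary" could mean for n-gram tokenization. It is not DeezyMatch code: `char_ngrams`, `build_vocab` and `extend_vocab` are hypothetical helpers, and the token-to-index dictionary is only an assumed stand-in for however the pretrained vocabulary is actually stored.

```python
def char_ngrams(text, n):
    """Split a string into overlapping character n-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocab(tokens):
    """Map each distinct token to an index, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def extend_vocab(vocab, tokens):
    """Return a copy of vocab with unseen tokens appended at fresh indices.

    Existing indices are left untouched, so anything already keyed on them
    (e.g. pretrained embedding rows) would stay valid.
    """
    extended = dict(vocab)
    next_index = max(extended.values(), default=-1) + 1
    added = []
    for tok in tokens:
        if tok not in extended:
            extended[tok] = next_index
            added.append(tok)
            next_index += 1
    return extended, added


# Hypothetical data: vocabulary seen at pretraining vs. a fine-tuning string.
pretrained_vocab = build_vocab(char_ngrams("london", 2))
extended_vocab, added = extend_vocab(pretrained_vocab, char_ngrams("lyon", 2))
print(added)
```

Here the bigrams `ly` and `yo` are missing from the pretrained vocabulary and get new indices after the existing ones. In a real model, the embedding layer would presumably also need one new row per added token, which this sketch does not cover.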