MinishLab / model2vec

Distill a Small Static Model from any Sentence Transformer
https://minishlab.github.io/
MIT License
414 stars, 18 forks

fix: update added tokens to be more agnostic #107

Closed: stephantul closed this 3 weeks ago

stephantul commented 3 weeks ago

This fixes #106.

We were overfitting to BertTokenizer; apparently RoBERTa's tokenizer has a different structure. This fix works for both RoBERTa and BERT.
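The PR text does not show the change itself. As a hedged illustration of what "agnostic" handling of added tokens can mean, the sketch below reads special tokens from the `added_tokens` list of a Hugging Face `tokenizer.json`-style dict rather than hardcoding BERT names such as `[CLS]` and `[SEP]`. The function names `special_tokens_from_config` and `strip_special_tokens` and the trimmed example configs are hypothetical; this is not model2vec's actual implementation.

```python
from typing import Any


def special_tokens_from_config(config: dict[str, Any]) -> dict[str, int]:
    """Return {token_string: id} for every added token flagged as special.

    Hypothetical helper: it relies only on the tokenizer's own `added_tokens`
    list, so it assumes nothing about BERT-style "[CLS]" or RoBERTa-style "<s>".
    """
    return {
        entry["content"]: entry["id"]
        for entry in config.get("added_tokens", [])
        if entry.get("special", False)
    }


def strip_special_tokens(vocab: list[str], config: dict[str, Any]) -> list[str]:
    """Hypothetical helper: drop special tokens from a vocabulary, whatever their spelling."""
    specials = set(special_tokens_from_config(config))
    return [token for token in vocab if token not in specials]


# Trimmed, illustrative excerpts of tokenizer.json "added_tokens" (not full files).
bert_like = {"added_tokens": [
    {"id": 0, "content": "[PAD]", "special": True},
    {"id": 100, "content": "[UNK]", "special": True},
    {"id": 101, "content": "[CLS]", "special": True},
    {"id": 102, "content": "[SEP]", "special": True},
    {"id": 103, "content": "[MASK]", "special": True},
]}
roberta_like = {"added_tokens": [
    {"id": 0, "content": "<s>", "special": True},
    {"id": 1, "content": "<pad>", "special": True},
    {"id": 2, "content": "</s>", "special": True},
    {"id": 3, "content": "<unk>", "special": True},
    {"id": 50264, "content": "<mask>", "special": True},
]}

print(special_tokens_from_config(roberta_like))
# {'<s>': 0, '<pad>': 1, '</s>': 2, '<unk>': 3, '<mask>': 50264}
print(strip_special_tokens(["[CLS]", "hello", "world", "[SEP]"], bert_like))
# ['hello', 'world']
```

The same two functions handle both tokenizer families because nothing in them names a specific token; the tokenizer's own metadata decides what counts as special.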

codecov[bot] commented 3 weeks ago

Codecov Report

All modified and coverable lines are covered by tests ✅

| Files with missing lines | Coverage Δ |
|---|---|
| model2vec/distill/tokenizer.py | 73.21% <100.00%> (+0.48%) ⬆️ |