castorini / anserini

Anserini is a Lucene toolkit for reproducible information retrieval research
http://anserini.io/
Apache License 2.0

Tweak regression scores due to DJL upgrade #2535

Closed · lintool closed 2 months ago

lintool commented 2 months ago

The recent upgrade to DJL v0.28.0 (#2529) caused a number of score differences related to the underlying use of HuggingFace tokenizers.

This is documented in code, but extracted here for visibility:

    // Note upgrading from djl v0.21.0 to v0.28.0 (June 2024)
    //
    // In theory, since we're just tokenizing, we shouldn't be constrained by the modelMaxLength.
    // Previously, at v0.21.0, we were able to tokenize arbitrarily long sequences.
    // However, the implementation seems to have changed.
    //
    // As of the v0.28.0 upgrade, if we put a large value, we get the warning:
    // "maxLength is greater then (sic) modelMaxLength, change to: 512"
    //
    // On the other hand, if we don't set this value, we get the warning:
    // "maxLength is not explicitly specified, use modelMaxLength: 512".
    //
    // In other words, the implementation forces truncation, even for our IR application, i.e., it
    // forces FirstP retrieval.
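
For concreteness, here's a minimal sketch of how the new behavior surfaces through DJL's `HuggingFaceTokenizer` builder (the choice of bert-base-uncased and the input text are illustrative, not what Anserini actually runs):

    import ai.djl.huggingface.tokenizers.Encoding;
    import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;

    public class MaxLengthDemo {
      public static void main(String[] args) throws Exception {
        // Ask for an effectively unbounded maxLength. At v0.21.0 this tokenized
        // arbitrarily long sequences; at v0.28.0 it instead logs
        // "maxLength is greater then modelMaxLength, change to: 512"
        // and clamps to the model's 512-token limit.
        HuggingFaceTokenizer tokenizer = HuggingFaceTokenizer.builder()
            .optTokenizerName("bert-base-uncased")  // illustrative model choice
            .optMaxLength(100_000)
            .build();

        String longDoc = "lorem ipsum dolor ".repeat(2_000);
        Encoding encoding = tokenizer.encode(longDoc);

        // Expected to print 512 under v0.28.0, i.e., forced truncation (FirstP).
        System.out.println(encoding.getTokens().length);
      }
    }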
codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 67.17%. Comparing base (98e4866) to head (f5d624f).

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##             master    #2535      +/-   ##
============================================
+ Coverage     67.14%   67.17%   +0.02%
- Complexity     1479     1481       +2
============================================
  Files           219      219
  Lines         12641    12643       +2
  Branches       1528     1528
============================================
+ Hits           8488     8493       +5
+ Misses         3625     3624       -1
+ Partials        528      526       -2
```


theyorubayesian commented 2 months ago

This is interesting. Does this happen both for tokenizers trained & stored locally and for tokenizers attached to models on HF?

lintool commented 2 months ago

> This is interesting. Does this happen both for tokenizers trained & stored locally and for tokenizers attached to models on HF?

Tagging @ToluClassics for thoughts.

Not sure. This PR contains all the scores that changed; everything else was unaffected.

theyorubayesian commented 2 months ago

I took a look and tested it with the GPT-2 tokenizer. The DJL implementation doesn't use the modelMaxLength, even if it is set in the tokenizer_config.json.
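
For reference, a rough sketch of that kind of check (the local path is hypothetical; it assumes a GPT-2 tokenizer saved with `model_max_length` set in its `tokenizer_config.json`):

    import ai.djl.huggingface.tokenizers.Encoding;
    import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;

    import java.nio.file.Paths;

    public class LocalTokenizerCheck {
      public static void main(String[] args) throws Exception {
        // Hypothetical local directory containing tokenizer.json and a
        // tokenizer_config.json that sets "model_max_length": 1024.
        HuggingFaceTokenizer tokenizer =
            HuggingFaceTokenizer.newInstance(Paths.get("/path/to/gpt2-tokenizer"));

        Encoding encoding = tokenizer.encode("some long input ".repeat(500));

        // If modelMaxLength from tokenizer_config.json were honored, this would
        // be capped at 1024; per the observation above, it is not.
        System.out.println(encoding.getTokens().length);
      }
    }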

lintool commented 2 months ago

Superseded by #2536, which is the better solution.