CUNY-CL / udtube

Neural morphological analyzer
Apache License 2.0

Can we drop `CustomEncoding`? #32

Closed (kylebgorman closed this 2 months ago)

kylebgorman commented 3 months ago

The `CustomEncoding` class is described as being for encoders that lack a Rust-backed tokenizer ("like ByT5"). However, it is used in the implementation of all data sets, including those whose encoders do have one.

Can we:

  1. Drop support for ByT5 and other encoders without a Rust-backed tokenizer (ByT5 is apparently too heavy anyway)
  2. Just use `tokenizers.Encoding` in its place?
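
For reference, here is a minimal sketch of what a `tokenizers.Encoding` gives us out of the box. It is not udtube code: the toy `WordLevel` vocabulary and the sample sentence are made up for illustration, and the tokenizer is built in memory so that nothing needs to be downloaded. With a real encoder, the Rust-backed tokenizer would produce the same kind of object.

```python
from tokenizers import Encoding, Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Hypothetical toy vocabulary, for illustration only.
vocab = {"[UNK]": 0, "hello": 1, "world": 2}

# Builds a small Rust-backed tokenizer in memory.
tokenizer = Tokenizer(WordLevel(vocab=vocab, unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Tokenizer.encode returns a tokenizers.Encoding.
encoding = tokenizer.encode("hello world")

print(type(encoding).__name__)  # class name of the returned object
print(encoding.tokens)          # token strings
print(encoding.ids)             # vocabulary indices
print(encoding.word_ids)        # index of the source word for each token
print(encoding.offsets)         # character spans in the input string
```

If the data sets only rely on attributes like these, `CustomEncoding` may not be adding anything for encoders that already have a Rust-backed tokenizer. That is an assumption that would need to be checked against the data set code.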

@DanielYakubov please advise here.