The particular case of "id" is probably a bug in the Latin language model of the tokenizer.
The tokenizer does currently try to separate enclitics (e.g. Nisi -> Ni si, Neque -> Ne que), but it is in an early state and will not catch everything. We may also need to make that a preference the user can specify. But separating "id" is clearly wrong and is probably English defaults bleeding through (i'd -> i [woul]d), so I'll register that as a bug against the tokenizer.
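
For illustration only, here is a minimal sketch of the intended behaviour, not the tokenizer's actual code. It assumes a small lookup table of known Latin splits (only the two examples above) and a hypothetical `split_enclitics` flag standing in for the user preference. The names `LATIN_SPLITS` and `tokenize_latin` are made up for this example.

```python
# Hypothetical split table: only the two examples mentioned in this issue.
LATIN_SPLITS = {
    "nisi": ("ni", "si"),
    "neque": ("ne", "que"),
}


def tokenize_latin(text, split_enclitics=True):
    """Whitespace-tokenize text, optionally splitting known Latin forms.

    Words that are not in LATIN_SPLITS (such as "id") are left intact,
    so no English-style contraction rule can split them.
    """
    tokens = []
    for word in text.split():
        parts = LATIN_SPLITS.get(word.lower()) if split_enclitics else None
        if parts is None:
            tokens.append(word)
            continue
        # Slice the original word so its casing is preserved.
        head_len = len(parts[0])
        tokens.extend([word[:head_len], word[head_len:]])
    return tokens


print(tokenize_latin("Neque id nisi"))
print(tokenize_latin("Neque id nisi", split_enclitics=False))
```

With the flag on, "Neque" and "nisi" are split and "id" stays whole. With it off, every word comes back unchanged.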
reported by @monzug
Originally posted by @balmas in https://github.com/alpheios-project/alignment-editor-new/issues/58#issuecomment-719613633