"id" should not be tokenized in Latin

reported by @monzug

The particular case of "id" is probably a bug in the Latin language model of the tokenizer.

The tokenizer does currently try to separate enclitics (e.g. Nisi -> Ni si, Neque -> Ne que), but it is in an early state and will not catch everything. We may also need to make that a preference the user can specify. But separating "id" is clearly wrong and is probably English defaults bleeding through (i'd -> i [woul]d), so I'll register that as a bug against the tokenizer.

Originally posted by @balmas in https://github.com/alpheios-project/alignment-editor-new/issues/58#issuecomment-719613633
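
A minimal sketch of how language-scoped split rules would keep an English contraction rule from firing on Latin text. Everything here is hypothetical and for illustration only: `SPLIT_RULES`, `tokenize`, and the rule entries are not the tokenizer's actual API or data, and the English `"id"` entry simply stands in for the suspected default. Punctuation handling is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical per-language split rules (NOT the real tokenizer's data).
# Keys are lowercase surface forms; values are the pieces to split into.
SPLIT_RULES: Dict[str, Dict[str, Tuple[str, ...]]] = {
    # Stand-in for English contraction defaults, including an
    # apostrophe-less variant that would wrongly match Latin "id".
    "en": {"i'd": ("i", "'d"), "id": ("i", "d")},
    # The two splits mentioned in the comment above.
    "la": {"nisi": ("ni", "si"), "neque": ("ne", "que")},
}


def tokenize(text: str, lang: str) -> List[str]:
    """Whitespace-tokenize, then apply only the rules for `lang`."""
    rules = SPLIT_RULES.get(lang, {})
    tokens: List[str] = []
    for word in text.split():
        pieces = rules.get(word.lower())
        if pieces is None:
            tokens.append(word)
            continue
        # Slice the original word so the input casing is preserved.
        start = 0
        for piece in pieces:
            tokens.append(word[start:start + len(piece)])
            start += len(piece)
    return tokens


if __name__ == "__main__":
    print(tokenize("Nisi id Neque", "la"))  # Latin rules: "id" stays whole
    print(tokenize("id", "en"))             # English rules split it
```

With rules looked up strictly by language, "id" is only split when the English table is consulted, which is the behaviour the bug report suspects is leaking into Latin.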

alpheios-project/tokenizer#18