It seems that some treebank data files (perhaps only older ones that predate the Perseids tokenization workflow) don't mark enclitics/proclitics with a hyphen. For example, in Ovid Metamorphoses sentences 2 and 4, primaque and congestaque have been split into "que" "prima" and "que" "congesta", with no hyphen on "que". We could handle this by testing for specific known enclitic words, or by simply looking for merged words that lack hyphens.
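As a rough illustration of the two options, here is a minimal Python sketch, not the project's actual code. The function name `mark_enclitics`, the `KNOWN_ENCLITICS` list, and the optional `surface_text` argument are all hypothetical. It assumes the enclitic token comes directly before its host, as in the "que" "prima" example, and it leaves out "ne" because that is also a standalone word.

```python
import re

# Hypothetical list of enclitics to test for. "ne" is deliberately omitted
# because it also occurs as an independent word.
KNOWN_ENCLITICS = {"que", "ve"}


def mark_enclitics(tokens, surface_text=None):
    """Return a copy of `tokens` with unhyphenated enclitics prefixed by '-'.

    Assumes the enclitic token directly precedes its host ("que", "prima").
    If `surface_text` is given, a token is only marked when host + enclitic
    appears as a single merged word in that text.
    """
    surface_words = None
    if surface_text is not None:
        surface_words = set(re.findall(r"\w+", surface_text.lower()))

    marked = list(tokens)
    for i in range(len(tokens) - 1):
        clitic = tokens[i].lower()
        host = tokens[i + 1].lower()
        if clitic not in KNOWN_ENCLITICS:
            continue  # not a known enclitic, or already hyphenated ("-que")
        if surface_words is not None and host + clitic not in surface_words:
            continue  # the merged word is not attested in the surface text
        marked[i] = "-" + tokens[i]
    return marked
```

For instance, `mark_enclitics(["que", "prima"])` would give `["-que", "prima"]`. Passing the sentence text as `surface_text` adds the "merged word" check on top of the known-word test, which should reduce false positives at the cost of needing the original text.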