alpheios-project / tokenizer

Alpheios Tokenizer Service

Add token exception for CTS URNs and URIs #2

Closed: balmas closed this issue 3 years ago

balmas commented 4 years ago

I want to allow CTS URNs in the metadata of input files, so we'll need to code a tokenization rule for them so that spaCy doesn't treat the `:` and `.` characters in them as punctuation.

We should probably add similar exceptions for HTTP URIs.
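A minimal, stdlib-only sketch of the idea: try "protected" patterns for CTS URNs and HTTP(S) URIs before any generic word/punctuation splitting, so the `:` and `.` inside them never become separate tokens. The regexes `CTS_URN` and `HTTP_URI` and the `tokenize` function are hypothetical illustrations, not code from this repository, and the URN pattern only approximates CTS syntax. In spaCy itself, the corresponding hook would be a custom `token_match` on the tokenizer (a function with the signature of `re.compile(...).match`).

```python
import re

# Hypothetical, approximate CTS URN pattern: urn:cts:NAMESPACE:WORK[:PASSAGE].
# Dotted segments must be followed by a word character, so a sentence-final
# period is not absorbed into the URN.
CTS_URN = r"urn:cts:[A-Za-z0-9]+:[A-Za-z0-9]+(?:\.[A-Za-z0-9\-]+)*(?::[\w@\-]+(?:\.[\w@\-]+)*)?"

# Hypothetical HTTP(S) URI pattern: may not end in trailing punctuation.
HTTP_URI = r"https?://\S*[^\s.,;:!?)\]]"

# Alternation order matters: protected patterns are tried first at each
# position, then plain words, then any single punctuation character.
TOKEN_RE = re.compile(
    rf"(?:{CTS_URN}|{HTTP_URI})"
    r"|\w+"
    r"|[^\w\s]"
)


def tokenize(text: str) -> list[str]:
    """Split text into words and punctuation, keeping CTS URNs and URIs whole."""
    return [m.group(0) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    print(tokenize("See urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1 and http://example.org/a."))
    # → ['See', 'urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1', 'and', 'http://example.org/a', '.']
```

Ordinary text still splits at `:` and `.` as before (`tokenize("a:b.c")` yields five tokens); only spans matching the exception patterns are kept intact.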