KorAP / KorAP-XML-Krill

Merge KorapXML data and create Krill documents
BSD 2-Clause "Simplified" License

Support non-word tokens #5

Closed: Akron closed this issue 5 years ago

Akron commented 5 years ago

Currently, only character sequences containing /[\d\w]/ are considered tokens and can therefore be annotated. To support punctuation as tokens, as well as annotated pauses, a switch is needed that enables tokenization of non-word tokens. This was requested by the DRuKoLa project.
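To illustrate the requested behaviour, here is a minimal sketch in Python. It is not the converter's code: the function name `filter_tokens` and the flag `non_word_tokens` are hypothetical, chosen only for this example. It assumes that a token is kept by default only if it contains at least one character matching `[\d\w]`, and that the switch keeps every non-empty token.

```python
import re

# Matches any string containing at least one digit or word character,
# mirroring the /[\d\w]/ check described above.
WORD_CHAR = re.compile(r"[\d\w]")


def filter_tokens(tokens, non_word_tokens=False):
    """Return the tokens that would be available for annotation.

    `non_word_tokens` is a hypothetical switch: when it is True,
    punctuation and other non-word tokens are kept as well.
    """
    if non_word_tokens:
        # Keep every non-empty token, including pure punctuation.
        return [t for t in tokens if t != ""]
    # Default: keep only tokens with at least one digit or word character.
    return [t for t in tokens if WORD_CHAR.search(t)]


tokens = ["Hello", ",", "world", "!", "...", "42"]
default_tokens = filter_tokens(tokens)
all_tokens = filter_tokens(tokens, non_word_tokens=True)
```

With the switch off, `default_tokens` contains only `Hello`, `world` and `42`. With it on, the punctuation tokens survive and could carry annotations of their own.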