cltl / NAF-4-Development

Apache License 2.0

element IDs contain information #3

Closed by jiskattema 2 years ago

jiskattema commented 3 years ago

The NAF document requires identifiers to use a prefix depending on the type of element: 'w' for words, 't' for terms, etc. Additionally, some of the Newsreader pipeline tools expect the rest of the id to be a number.

This runs contrary to the 'NAF should be simple' design principle.

sarnoult commented 2 years ago

You are entirely right. The id prefixes are purely conventional, and the requirement to follow the convention is limited to the Newsreader project. They have nothing to do with the NAF format itself, so you are free to use almost any ids you like. The only requirement in the NAF specification is that the ids of certain elements, including word forms and terms, be unique within a document, because higher-level annotations are anchored to spans of word forms or terms. For instance, you cannot use ids like "1", "2", etc. in both the text layer and the terms layer.
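To illustrate the uniqueness requirement described above, here is a minimal sketch in Python that reports ids shared between word forms and terms. It assumes the layers are serialized as `<wf>` and `<term>` elements carrying an `id` attribute; the function name `find_duplicate_ids` and the two sample documents are hypothetical, and real NAF files (or older versions of the format) may use different attribute names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_duplicate_ids(naf_xml, tags=("wf", "term")):
    """Return {id: [element tags using it]} for ids that occur more than once.

    Assumption: the checked elements carry their identifier in an
    attribute named "id". Only the element types listed in `tags`
    are inspected.
    """
    root = ET.fromstring(naf_xml)
    seen = defaultdict(list)
    for tag in tags:
        for elem in root.iter(tag):
            elem_id = elem.get("id")
            if elem_id is not None:
                seen[elem_id].append(tag)
    return {i: used_by for i, used_by in seen.items() if len(used_by) > 1}


# Hypothetical sample: plain numeric ids reused across the two layers.
CLASHING = (
    '<NAF><text><wf id="1">Hello</wf><wf id="2">world</wf></text>'
    '<terms><term id="1"><span><target id="1"/></span></term></terms></NAF>'
)

# Hypothetical sample: conventional prefixes keep the ids distinct.
PREFIXED = (
    '<NAF><text><wf id="w1">Hello</wf><wf id="w2">world</wf></text>'
    '<terms><term id="t1"><span><target id="w1"/></span></term></terms></NAF>'
)

print(find_duplicate_ids(CLASHING))
print(find_duplicate_ids(PREFIXED))
```

In the first sample the id "1" is used by both a word form and a term, so a span target pointing at "1" would be ambiguous. The prefixes in the second sample are one simple way to avoid such clashes, but any scheme that keeps ids unique would serve equally well.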