cltl / NAF-4-Development

annotation schema used for pos, semRole, morphofeat, rfunc #5

Closed (jiskattema closed 2 years ago)

jiskattema commented 3 years ago

It is unclear how to indicate which schema is used for the various annotations, or what the default schema is.

For part-of-speech, the document lists the valid options, but these don't correspond to the ones used in the Newsreader pipeline.

For semRole, the NAF document only suggests using 'A0', 'A1' when the role corresponds to a PropBank predicate, but that is not specific enough. It is also unclear what to do if it is not a PropBank predicate.

For a term's morphofeat, there is no further mention of allowed values or content. The Newsreader pipeline assumes it follows a 'POS(A,B)' format, where POS is a part-of-speech tag as produced by Alpino, and (A,B) are features likewise taken from Alpino output.
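To make the assumed 'POS(A,B)' shape concrete, here is a minimal Python sketch of how a consumer might split such a value into a tag and a feature list. The function name `parse_morphofeat`, the regular expression, and the sample strings are my own illustrations and are not taken from the NAF documentation or the Newsreader code; the sketch only assumes a tag followed by an optional parenthesised, comma-separated list.

```python
import re
from typing import List, Tuple

# Assumed shape: TAG or TAG(feat1,feat2,...). This pattern is an illustration,
# not a rule taken from the NAF DTD or from Alpino.
_MORPHOFEAT = re.compile(r"^\s*([^()\s,]+)\s*(?:\(([^()]*)\))?\s*$")


def parse_morphofeat(value: str) -> Tuple[str, List[str]]:
    """Split a 'POS(A,B)'-style string into (POS, [A, B])."""
    match = _MORPHOFEAT.match(value)
    if match is None:
        raise ValueError(f"not in POS(A,B) form: {value!r}")
    pos, raw_features = match.group(1), match.group(2)
    # A missing or empty parenthesised part yields an empty feature list.
    if not raw_features:
        return pos, []
    features = [f.strip() for f in raw_features.split(",") if f.strip()]
    return pos, features


# Hypothetical sample value, used only to show the call.
tag, features = parse_morphofeat("N(soort,ev)")
```

A value that does not fit the assumed shape raises `ValueError`, which is one way a pipeline could detect that a document uses a different morphofeat convention.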

For dependencies, there is the rfunc attribute. There is a (non-exhaustive) list of values, but no way to indicate whether these come from Universal Dependencies or from Alpino.

sarnoult commented 2 years ago

The NAF Newsreader documentation must be understood in the context of the Newsreader project. For a project-independent specification, it is perhaps best to look directly at the DTD.

Within Newsreader, POS tagging and parsing were performed with Alpino. If the DTD does not allow information from Universal Dependencies to be represented, we can adapt it. NAF is currently being used for FrameNet annotations, so there is no restriction to PropBank.