cltl / morphosyntactic_parser_nl

Morphosyntactic parser for Dutch based on the Alpino parser
Apache License 2.0

escaping long '--' sequences in comments #16

Closed: sarnoult closed this issue 5 years ago

sarnoult commented 5 years ago

Character escaping for comments (in alpino_dependency.py and convert_penn_to_kaf.py) currently replaces '--' with '-'. This leads to a ValueError in lxml.etree when documents contain longer dash sequences, e.g. '------': presumably a single replacement pass shortens the run but still leaves '--' in the comment text. Perhaps we could use '&ndash;' as a replacement for '--'?
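A minimal sketch of the idea, not the code from the repository: `escape_comment_text` is a hypothetical helper name, and padding a trailing hyphen with a space is my own assumption (XML comments may not end with '-'). It first shows why a single-pass '--' to '-' replacement is not enough, then replaces every hyphen pair with the literal text '&ndash;'.

```python
def escape_comment_text(text):
    """Hypothetical helper: make text safe to place inside an XML comment.

    XML comments may not contain '--' and may not end with '-'.
    """
    # Replace each non-overlapping pair of hyphens with the literal
    # string '&ndash;'. The replacement ends in ';', so no new '--'
    # can be formed next to a leftover single hyphen.
    escaped = text.replace('--', '&ndash;')
    # Assumption: pad a trailing hyphen with a space so the comment
    # does not end with '-'.
    if escaped.endswith('-'):
        escaped += ' '
    return escaped


# One pass of the current approach turns six dashes into three,
# which still contains '--'.
naive = '------'.replace('--', '-')
print(repr(naive))

for sample in ('a -- b', '------', 'odd --- run', 'trailing-'):
    print(repr(sample), '->', repr(escape_comment_text(sample)))
```

The escaped string could then be passed to `lxml.etree.Comment`. Note that entity references are not expanded inside XML comments, so '&ndash;' stays as literal text there rather than being rendered as a dash.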

sarnoult commented 5 years ago

Fixed in commit 82ed6f9.