delph-in / pydmrs

A library for manipulating DMRS structures
MIT License

Add tfrom, tto #4

Open guyemerson opened 8 years ago

matichorvat commented 8 years ago

A token list is also worth considering, since the tokens might not be contiguous.

goodmami commented 8 years ago

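@matichorvat is right. There are 4 Lnk types (http://svn.emmtee.net/trunk/lingo/lkb/src/mrs/lnk.lisp):

  • character span (<1:2>, two integers; most often used)
  • chart span (<1#2>, two integers)
  • token list (<1 3 4>, a list)
  • edge (@1, an atomic value)

tfrom and tto wouldn't be sufficient to capture non-contiguous tokens.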

anncopestake commented 8 years ago

Right, but I've never found a use for chart span (which is deprecated) or edge; these are internal to the parser/generator.

Arguably, there should be (at least) two classes of token list - one for the tokens the ERG uses and one for tokens corresponding to the POS tagger. The reason for character spans is that different systems make different assumptions about tokens. The alternative, hinted at in oe's comments in that file, is that the tokens used by the ERG carry information about their history, including the character span.
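As a rough illustration of the last point, here is a minimal Python sketch of tokens that carry their own character span, so that a (possibly non-contiguous) token list can be mapped back to character spans. The `Token` class and the `char_spans` helper are hypothetical names made up for this example; they are not part of pydmrs, the ERG, or the LKB.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Token:
    """Hypothetical token record that remembers its character span."""
    index: int   # position of the token in the tokenisation
    form: str    # surface string
    cfrom: int   # start character offset (inclusive)
    cto: int     # end character offset (exclusive)


def char_spans(tokens: Sequence[Token], token_ids: Sequence[int]) -> List[Tuple[int, int]]:
    """Map a token list to character spans, merging runs of adjacent tokens."""
    by_index = {t.index: t for t in tokens}
    spans: List[Tuple[int, int]] = []
    previous = None
    for i in sorted(token_ids):
        tok = by_index[i]
        if previous is not None and i == previous + 1:
            # Adjacent to the previous token: extend the current span.
            spans[-1] = (spans[-1][0], tok.cto)
        else:
            # Gap in the token list: start a new span.
            spans.append((tok.cfrom, tok.cto))
        previous = i
    return spans


# "the big dog barks", tokenised on whitespace (an assumed tokenisation)
tokens = [
    Token(0, "the", 0, 3),
    Token(1, "big", 4, 7),
    Token(2, "dog", 8, 11),
    Token(3, "barks", 12, 17),
]
print(char_spans(tokens, [0, 2, 3]))  # [(0, 3), (8, 17)]
```

Because each token keeps its own offsets, two systems with different tokenisations could still be compared through the character spans they cover.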

It seems that only cfrom and cto are in the DTDs in the repository - I don't know whether there are other versions elsewhere with token list included.

On 04/12/2015 19:37, Michael Wayne Goodman wrote:

Matic is right. There are 4 Lnk types (http://svn.emmtee.net/trunk/lingo/lkb/src/mrs/lnk.lisp):

  • character span (<1:2>, two integers; most often used)
  • chart span (<1#2>, two integers)
  • token list (<1 3 4>, a list)
  • edge (@1, an atomic value)

tfrom and tto wouldn't be sufficient to capture non-contiguous tokens.
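To make the four notations concrete, here is a minimal Python sketch that parses the string forms listed above and checks whether a token list is contiguous. The names `Lnk`, `parse_lnk` and `is_contiguous` are assumptions made up for this example, not the pydmrs API, and the regular expressions cover only the forms shown in this thread.

```python
import re
from typing import NamedTuple, Sequence, Tuple, Union


class Lnk(NamedTuple):
    """Hypothetical container for a parsed Lnk value."""
    kind: str                            # "charspan", "chartspan", "tokens" or "edge"
    data: Union[int, Tuple[int, ...]]


def parse_lnk(text: str) -> Lnk:
    """Parse one of the four Lnk notations: <1:2>, <1#2>, <1 3 4>, @1."""
    text = text.strip()
    m = re.fullmatch(r"<(\d+):(\d+)>", text)
    if m:
        return Lnk("charspan", (int(m.group(1)), int(m.group(2))))
    m = re.fullmatch(r"<(\d+)#(\d+)>", text)
    if m:
        return Lnk("chartspan", (int(m.group(1)), int(m.group(2))))
    m = re.fullmatch(r"<(\d+(?:\s+\d+)*)>", text)
    if m:
        return Lnk("tokens", tuple(int(t) for t in m.group(1).split()))
    m = re.fullmatch(r"@(\d+)", text)
    if m:
        return Lnk("edge", int(m.group(1)))
    raise ValueError(f"unrecognised Lnk string: {text!r}")


def is_contiguous(token_ids: Sequence[int]) -> bool:
    """True if every token id is exactly one more than the previous one."""
    return all(b - a == 1 for a, b in zip(token_ids, token_ids[1:]))


lnk = parse_lnk("<1 3 4>")
print(lnk.kind, lnk.data)       # tokens (1, 3, 4)
print(is_contiguous(lnk.data))  # False
```

For `<1 3 4>`, a pair such as tfrom=1, tto=4 would wrongly imply that token 2 is included, which is the problem raised above.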
