coli-saar / am-parser

Modular implementation of an AM dependency parser in AllenNLP.
Apache License 2.0
30 stars 10 forks source link

Multiple TokenRanges per node #69

Closed alexanderkoller closed 5 years ago

alexanderkoller commented 5 years ago

Am I reading the code correctly that we can currently only have one TokenRange for each node in an amconll file?

This is limiting for UCCA, where we see e.g. the following annotation for 700009:

{"id": "700009", "flavor": 1, "framework": "ucca", "version": 0.9, "time": "2019-05-18 (06:35)", "input": "It included \"All Along the Watchtower\", with lyrics derived from the Book of Isaiah (21:5-9). The song was later recorded by Jimi Hendrix, whose version Dylan later acknowledged as definitive.", "tops": [37], "nodes": [{"id": 0, "anchors": [{"from": 0, "to": 2}]}, {"id": 1, "anchors": [{"from": 3, "to": 11}]}, {"id": 2, "anchors": [{"from": 12, "to": 13}]}, {"id": 3, "anchors": [{"from": 13, "to": 16}, {"from": 17, "to": 22}, {"from": 23, "to": 26}, {"from": 27, "to": 37}]},
...

If I want to change this to support multiple token ranges per node, is there something that I should pay specific attention to?

@namednil @mariomgmn

namednil commented 5 years ago

Yes, that's correct. Maybe throw an exception in ranges() if one word has multiple token ranges?

alexanderkoller commented 5 years ago

Not needed after all - mtool handles this gracefully.