Feature request: #auxfun macros (and other #funs too if feasible?) to distinguish word order

GrammaticalFramework / gf-ud

Functions to analyse and manipulate dependency trees, as well as conversions between GF and dependency trees. The main use case is UD (Universal Dependencies), but the code is designed to be completely generic with respect to the annotation scheme. This repository replaces the old gf-contrib/ud2gf code. It is also meant to be used in the 'vd' command of GF and to replace the supporting code in gf-core in the future.



Another issue: attachment of modifiers. Suppose a phrase like

"each portion of a building separated by walls"

In dt, I get these two options:

#1
AdjCN
    ( AdvCN ( UseN portion_N )
        ( PrepNP of_Prep
            ( DetCN ( DetQuant IndefArt NumSg ) ( UseN building_N ) )
        )
    )
    ( PassVAgent separate_V
        ( DetCN (DetQuant IndefArt NumPl)  ( UseN wall_N ) )
    ): CN[2,3,4,5,6,8]

#LIN: "portion of a building separated by walls"

#2
AdvNP
    ( DetCN each_Det
        ( AdjCN ( UseN portion_N )
            ( PassVAgent separate_V
                ( DetCN (DetQuant IndefArt NumPl)  ( UseN wall_N ) )
            )
        )
    )
    ( PrepNP of_Prep
        ( DetCN ( DetQuant IndefArt NumSg ) ( UseN building_N ) )
    ): NP[1,2,3,4,5,6,8]
#LIN: "portion separated by walls of a building"

However, dt doesn't contain the NP version of #1, which would simply be DetCN each_Det applied to that tree. I wonder if some pruning step removes the NP version of #1 because it covers the same words as #2? (I tried running the example without pruneDevTree, but the full sentence is very long and the program was taking a long time. If you think that might be the reason, I can produce a shorter version of the sentence and try again.)
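To illustrate the hypothesis, here is a minimal Python sketch of a pruning step that would have this effect. It is not gf-ud's pruneDevTree: the Candidate type and prune_by_coverage are hypothetical, assuming a pruner that keeps one tree per (category, covered words) pair and discards later ones in discovery order.

```python
from typing import FrozenSet, List, NamedTuple

class Candidate(NamedTuple):
    cat: str                  # GF category, e.g. "NP" or "CN"
    coverage: FrozenSet[int]  # CoNLL-U token ids covered by the tree
    label: str                # human-readable description of the tree

def prune_by_coverage(candidates: List[Candidate]) -> List[Candidate]:
    """Hypothetical pruning step: keep only the first tree found for each
    (category, coverage) pair and silently drop any later ones."""
    kept = {}
    for cand in candidates:
        key = (cand.cat, cand.coverage)
        if key not in kept:
            kept[key] = cand
    return list(kept.values())

span = frozenset({1, 2, 3, 4, 5, 6, 8})
found = [
    Candidate("NP", span, "#2"),                # found first
    Candidate("NP", span, "NP version of #1"),  # same category and coverage
]
print([c.label for c in prune_by_coverage(found)])  # ['#2']
```

If the real pruning behaved like this, which tree survives would depend only on discovery order, so any word-order preference would have to be applied before pruning rather than after.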

In any case, I would expect the NP version of #1 to be constructed as well, but thrown away before it can be prioritised. And I would like to prioritise it, because its attachment matches the word order: both "building" (nmod) and "separated (by walls)" (acl) are children of "portion", but "building" is closer to "portion" in the sentence, and in #1 it is also attached more immediately.

I can solve this particular case with an #auxfun that says: whenever a NOUN has both an acl and an nmod child, attach the nmod before the acl. But this doesn't scale well.
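The hard-coded rule amounts to a fixed priority between two relations. A minimal Python sketch, with hypothetical names (RELATION_PRIORITY, attachment_order) that are not gf-ud's #auxfun syntax, shows why it doesn't scale: every new pair of co-occurring relations needs another table entry.

```python
# Hypothetical priorities for a NOUN head's dependents: lower = attached first.
RELATION_PRIORITY = {"nmod": 0, "acl": 1}

def attachment_order(head_upos, children):
    """children: list of (token_id, deprel) pairs for one head.
    If the head is a NOUN with both an nmod and an acl child, return the
    children sorted by RELATION_PRIORITY (relations not in the table go last);
    otherwise return them unchanged."""
    rels = {rel for _, rel in children}
    if head_upos != "NOUN" or not {"nmod", "acl"} <= rels:
        return list(children)
    # sorted() is stable, so unlisted relations keep their relative order
    return sorted(children, key=lambda c: RELATION_PRIORITY.get(c[1], len(RELATION_PRIORITY)))

# Children of "portion" (token 2): each/det, separated/acl, building/nmod
print(attachment_order("NOUN", [(1, "det"), (6, "acl"), (5, "nmod")]))
# [(5, 'nmod'), (6, 'acl'), (1, 'det')]
```

Pushing unlisted relations such as det to the end happens to match DetCN being applied last, but that is a side effect of this toy table, not a general property.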

With an explicit DISTANCE=-1* or similar, I could duplicate that rule to say that whatever is closer to the head in the original word order gets attached first in the tree. This is tedious but finite: there is a finite number of relations, and a finite number of combinations that appear together in real-life texts.
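The distance criterion can be stated without naming any relation. A minimal Python sketch, assuming CoNLL-U token ids serve as word positions; order_by_distance is a hypothetical helper. It is also an assumption here that only dependents on the same side of the head should be compared, which keeps "each" (left of "portion") out of the competition between the two right-hand modifiers.

```python
def order_by_distance(head_id, dependent_ids):
    """Split a head's dependents into left and right of the head and sort
    each side so the dependent closest to the head comes first, i.e. would
    be attached first in the tree."""
    left = sorted((d for d in dependent_ids if d < head_id), key=lambda d: head_id - d)
    right = sorted((d for d in dependent_ids if d > head_id), key=lambda d: d - head_id)
    return left, right

# "portion" is token 2; its dependents are each (1), building (5), separated (6)
print(order_by_distance(2, [6, 5, 1]))  # ([1], [5, 6])
```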

Could one make a more fundamental change to the algorithm that wouldn't require explicit instructions about word order? For example, ranking trees higher when their subtrees are attached in order of distance from the head in the original string. I don't know whether this is feasible at all or would require too much rewriting. I can get by with auxfuns; I'm just thinking aloud here.
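One possible ranking score, sketched in Python: for each head, count pairs of same-side dependents that are attached out of distance order ("inversions"), and prefer trees with fewer. The inversions function and the hand-written attachment orders below are hypothetical illustrations, not taken from gf-ud; each list runs from the first-attached (innermost) dependent to the last-attached (outermost) one.

```python
from itertools import combinations

def inversions(head_id, attach_order):
    """Count pairs of dependents on the same side of the head where the
    earlier-attached one is farther from the head than the later one."""
    count = 0
    for earlier, later in combinations(attach_order, 2):
        same_side = (earlier < head_id) == (later < head_id)
        if same_side and abs(earlier - head_id) > abs(later - head_id):
            count += 1
    return count

head = 2  # "portion"
candidates = {
    "NP version of #1": [5, 6, 1],  # of a building, then separated by walls, then each
    "#2": [6, 1, 5],                # separated by walls, then each, then of a building
}
ranked = sorted(candidates, key=lambda name: inversions(head, candidates[name]))
print(ranked)  # ['NP version of #1', '#2']
```

In a full ranker the score would presumably be summed over every head in the tree and combined with the existing criteria; how that would fit into gf-ud's ranking is left open here.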

1	Each	each	DET	DT	_	2	det	_	_
2	portion	portion	NOUN	NN	Number=Sing	10	nsubj	_	_
3	of	of	ADP	IN	_	5	case	_	_
4	a	a	DET	DT	Definite=Ind|PronType=Art	5	det	_	_
5	building	building	NOUN	NN	Number=Sing	2	nmod	_	_
6	separated	separate	VERB	VBN	Tense=Past|VerbForm=Part	2	acl	_	_
7	by	by	ADP	IN	_	8	case	_	_
8	walls	wall	NOUN	NNS	Number=Plur	6	obl	_	_
9	is	be	AUX	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	10	cop	_	_
10	separate	separate	ADJ	JJ	Degree=Pos	0	root	_	SpacesAfter=\n
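The head-to-dependent distances used above can be read directly off this CoNLL-U. A small Python sketch (dependents_by_head is a hypothetical helper) that groups each token under its head together with its linear distance from that head:

```python
from collections import defaultdict

# The sentence from this issue in CoNLL-U (10 tab-separated columns per token)
CONLLU = (
    "1\tEach\teach\tDET\tDT\t_\t2\tdet\t_\t_\n"
    "2\tportion\tportion\tNOUN\tNN\tNumber=Sing\t10\tnsubj\t_\t_\n"
    "3\tof\tof\tADP\tIN\t_\t5\tcase\t_\t_\n"
    "4\ta\ta\tDET\tDT\tDefinite=Ind|PronType=Art\t5\tdet\t_\t_\n"
    "5\tbuilding\tbuilding\tNOUN\tNN\tNumber=Sing\t2\tnmod\t_\t_\n"
    "6\tseparated\tseparate\tVERB\tVBN\tTense=Past|VerbForm=Part\t2\tacl\t_\t_\n"
    "7\tby\tby\tADP\tIN\t_\t8\tcase\t_\t_\n"
    "8\twalls\twall\tNOUN\tNNS\tNumber=Plur\t6\tobl\t_\t_\n"
    "9\tis\tbe\tAUX\tVBZ\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t10\tcop\t_\t_\n"
    "10\tseparate\tseparate\tADJ\tJJ\tDegree=Pos\t0\troot\t_\tSpacesAfter=\\n\n"
)

def dependents_by_head(conllu):
    """Map each head id to its dependents as (id, deprel, distance) triples,
    where distance is the absolute difference between the two token ids."""
    deps = defaultdict(list)
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        tok_id, head, deprel = int(cols[0]), int(cols[6]), cols[7]
        deps[head].append((tok_id, deprel, abs(tok_id - head)))
    return dict(deps)

print(dependents_by_head(CONLLU)[2])
# [(1, 'det', 1), (5, 'nmod', 3), (6, 'acl', 4)]
```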


Feature request: #auxfun macros (and other #funs too if feasible?) to distinguish word order #23