inariksit opened 2 years ago
Another issue concerns the attachment of modifiers. Suppose a phrase like
"each portion of a building separated by walls"
In dt, I get these two options:
#1
AdjCN
  ( AdvCN ( UseN portion_N )
      ( PrepNP of_Prep
          ( DetCN ( DetQuant IndefArt NumSg ) ( UseN building_N ) )
      )
  )
  ( PassVAgent separate_V
      ( DetCN ( DetQuant IndefArt NumPl ) ( UseN wall_N ) )
  ): CN[2,3,4,5,6,8]
#LIN: "portion of a building separated by walls"
#2
AdvNP
  ( DetCN each_Det
      ( AdjCN ( UseN portion_N )
          ( PassVAgent separate_V
              ( DetCN ( DetQuant IndefArt NumPl ) ( UseN wall_N ) )
          )
      )
  )
  ( PrepNP of_Prep
      ( DetCN ( DetQuant IndefArt NumSg ) ( UseN building_N ) )
  ): NP[1,2,3,4,5,6,8]
#LIN: "each portion separated by walls of a building"
However, dt doesn't contain the NP version of #1, which would simply apply DetCN each_Det to that tree. I wonder whether some pruning step removes the NP version of #1 because it covers as many words as #2. (I tried to run the example without pruneDevTree, but this particular sentence is very long and the program was taking a long time. If you think that might be the reason, I can produce a shorter version of the sentence and try again.)
In any case, I would expect the NP version of #1 to be constructed as well, but to be thrown away before it can be prioritised. I would like to prioritise it because its attachment matches the word order. Both "of a building" and "separated by walls" modify "portion", and in #1 "building", which is closer to "portion" in the sentence, is attached more immediately.
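The sketch below makes "the attachment matches the word order" precise. It reduces each tree to the sequence of modifiers attached to "portion", innermost first. It then checks that the modifiers' distances from the head in the sentence never decrease. The nested tuples are hand-simplified stand-ins for the trees above, with DetCN each_Det applied to #1 to give the missing NP version. The lexeme-to-position table is taken from the CoNLL-U sentence. None of this is ud2gf's internal representation, and the names are hypothetical.

```python
# Hand-simplified stand-ins for the GF trees: each node is (fun, *args), and each
# modifier is reduced to the lexeme that heads it (hypothetical representation).
TREE_1 = ("DetCN", "each_Det",
          ("AdjCN",
           ("AdvCN", ("UseN", "portion_N"), ("PrepNP", "building_N")),
           ("PassVAgent", "separate_V")))
TREE_2 = ("AdvNP",
          ("DetCN", "each_Det",
           ("AdjCN", ("UseN", "portion_N"), ("PassVAgent", "separate_V"))),
          ("PrepNP", "building_N"))

# Token positions of the relevant words in the CoNLL-U sentence.
POSITION = {"portion_N": 2, "building_N": 5, "separate_V": 6}

# Functions of the shape fun(head, modifier).
MODIFYING = {"AdjCN", "AdvCN", "AdvNP"}

def attachment_order(node):
    """Return modifier lexemes from innermost (attached first) to outermost."""
    fun = node[0]
    if fun in MODIFYING:
        head, modifier = node[1], node[2]
        return attachment_order(head) + [modifier[-1]]
    if fun == "DetCN":
        return attachment_order(node[2])  # skip the determiner, descend into the CN
    return []  # UseN: the bare head noun has no modifiers

def matches_word_order(node, head="portion_N"):
    """True if modifiers closer to the head in the string are attached first."""
    dists = [abs(POSITION[m] - POSITION[head]) for m in attachment_order(node)]
    return dists == sorted(dists)

print(attachment_order(TREE_1), matches_word_order(TREE_1))
print(attachment_order(TREE_2), matches_word_order(TREE_2))
```

For the NP version of #1 the distances from "portion" are 3 then 4, so it matches. For #2 they are 4 then 3, so it does not.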
I can solve this particular case with an #auxfun that says: whenever a NOUN has both an acl and an nmod child, attach the nmod before the acl. But this doesn't scale well.
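The fixed rule amounts to something like the following. This is a hypothetical Python rendering of the idea, not ud2gf's #auxfun syntax. The NOUN's dependents are given as (deprel, position) pairs and sorted so that nmod is attached before acl, with any other relation after both.

```python
# Hypothetical rendering of the relation-specific rule: nmod before acl.
PRIORITY = {"nmod": 0, "acl": 1}

def order_noun_dependents(deps):
    """Return the NOUN's (deprel, position) dependents in attachment order.

    Only the relation label matters; positions are ignored. Python's sort is
    stable, so dependents with the same priority keep their input order.
    """
    return sorted(deps, key=lambda dep: PRIORITY.get(dep[0], len(PRIORITY)))

# "each portion of a building separated by walls": head "portion" is token 2.
print(order_noun_dependents([("acl", 6), ("nmod", 5)]))
# Reversed order, "each portion separated by walls of a building",
# assuming "of a building" is still parsed as an nmod of "portion":
print(order_noun_dependents([("acl", 3), ("nmod", 8)]))
```

Because the rule looks only at labels, the second call still attaches the nmod first even though the acl is closer to the head. Every further pair of relations would also need its own rule.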
With an explicit DISTANCE=-1* condition or similar, I could duplicate that rule to say that whichever dependent is closer to the head in the original word order gets attached first in the tree. This is tedious, but finite: there is a finite number of relations, and a finite number of combinations that co-occur in real-life texts.
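The distance-based version replaces the relation labels with each dependent's distance to the head. The sketch assumes the dependents passed in are only the CN-level modifiers (here nmod and acl). The determiner is left out, since DetCN attaches it outside the CN regardless of its position. The function name is hypothetical.

```python
def order_by_distance(head, deps):
    """Attach (deprel, position) dependents closest-to-head first.

    Ties (e.g. one dependent on each side of the head at the same distance)
    keep their input order, because sorted() is stable.
    """
    return sorted(deps, key=lambda dep: abs(dep[1] - head))

print(order_by_distance(2, [("acl", 6), ("nmod", 5)]))  # → [('nmod', 5), ('acl', 6)]
print(order_by_distance(2, [("acl", 3), ("nmod", 8)]))  # → [('acl', 3), ('nmod', 8)]
```

A single rule of this shape covers any pair of relations, instead of one rule per pair.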
Could one make a more fundamental change to the algorithm that wouldn't require explicit instructions about word order? For example, trees could be ranked higher when their subtrees are attached in order of distance from the head in the original string. I don't know whether this is feasible at all, or whether it would require too much rewriting. I can get by with auxfuns; I'm just thinking aloud here.
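Ranking by distance could be done roughly as follows. Score each candidate by how many pairs of modifiers are attached out of distance order (an inversion count), then sort by that score. Candidates are given as the token positions of their modifiers, innermost first. The names are hypothetical. How this would combine with ud2gf's existing ranking is left open; it would presumably act as a secondary key.

```python
from itertools import combinations

def inversions(head, attached):
    """Count modifier pairs where an outer one is closer to the head than an inner one.

    `attached` lists token positions of the head's modifiers, innermost first.
    """
    dist = [abs(pos - head) for pos in attached]
    return sum(1 for i, j in combinations(range(len(dist)), 2) if dist[i] > dist[j])

def rank(head, candidates):
    """Sort (label, attached) candidates by inversion count, fewest first (stable)."""
    return sorted(candidates, key=lambda cand: inversions(head, cand[1]))

# Head "portion" = token 2; "building" (nmod) = 5, "separated" (acl) = 6.
candidates = [("#2", [6, 5]), ("#1 + DetCN each_Det", [5, 6])]
for label, attached in rank(2, candidates):
    print(label, inversions(2, attached))
```

With these inputs the NP version of #1 gets 0 inversions and #2 gets 1, so #1 is ranked first.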
Here's a CoNLL-U file to test with:
1 Each each DET DT _ 2 det _ _
2 portion portion NOUN NN Number=Sing 10 nsubj _ _
3 of of ADP IN _ 5 case _ _
4 a a DET DT Definite=Ind|PronType=Art 5 det _ _
5 building building NOUN NN Number=Sing 2 nmod _ _
6 separated separate VERB VBN Tense=Past|VerbForm=Part 2 acl _ _
7 by by ADP IN _ 8 case _ _
8 walls wall NOUN NNS Number=Plur 6 obl _ _
9 is be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 10 cop _ _
10 separate separate ADJ JJ Degree=Pos 0 root _ SpacesAfter=\n
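The next sketch reads the sentence above and lists the dependents of "portion" with their distances from it. It splits each line on whitespace because the tabs were lost in the copy above; real CoNLL-U files are tab-separated. It skips comment lines but does not handle multiword-token or empty-node IDs such as 1-2 or 8.1. The function names are hypothetical.

```python
# The example sentence from above, whitespace-separated (tabs lost in the copy).
SENTENCE = r"""
1 Each each DET DT _ 2 det _ _
2 portion portion NOUN NN Number=Sing 10 nsubj _ _
3 of of ADP IN _ 5 case _ _
4 a a DET DT Definite=Ind|PronType=Art 5 det _ _
5 building building NOUN NN Number=Sing 2 nmod _ _
6 separated separate VERB VBN Tense=Past|VerbForm=Part 2 acl _ _
7 by by ADP IN _ 8 case _ _
8 walls wall NOUN NNS Number=Plur 6 obl _ _
9 is be AUX VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 10 cop _ _
10 separate separate ADJ JJ Degree=Pos 0 root _ SpacesAfter=\n
"""

def parse(text):
    """Parse simplified CoNLL-U lines into dicts (integer token IDs only)."""
    tokens = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split()
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        tokens.append({"id": int(cols[0]), "form": cols[1], "upos": cols[3],
                       "head": int(cols[6]), "deprel": cols[7]})
    return tokens

def dependents_by_distance(tokens, head_id):
    """(deprel, form, distance) for each dependent of head_id, closest first."""
    deps = [(t["deprel"], t["form"], abs(t["id"] - head_id))
            for t in tokens if t["head"] == head_id]
    return sorted(deps, key=lambda dep: dep[2])

for deprel, form, dist in dependents_by_distance(parse(SENTENCE), 2):
    print(deprel, form, dist)
```

For "portion" (token 2) this gives det "Each" at distance 1, nmod "building" at 3 and acl "separated" at 4. The nmod is closer, so it is the one that should be attached first.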
Current behaviour: it treats phrases like "Section 10" (apposition) and "10 sections" identically.