delph-in / pydelphin

Python libraries for DELPH-IN
https://pydelphin.readthedocs.io/
MIT License

phrase structure tree vs derivation tree #307

Closed arademaker closed 4 years ago

arademaker commented 4 years ago

From http://moin.delph-in.net/ItsdbProfile I learned that the result file contains two syntactic structures: the derivation tree and the phrase structure tree. I know that PyDelphin can manipulate the derivation tree. What about the phrase structure tree? Maybe one is a subset of the other?

input: a club of people to play chess

derivation tree

(root_inffrag (2905 np_frg_c 5.890860 0 7 (2904 sp-hd_n_c 5.726763 0 7 (238 a_det -0.706323 0 1 ("a" 123 "token [ +FORM \\"a\\" +FROM \\"0\\" +TO \\"1\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"0\\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"DT\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL + ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<0:1>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"a\\" +TICK + +ONSET c-or-v-onset ]")) (2903 hdn-aj_rc_c 5.660215 1 7 (2893 hdn-aj_redrel_c 1.078242 1 4 (2792 n_sg_ilr 0.000000 1 2 (143 club_n1 0.000000 1 2 ("club" 111 "token [ +FORM \\"club\\" +FROM \\"2\\" +TO \\"6\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"1\\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"NN\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL - ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<2:6>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"club\\" +TICK + +ONSET c-or-v-onset ]"))) (2794 hd-cmp_u_c 0.814533 2 4 (150 of_poss -1.373764 2 3 ("of" 113 "token [ +FORM \\"of\\" +FROM \\"7\\" +TO \\"9\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"2\\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"IN\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL - ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD 
bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<7:9>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"of\\" +TICK + +ONSET c-or-v-onset ]")) (2793 hdn_bnp_c 1.353341 3 4 (2732 hdn_optcmp_c 1.720555 3 4 (2731 n_pl-irreg-noaff_olr 1.524053 3 4 (166 people_n1 0.549525 3 4 ("people" 115 "token [ +FORM \\"people\\" +FROM \\"10\\" +TO \\"16\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"3\\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"NNS\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL - ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<10:16>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"people\\" +TICK + +ONSET c-or-v-onset ]"))))))) (2902 cl_rc-inf-nwh-sb_c 3.568442 4 7 (2901 hd-cmp_u_c 3.627378 4 7 (187 to_c_prop 0.399668 4 5 ("to" 117 "token [ +FORM \\"to\\" +FROM \\"17\\" +TO \\"19\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"4\\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"TO\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL - ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<17:19>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"to\\" +TICK + +ONSET c-or-v-onset ]")) (2859 hd-cmp_u_c 2.413657 5 7 (2857 v_n3s-bse_ilr 1.046072 5 6 (210 play_v1 0.932092 5 6 ("play" 119 "token [ +FORM \\"play\\" +FROM \\"20\\" +TO \\"24\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"5\\" REST *list* ] LAST *list* ] 
+TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"VB\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL - ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<20:24>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"play\\" +TICK + +ONSET c-or-v-onset ]"))) (2858 hdn_bnp_c 0.767529 6 7 (2837 n_ms_ilr 0.425233 6 7 (226 chess_n1 0.000000 6 7 ("chess" 121 "token [ +FORM \\"chess\\" +FROM \\"25\\" +TO \\"30\\" +ID *diff-list* [ LIST *cons* [ FIRST \\"6\\" REST *list* ] LAST *list* ] +TNT null_tnt [ +TAGS *null* +PRBS *null* +MAIN tnt_main [ +TAG \\"NN\\" +PRB \\"1.0\\" ] ] +CLASS alphabetic [ +CASE non_capitalized+lower +INITIAL - ] +TRAIT token_trait [ +UW - +IT italics +LB bracket_null [ LIST *list* LAST *list* ] +RB bracket_null [ LIST *list* LAST *list* ] +LD bracket_null [ LIST *list* LAST *list* ] +RD bracket_null [ LIST *list* LAST *list* ] +HD token_head [ +TI \\"<25:30>\\" +LL ctype [ -CTYPE- string ] +TG string ] ] +PRED predsort +CARG \\"chess\\" +TICK + +ONSET c-or-v-onset ]")))))))))))

phrase structure tree

(root_inffrag ("XP" ("NP" ("DET" ("a")) ("N" ("N" ("N" ("N" ("club"))) ("PP" ("P" ("of")) ("NP" ("N" ("N" ("N" ("people"))))))) ("S" ("VP" ("COMP" ("to")) ("VP" ("V" ("V" ("play"))) ("NP" ("N" ("N" ("chess")))))))))))

goodmami commented 4 years ago

That is just the labeled tree. We've discussed these in the past (see here and here). Basically, PyDelphin cannot create the labeled tree from a derivation because doing so requires unification with the grammar, which PyDelphin doesn't do. If the tree is already stored in a profile, or if you're parsing with ACE and the grammar supports labeled trees, you can call the tree() method of a delphin.interface.Result to get the tree as a nested list, or use dictionary-key access to get the original string:

>>> from delphin import itsdb
>>> ts = itsdb.TestSuite('mrs')
>>> next(ts.processed_items()).result(0).tree()  # method returns a list
['S', ['NP-X', ['it']], ['VP', ['VP', ['V', ['rained.']]]]]
>>> next(ts.processed_items()).result(0)['tree']  # key access returns raw string
'("S" ("NP-X" ("it")) ("VP" ("VP" ("V" ("rained.")))))'
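
To make the relationship between the raw string and the nested list concrete, here is a minimal sketch in plain Python that does not use PyDelphin at all. The helper name parse_tree is hypothetical (it is not part of PyDelphin's API), and it assumes labels are double-quoted with no escaped quotes inside them; it only illustrates the shape of the data, not how PyDelphin implements tree().

```python
import re

# A token is "(", ")", a double-quoted label, or a bare symbol
# (e.g. a root name such as root_inffrag).
_TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_tree(s):
    """Parse a labeled-tree string into nested lists.

    Hypothetical helper for illustration; assumes no escaped quotes
    inside labels.
    """
    stack = [[]]  # stack[0] collects the finished top-level tree
    for tok in _TOKEN.findall(s):
        if tok == '(':
            stack.append([])          # open a new node
        elif tok == ')':
            node = stack.pop()        # close the current node
            if not stack:
                raise ValueError('unbalanced parentheses')
            stack[-1].append(node)    # attach it to its parent
        else:
            stack[-1].append(tok.strip('"'))  # label or surface token
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError('expected exactly one tree')
    return stack[0][0]


raw = '("S" ("NP-X" ("it")) ("VP" ("VP" ("V" ("rained.")))))'
tree = parse_tree(raw)
```

With the raw string from the session above, tree should have the same nested-list shape that tree() returned there. The same helper would also accept a string with a bare root symbol, such as the root_inffrag tree in the original question, keeping the root name as the first element.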

goodmami commented 4 years ago

I think the question is answered and there's nothing to do here. Closing.