jofatmofn closed this issue 7 years ago
Thanks for the report. BLLIP Parser calls them heads, but I think this is a bit of a misnomer and they're really closer to dependencies (in a governor-dependent sense). I'm afraid if you're looking for direct children, you'll need to extend the "head finder" or use the extracted dependencies and walk up the tree to find direct children.
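A minimal sketch of that "walk the tree" idea, in case it helps: given a node and the preterminal reported as its lexical head, pick the direct child whose subtree contains that preterminal. The `Node` class, its `lexical_head` field, and the helper names are hypothetical stand-ins for illustration, not the bllipparser `Tree` API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical parse-tree node (not the bllipparser Tree class)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    token: Optional[str] = None            # set only on preterminals
    lexical_head: Optional["Node"] = None  # preterminal a head finder would report


def dominates(node, target):
    """Return True if target is node itself or lies anywhere below it."""
    return node is target or any(dominates(c, target) for c in node.children)


def head_child(node):
    """Return the direct child whose subtree contains node's lexical head."""
    for child in node.children:
        if dominates(child, node.lexical_head):
            return child
    return None  # leaf, or no lexical head recorded


# The object NP from the example sentence: (NP (NP four days) (PP from ...))
cd = Node("CD", token="four")
nns = Node("NNS", token="days")
inner_np = Node("NP", [cd, nns], lexical_head=nns)
in_ = Node("IN", token="from")
pp = Node("PP", [in_], lexical_head=in_)
outer_np = Node("NP", [inner_np, pp], lexical_head=nns)

print(head_child(outer_np).label)  # NP (the inner NP, not the NNS preterminal)
```

The comparison uses object identity (`is`) rather than string equality, so two subtrees that happen to print the same are not confused with each other.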
On Thu, Jun 22, 2017 at 8:43 PM, jofatmofn notifications@github.com wrote:
I have the following code: `
constituency_string = str(rrp.parse_tagged(tokens, possible_tags=dict(enumerate(postags)))[0].ptb_parse)
tree = Tree(constituency_string)
`
For the sentence "An interesting date is four days from today.", the expected head (a direct child) and the actual head (a preterminal) returned by the tree object are shown below:
`
(S1                          # Expected head: S; Got VBZ
  (S                         # Expected head: VP; Got VBZ
    (NP                      # Head: NN
      (DT An) (JJ interesting) (NN date))
    (VP                      # Head: VBZ
      (VBZ is)
      (NP                    # Expected head: NP; Got NNS
        (NP                  # Head: NNS
          (CD four) (NNS days))
        (PP                  # Head: IN
          (IN from)
          (NP                # Head: NN
            (NN today)))))
    (. .)))
` I am creating NAF output for the subsequent coreference resolution module. I have written additional code to match the expected results. Is this a bug in bllipparser?
I am using pynaf to generate NAF output, and I need to call naf_document.add_constituency_tree. I have decided to use the extracted dependencies and walk up the tree. I am sharing the code in the hope that it is useful to someone.
def constituent_tree_to_naf(parent_node, parent_tid, is_parent_root):
    # Depth first tree navigation
    # This method will NOT be called with parent_node a preterminal. Hence it is assured that all the child_nodes are nonterminals.
    global tid, terminals, ntid, non_terminals, edgeid, edges, direct_head_less, edge_idx
    head_in_child = False
    for child_node in parent_node:
        if str(parent_node.head()) == str(child_node):
            head_in_child = True
            break
    if not head_in_child:
        # Headless node, edge_id to put header attribute, if head is yet to be found
        direct_head_less.append((parent_node, None, True))
    for child_node in parent_node:
        # non_terminals (constituent_id, constituent_Label)
        ntid += 1
        non_terminals.append(("nter" + str(ntid), child_node.label))
        for i, dhl_t in enumerate(direct_head_less):
            if dhl_t[2] and str(dhl_t[0].head()) == str(child_node):
                edges[dhl_t[1]] = edges[dhl_t[1]] + ("yes",)
                direct_head_less[i] = (dhl_t[0], dhl_t[1], False)
        # edges. (edge_id, from_id, to_id, head)
        edgeid += 1
        edge_idx += 1
        if is_parent_root or str(parent_node.head()) == str(child_node):
            edges.append(("tre" + str(edgeid), "nter" + str(ntid), "nter" + str(parent_tid), "yes"))
        else:
            edges.append(("tre" + str(edgeid), "nter" + str(ntid), "nter" + str(parent_tid)))
        if child_node.is_preterminal():
            # terminals. (constituent_id, [term_id])
            tid = tid + 1
            terminals.append(("ter" + str(tid), ["t" + str(tid)]))  # TODO: Check if there can be a situation where term_id != terminal id
            # edges. (edge_id, from_id, to_id, head)
            edgeid += 1
            edge_idx += 1
            edges.append(("tre" + str(edgeid), "ter" + str(tid), "nter" + str(ntid)))
        else:  # non terminal, but not pre terminal
            for i, dhl_t in enumerate(direct_head_less):
                if dhl_t[2] and str(parent_node) == str(dhl_t[0]):
                    direct_head_less[i] = (dhl_t[0], edge_idx, dhl_t[2])
            constituent_tree_to_naf(child_node, ntid, False)
The calling method has this code (where tokens is the list of tokens and postags is the list of corresponding POS tags):
# Assumes `from bllipparser import Tree`, with rrp (the loaded parser) and naf_document created elsewhere.
global tid, terminals, ntid, non_terminals, edgeid, edges, direct_head_less, edge_idx
tid = -1
ntid = -1
edgeid = -1
for tokens, postags in tagged_sentences:  # tagged_sentences is a placeholder name for the per-sentence input
    # Per-sentence state; tid, ntid and edgeid keep counting across sentences
    terminals = []
    non_terminals = []
    edges = []
    direct_head_less = []
    edge_idx = -1
    ntid += 1
    non_terminals.append(("nter" + str(ntid), "ROOT"))
    constituency_lisp_string = str(rrp.parse_tagged(tokens, possible_tags=dict(enumerate(postags)))[0].ptb_parse)
    tree = Tree(constituency_lisp_string)
    head = tree.head()
    constituent_tree_to_naf(tree, ntid, True)
    naf_document.add_constituency_tree(non_terminals, terminals, edges)
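For readers who only want to see the shape of the three lists, here is a small global-free sketch. The tuple layouts are taken from the comments in the code above and have not been checked against pynaf itself. The nested-tuple input format and the function name `tree_to_naf_lists` are hypothetical, and unlike the code above this sketch adds no extra ROOT node and expects the head child to be given explicitly.

```python
def tree_to_naf_lists(tree):
    """Flatten a nested-tuple tree into (non_terminals, terminals, edges).

    Assumed input shape (hypothetical, for illustration only):
      preterminal: (label, token)
      other node:  (label, head_index, [children])
    """
    non_terminals, terminals, edges = [], [], []

    def visit(node, parent_id, is_head):
        # non_terminals: (constituent_id, constituent_label)
        nt_id = "nter%d" % len(non_terminals)
        non_terminals.append((nt_id, node[0]))
        if parent_id is not None:
            # edges: (edge_id, from_id, to_id) plus "yes" for the head child
            edge = ("tre%d" % len(edges), nt_id, parent_id)
            edges.append(edge + ("yes",) if is_head else edge)
        if len(node) == 2:
            # Preterminal: terminals are (constituent_id, [term_id])
            t = len(terminals)
            terminals.append(("ter%d" % t, ["t%d" % t]))
            edges.append(("tre%d" % len(edges), "ter%d" % t, nt_id))
        else:
            _, head_index, children = node
            for i, child in enumerate(children):
                visit(child, nt_id, i == head_index)

    visit(tree, None, False)
    return non_terminals, terminals, edges


# (NP (CD four) (NNS days)) with the second child marked as head
non_terminals, terminals, edges = tree_to_naf_lists(
    ("NP", 1, [("CD", "four"), ("NNS", "days")])
)
for edge in edges:
    print(edge)
```

Returning the lists instead of mutating globals makes the per-sentence reset implicit, at the cost of ids that restart at 0 for every call.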