dasguptar / treelstm.pytorch

Tree LSTM implementation in PyTorch
MIT License
550 stars 139 forks

Trying to understand cparents.txt in Constituency parsing #23

Open inigo-jauregi opened 6 years ago

inigo-jauregi commented 6 years ago

I have downloaded the SICK data and obtained the dependency and constituency parses with the fetch_and_preprocess.sh script.

I am now trying to understand what information the generated cparents.txt file contains. This is an example:

a.txt -> Two dogs are fighting
a.cparents.txt -> 5 5 7 7 6 0 6

If I am not mistaken, I should be able to build the parse tree from cparents.txt. Is that right? And what would the tree for this example look like?

Thanks for any help in advance

inigo-jauregi commented 6 years ago

Another example:

a.txt -> Two dogs are playing by a tree
a.cparents.txt -> 8 8 10 11 12 13 13 9 0 9 10 11 12

yeladlouni commented 5 years ago

For N tokens, you'll obtain a binary tree of 2N-1 nodes. To use constituency parsing, you would need to implement BinaryTreeLSTM, which has not been ported from the original Lua code.
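The 2N-1 count can be checked against the two examples posted above. This is only a sketch: `is_full_binary` is a made-up helper, not something from the repo, and it assumes the leaves take ids 1..N and the internal nodes take ids N+1..2N-1, as the examples in this thread suggest.

```python
from collections import Counter

def is_full_binary(n_tokens, cparents):
    """Check a 1-based parent list (0 = root) against the 2N-1 node claim."""
    # A full binary tree over N leaves has N-1 internal nodes, 2N-1 in total.
    if len(cparents) != 2 * n_tokens - 1:
        return False
    # Exactly one node may be the root.
    if cparents.count(0) != 1:
        return False
    # Count how many children point at each parent id.
    child_counts = Counter(p for p in cparents if p != 0)
    internal_ids = range(n_tokens + 1, 2 * n_tokens)
    # Only internal ids may appear as parents, and each needs two children.
    if not set(child_counts) <= set(internal_ids):
        return False
    return all(child_counts[i] == 2 for i in internal_ids)

print(is_full_binary(4, [5, 5, 7, 7, 6, 0, 6]))  # True
print(is_full_binary(7, [8, 8, 10, 11, 12, 13, 13, 9, 0, 9, 10, 11, 12]))  # True
```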

yalunar commented 5 years ago

First, number the tokens of the sentence in a.txt from 0 to length-1. These numbers are the indices of the leaf nodes. Then subtract 1 from every number in a.cparents.txt, so that each number becomes the 0-based index of a parent node. Take the following example:

a.txt -> Two dogs are fighting
Two -> 0, dogs -> 1, are -> 2, fighting -> 3

a.cparents.txt -> 5 5 7 7 6 0 6
after subtracting 1 -> 4 4 6 6 5 -1 5

Now write the shifted cparents above the node indices. A parent of -1 means the node has no parent, i.e. it is the root:

4 4 6 6 5 -1 5 | parent node index
0 1 2 3 4 5 6 | child node index

Each number in the first row is the parent of the node below it. For example, 4 is the parent of 0 and 1, and 6 is the parent of 2 and 3. Node 5 is the parent of 4 and 6 and is itself the root. The leaf nodes are 0, 1, 2 and 3. Now you can build the tree. Hope this helps. @ijauregiCMCRC
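The steps above can be sketched in a few lines of Python. The helper names (`build_tree`, `leftmost`, `bracket`) are made up for illustration and are not part of the repo. The sketch assumes the leaves come first in the file, in sentence order, and that 0 in the raw file marks the root.

```python
from collections import defaultdict

def build_tree(cparents):
    """Map each 0-based parent index to its children; return (root, children)."""
    children = defaultdict(list)
    root = None
    for child, parent in enumerate(cparents):
        if parent == 0:          # 0 in the raw file: this node is the root
            root = child
        else:                    # shift the 1-based parent id to 0-based
            children[parent - 1].append(child)
    return root, dict(children)

def leftmost(node, children, n_leaves):
    """Index of the leftmost leaf under a node, used to order siblings."""
    if node < n_leaves:
        return node
    return min(leftmost(c, children, n_leaves) for c in children[node])

def bracket(node, children, tokens):
    """Render the subtree under a node as a bracketed string."""
    if node < len(tokens):       # leaf nodes are the first len(tokens) indices
        return tokens[node]
    kids = sorted(children[node], key=lambda c: leftmost(c, children, len(tokens)))
    return "(" + " ".join(bracket(c, children, tokens) for c in kids) + ")"

tokens = "Two dogs are fighting".split()
cparents = [int(x) for x in "5 5 7 7 6 0 6".split()]
root, children = build_tree(cparents)
print(root)                              # 5
print(bracket(root, children, tokens))   # ((Two dogs) (are fighting))
```

Running the same functions on the second example from this thread (`Two dogs are playing by a tree`) gives `((Two dogs) (are (playing (by (a tree)))))`.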

venusafroid commented 5 years ago

Thank you very much!!! I had been confused for a long time. Your answer is really helpful!!!

venusafroid commented 5 years ago

By the way, what is the meaning of the lines in dparents.txt?

yalunar commented 5 years ago

@venusafroid Hi! Each line in dparents.txt encodes the dependency parse of one sentence. For example, take the sentence "Two dogs are wrestling and hugging". The numbers in dparents.txt are the parent node indices of the words:

2 4 4 0 4 4

The i-th number is the index (counting from 1) of the parent of the i-th word, and 0 marks the root. So the parent of "Two" is "dogs", the parent of "dogs", "are", "and" and "hugging" is "wrestling", and "wrestling" is the root.
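A small sketch of how such a line could be read. `dependency_edges` is a hypothetical helper, not something from the repo, and the token list assumes the sentence is split into the six words above with no punctuation token.

```python
def dependency_edges(tokens, dparents):
    """Pair each word with its head word; a head index of 0 means the root."""
    edges = []
    for i, head in enumerate(dparents):
        # Head indices are 1-based, so shift by one to index into tokens.
        head_word = "ROOT" if head == 0 else tokens[head - 1]
        edges.append((tokens[i], head_word))
    return edges

tokens = ["Two", "dogs", "are", "wrestling", "and", "hugging"]
dparents = [2, 4, 4, 0, 4, 4]
for word, head in dependency_edges(tokens, dparents):
    print(f"{word} <- {head}")
```

The first line printed is `Two <- dogs` and the fourth is `wrestling <- ROOT`, matching the description above.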