facebookresearch / clutrr

Diagnostic benchmark suite to explicitly test logical relational reasoning on natural language

Robust reasoning dataset for cycle noise (supporting facts) doesn't have edge types for the cycle's edges #20

Open erg0dic opened 11 months ago

erg0dic commented 11 months ago

Hello!

Thanks for making the CLUTRR dataset available. I have been using it to benchmark compositional reasoning in ML models. I think it is a useful benchmark, and I have come across multiple recent papers that use it to evaluate models for reasoning-type problems in NLP.

Now, coming to the issue:

I was using the dataset from your EMNLP paper provided here to test out some graph models. It seems that there is no edge-type information for task 2.k (where the noise is the addition of nodes that add cycles to the original chain in the story graph). For the other types of noise (3.k, 4.k) it is easy to just randomly sample edge types, since the noise additions are independent/terminal and don't feed back into the same logic graph. But that's not possible for 2.k-type tasks.

For example, for the following story:

'[Mary] and her mother [Nettie] went to the mall to try on new clothes. [Mary] has a daughter named [Jennifer] [Cecilia] took her sister, [Mary], out to dinner for her birthday. [Cecilia] bought her mother, [Nettie], a puppy for her birthday. [Ryan] bought a new dress for his daughter [Jennifer].'

whose corresponding edge representation is:

[(0, 1), (1, 2), (2, 3), (2, 4), (4, 3)]

The edge types are provided for only the first three edges:

['daughter', 'mother', 'mother']

whereas presumably edge (2, 4) should have the edge type 'sister' and edge (4, 3) should have the edge type 'mother' for the noise node Cecilia. Looking through the robust reasoning dataset, I found no info on the edge types of the noise nodes.
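To make the gap concrete, here is a minimal sketch that pairs each edge with its type and lists the edges left without one. It assumes the edge types are listed in the same order as the edges, which is my reading of the data rather than something the dataset documents; the helper name `align_edge_types` is hypothetical and not part of the CLUTRR code.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def align_edge_types(
    edges: List[Edge], edge_types: List[str]
) -> Dict[Edge, Optional[str]]:
    """Pair each edge with its type, or None when no type is available.

    Assumption: edge_types[i] belongs to edges[i].
    """
    typed: Dict[Edge, Optional[str]] = {}
    for i, edge in enumerate(edges):
        typed[edge] = edge_types[i] if i < len(edge_types) else None
    return typed


# Values copied from the example story above.
edges = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 3)]
edge_types = ["daughter", "mother", "mother"]

typed = align_edge_types(edges, edge_types)
missing = [edge for edge, rel in typed.items() if rel is None]
print(missing)  # [(2, 4), (4, 3)]
```

The two untyped edges are exactly the ones that close the cycle through the noise node.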

Can you please provide the corresponding datasets

If not, can you please help me understand how the GAT results in Table 2 of your paper were obtained, since the graph formulation of the task requires the adjacency matrix with edge-type entries, right?
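For reference, this is roughly the kind of typed adjacency matrix I have in mind, as a sketch only and not the paper's actual preprocessing. The relation-to-id mapping, the `UNKNOWN` marker, and the function name `typed_adjacency` are all my own hypothetical choices, and I again assume the edge types are listed in edge order.

```python
from typing import Dict, List, Tuple

UNKNOWN = -1  # hypothetical marker: an edge exists but its relation type is not given


def typed_adjacency(
    num_nodes: int,
    edges: List[Tuple[int, int]],
    edge_types: List[str],
    rel_to_id: Dict[str, int],
) -> List[List[int]]:
    """Build a dense matrix whose entry [src][dst] is the relation id.

    0 means "no edge"; UNKNOWN means the edge has no type in the data.
    Assumption: edge_types[i] belongs to edges[i].
    """
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for i, (src, dst) in enumerate(edges):
        if i < len(edge_types):
            adj[src][dst] = rel_to_id[edge_types[i]]
        else:
            adj[src][dst] = UNKNOWN
    return adj


# Values copied from the example story above; the id mapping is made up.
edges = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 3)]
edge_types = ["daughter", "mother", "mother"]
rel_to_id = {"daughter": 1, "mother": 2}

adj = typed_adjacency(5, edges, edge_types, rel_to_id)
for row in adj:
    print(row)
```

The `UNKNOWN` entries at `adj[2][4]` and `adj[4][3]` are the ones I cannot fill in from the released 2.k data.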

Only the first dataset seems important as far as the paper is concerned, so the rest are not a priority. I believe (please correct me if I'm wrong) that the first one is used to report the GAT results in Table 2 of the paper, since it is the only one where k=2,3, as reported in Section 4.2 of the paper.

Thanks!