Closed Nzteb closed 3 years ago
Hi @Nzteb,
Excellent catch! Yes, certain rules are more problematic than the rest, especially those with "aunt/uncle" inv-un
and "nephew/niece" un
relations. The problematic rule here as you correctly pointed it out is (child, inv-un) -> sibling
. The reasoning behind this rule was to encompass the sibling relationship between the parent and aunt/uncle. So this would not be problematic in cases such as:
However, this can be problematic in this case (which is the case for this story)
SO
relationship).The generation engine in this version doesn't account for these edge cases well. Now as you suggested, deleting this rule would make sense as you totally circumvent the issue. However, I think this can also be fixed in a post-processing step, where a logic could be : A child B. B inv-un C. gender(A) == gender(C)
. I'll take this account in the develop branch where I am building a faster generation pipeline.
Thanks again for reporting this issue! :)
Hi @koustuvsinha,
thanks for the reply!
I am not sure if I completely understand why A and C should be SO in the first place. In my understanding in 1. C still does not need to be sibling of of A (C could be sibling of, say, D who is SO to A).
It seems that the complexity of the dataset is pretty much unchanged after removing the rule so I am happy with this as a temorary fix until the updated version.
Thanks again!
Best
They need not be, I'm just outlining a possibility of a failure case! The way I currently perform the graph generation is as follows:
SO(A,B), child(B,C)...
etc.Male(A), Female(B)
with some restrictions such as if SO(A,B)
exists, gender should be toggled.child(A,B) --> B is the son of A
if gender(B) == Male. The fix in the code should be to add more of such restrictions in the third step after 2. Concretely, we need to remove the fact SO(A,C)
if child(A,B), inv-un(B,C), Male(A), Female(C)
is in the set of facts . The easiest step to do so is to parse the facts to see if child(A,B), inv-un(B,C)
occurs, and then set the gender of A to be equal to the gender of C. (i.e in this toy scenario, replace Female(C) with Male(C).
Yes, removing this rule would not change the complexity of the dataset. Glad it works for you, I'll update this issue once I have made the fix and appropriate tests in develop branch. Thanks for bringing this to my attention! :)
Hi, thanks for your great work!
After generating data with
python main.py --train_tasks 1.2,1.3 --test_tasks 1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10 --train_rows 5000 --test_rows 500 --holdout
in 1.10_test.csv, I found the following story:
[Laura] has a daughter called [Penny]. The husband of [Penny] is [Robert]. [Craig] is a brother of [Robert]. [Robert] is the father of [William]. [Ruthann] is a sister of [Robert]. [Eugenia] is [Craig]'s daughter. [Alicia] is the aunt of [Eugenia]. [William] is a brother of [Gary]. [Robert] has a son called [Gary]. [Robert] is [Ruthann]'s brother.
Where the target is:
['[Laura] has a daughter called [Alicia]. ']
After convincing myself that this cannot hold, I looked into the proof state also provided in the csv:
[{('Laura', 'daughter', 'Alicia'): [('Laura', 'son', 'Craig'), ('Craig', 'sister', 'Alicia')]}, {('Laura', 'son', 'Craig'): [('Laura', 'son', 'Robert'), ('Robert', 'brother', 'Craig')]}, {('Laura', 'son', 'Robert'): [('Laura', 'daughter', 'Ruthann'), ('Ruthann', 'brother', 'Robert')]}, {('Laura', 'daughter', 'Ruthann'): [('Laura', 'daughter', 'Penny'), ('Penny', 'sister', 'Ruthann')]}, {('Penny', 'sister', 'Ruthann'): [('Penny', 'son', 'Gary'), ('Gary', 'aunt', 'Ruthann')]}, {('Penny', 'son', 'Gary'): [('Penny', 'husband', 'Robert'), ('Robert', 'son', 'Gary')]}, {('Gary', 'aunt', 'Ruthann'): [('Gary', 'father', 'Robert'), ('Robert', 'sister', 'Ruthann')]}, {('Craig', 'sister', 'Alicia'): [('Craig', 'daughter', 'Eugenia'), ('Eugenia', 'aunt', 'Alicia')]}, {('Gary', 'father', 'Robert'): [('Gary', 'brother', 'William'), ('William', 'father', 'Robert')]}]
The spicy part here is in line 5 "Penny sister Ruthann", if this would be true then Penny would be sister and wife of Robert. However, it is proven with [('Penny', 'son', 'Gary'), ('Gary', 'aunt', 'Ruthann')].
It seems this arises from one of the rules in rules-store.yaml:
child:inv-un:sibling
If I understand correctly this says if A has child B and B has aunt/uncle C then A and C are siblings. Note, however, this does not hold in general as C could be sibling of the wife/husband of A but not of A. In our example, exactly this is the case as Ruthann is the sister of Robert but not of Penny.
Sorry if I have a misunderstanding until here, have I overlooked something?
Would it be enough to simply delete this rule from rules-store.yaml and re-generate the data?
Thanks a lot