dig-team / amie

Mavenized AMIE+Typing
Other
110 stars 19 forks source link

Doubts related to InjectiveMappingsAssistant #33

Open falcaopetri opened 4 years ago

falcaopetri commented 4 years ago

Hi,

I've recently read in #30 that AMIE does not mine injective rules by default. This explains some weird rules I was getting in my experiments (I actually thought it was a bug).

For example, I've mined a rule which begins with the atoms: ?a functional_relation ?b ?a functional_relation ?c ... (i.e., ?b always equal to ?c)

Also, there are many other mined rules which seem very suspicious, i.e., it appears that the rule only passes the thresholds because of the non-injective bindings.


I was previously assuming that AMIE was injective. Therefore, I have some questions:

  1. Is the non-injective mapping an intended behavior? By quickly reading GPro/GRank's paper, it seems that injective mapping "gives a good bias for real-world knowledge." I.e., are you aware of cases where non-injectiveness is desirable?

  2. I did not fully understand GPro/GRank metrics yet, and it will probably be out of the scope of my project. As far as I understood, InjectiveMappingsAssistant works like Default/LazyMiningAssistant, but considers only injective bindings when calculating support and confidences. Is that right?

  3. Is the InjectiveMappingsAssistant stable? If not, how could I help? I would definitely want to see/use it in master.

  4. In AMIE 3's paper, you say that "The work of [5] noted that the PCA confidence can under-estimate the likelihood of a prediction in the presence of non-injective mappings." I actually thought that AMIE would over-estimate the confidence since it would allow non-injective bindings, i.e., more bindings than expected.


Implementation detail

I had a quick look into the gpro branch (KB modifications, Rule modifications, InjectiveMappingsAssistant, and GPro).

InjectiveMappingsAssistant seems identical to DefaultMiningAssistant, only differing when getting the Rule's atoms (injectiveOf and related methods).

I think that another approach would be to extend Rule as an InjectiveRule. We could then specify to the mining assistant which kind of rule should be mined.

I'm not sure if some other mining assistant could benefit from this setup. Implementing custom Mining Assistants is already an interesting approach for extensibility but seems to duplicate too much code in some cases.

Anyway, InjectiveMappingsAssistant is already implemented, so I would be happy to go with it.

Regards, Antonio.

lajus commented 4 years ago

Hi,

Non-injective mappings may cause over-estimation of the support but it will be more likely counter-balanced by a much greater over-estimation of the body size (number of instantiations of the body/number of predictions of the rule) which will result in an under-estimation of the confidence.

Consider the KB with 3 facts: marriedTo(Elvis, Priscilla); hasChild(Elvis, Lisa); hasChild(Priscilla; Lisa) and the rule: hasChild(x, z) & hasChild(y, z) => marriedTo(x, y) Both injective and non-injective mappings give a support of 1 ((x, y) = (Elvis, Priscilla)) The confidence with injective mappings is 1/2: (predictions: (x, y) = (Elvis, Priscilla) | (Priscilla, Elvis)) The confidence with non-injective mappings is 1/4: (predictions: (x, y) = (Elvis, Priscilla) | (Priscilla, Elvis) | (Elvis, Elvis) | (Priscilla, Priscilla))

It may be true a rule may appear cause it will artificially pass the support threshold but I cannot think of a practical example.

I can't think of any cases where results only due to non-injective mappings are desirable. However, mining using such non-injective mappings interferes with reflexive relations such as EqualsTo (the equality test) and can results in some nasty edge cases: For example, given the KB: marriedTo(Elvis, Priscilla); marriedTo(Priscilla, Elvis) What is the confidence of the rule (used to mine keys or almost-keys): marriedTo(x, z) & marriedTo(y, z) => x = y Using injective mappings, the confidence is 2 over 0.

It may also mess up with the monotonous property of the support: The rule: marriedTo(x, z) => marriedTo(y, z) consider only distinct entities x and y while its refinement x = y & marriedTo(x, z) => marriedTo(y, z) only consider equals x and y.

It may also mess up with the instantiations as injective mappings only consider distinct values of the variables and not distinct values of the variables once they are instantiated. In that regard, AMIE adds additional DifferentFrom (!=) atoms where it sees fit but it is not a definitive solution.

To sum it up, it is not trivial and, using injective mappings, there are some edge cases to rigorously consider.

InjectiveMappingsAssistant internally adds DifferentFrom atoms between the variables (that are not bound together) before executing the count queries. Thanks for suggesting to extend the Rule class into an InjectiveRule class, it seems indeed the best way to do and I will look into it.

The actual problem of the InjectiveMappingsAssistant is the extensive hacking I had to perform in the KB in order to make the "Existential variable detection" optimization of AMIE 3 works in this setting. The result is really dirty (quality of code wise) and not extensively tested, which is the reason I am reluctant to merge the GPro branch into master.

Cheers, Jonathan

falcaopetri commented 4 years ago

Hi,

Thanks for the detailed examples, @lajus!

The confidence under-estimation does indeed make sense.

It may be true a rule may appear cause it will artificially pass the support threshold but I cannot think of a practical example.

Maybe the "artificial" rule happens when mining more than 3 atoms. I suppose the following is an example (which is similar to rules I've mined in my data). Consider the KB with 4 facts: hasChild(Elvis, Lisa); hasChild(Priscilla; Lisa); hasBirthday(Elvis, DATE1), hasBirthday(Priscilla, DATE2) and the rule: hasChild(x, z) & hasChild(y, z) & hasBirthday(y, b) => hasBirthday(x, b)

Non-injective mapping support will be 2 ((x, b) = (Elvis, DATE1) | (Priscilla, DATE2)) Injective mapping support will be 0 (there are no bindings satisfying x != y)

Does it make sense?


The edge cases you described are interesting considerations and shows part of the different things to be aware of.

Nonetheless, I couldn't find any place where EQUALS would be added to a rule (although I've found the check for EQUALS in the head atom in Rule.injectiveOf).


Since I'm interested in removing some mined rules, I will probably take some time to go over the code related to injectiveness and report if I find something.

Regards, Antonio.

falcaopetri commented 4 years ago

Hi,

I just finished reviewing part of the gpro branch (Rule, InjectiveMappingsAssistant, and KB, excluding KB.selectDistinctMappings).

Just to check my understanding, the "Existential variable detection" optimization for rules with != would also be (to some extent) beneficial to other Mining Assistants. Of course, InjectionMappingsAssistant makes heavy use of != atoms, so you were required to implement this particular optimization case for it. Is that correct?

Going through the (static) code and comparing it with master, I think I understood your decisions when implementing the optimization for != relation. And I would say that the implementation seems to be correct.

About the code quality issue, I guess that the whole KB code would benefit from some refactoring.


Some questions that emerged during code review:

  1. Rule.getMeasure docstring says that it returns null, but actually returns -1. Indeed, it is used in contexts such as rule.getMeasure("...") > threshold. https://github.com/lajus/amie/blob/0f6a510cd666be002d7f0419828b0638fd15c710/rules/src/main/java/amie/rules/Rule.java#L198-L206

  2. Rule.injectiveOf. As I understood, the connectedVariables manipulation is meant to allow r(X, X) in the body for any r, or in the head if r is EQUALS. Is that right? https://github.com/lajus/amie/blob/0f6a510cd666be002d7f0419828b0638fd15c710/rules/src/main/java/amie/rules/Rule.java#L1782-L1793

  3. KB.existsOptimCanRewrite. Does the following code line mean that we can have a x != CONST atom? https://github.com/lajus/amie/blob/0f6a510cd666be002d7f0419828b0638fd15c710/kb/src/main/java/amie/data/KB.java#L2133

I suppose it is used when we are trying to instantiate a rule. Nonetheless, I think it could be used to fix the problem you pointed on:

It may also mess up with the instantiations as injective mappings only consider distinct values of the variables and not distinct values of the variables once they are instantiated.

i.e., if so desired, we could add var != CONST during getInstantiatedAtoms to force future bindings to be distinct to the newly introduced CONST value.

  1. KB.existsBS1. master vs gpro: https://github.com/lajus/amie/blob/a2beeca2969202602d48f421ba682acc99a02ac2/kb/src/main/java/amie/data/KB.java#L2105

vs

https://github.com/lajus/amie/blob/0f6a510cd666be002d7f0419828b0638fd15c710/kb/src/main/java/amie/data/KB.java#L2169

I also found this related line in master: https://github.com/lajus/amie/blob/a2beeca2969202602d48f421ba682acc99a02ac2/kb/src/main/java/amie/data/KB.java#L2127

  1. KB.selectDistinct. I guess it is correct, but it seemed a little bit suspicious. What is the selectDistinct for a variable which does not appear in the query? https://github.com/lajus/amie/blob/0f6a510cd666be002d7f0419828b0638fd15c710/kb/src/main/java/amie/data/KB.java#L2518

  2. InjectiveMappingsAssistant do not support iexr/itr (#16) since it implements its own getInstantiatedAtoms (this is more like a note to myself, since I will need to add this feature).

  3. Rule.addToVariableMap. Not so relevant, but I've recently learned that we can do:

    private static boolean addToVariableMap(int[] atom, Int2ObjectMap<IntSet> s) {
         if (!KB.isVariable(atom[0]) || !KB.isVariable(atom[2])) {
             return false;
         }
-        IntSet r = s.get(atom[0]);
-        if (r == null) { s.put(atom[0], r = new IntOpenHashSet()); }
+        IntSet r = s.computeIfAbsent(atom[0], k -> new IntOpenHashSet());
         r.add(atom[2]);
-        r = s.get(atom[2]);
-        if (r == null) { s.put(atom[2], r = new IntOpenHashSet()); }
+        r = s.computeIfAbsent(atom[2], k -> new IntOpenHashSet());
         r.add(atom[0]);
         return true;
     }

Regards, Antonio.

lgalarra commented 4 years ago

Hi,

Thanks for the detailed examples, @lajus!

The confidence under-estimation does indeed make sense.

It may be true a rule may appear cause it will artificially pass the support threshold but I cannot think of a practical example.

Maybe the "artificial" rule happens when mining more than 3 atoms. I suppose the following is an example (which is similar to rules I've mined in my data). Consider the KB with 4 facts: hasChild(Elvis, Lisa); hasChild(Priscilla; Lisa); hasBirthday(Elvis, DATE1), hasBirthday(Priscilla, DATE2) and the rule: hasChild(x, z) & hasChild(y, z) & hasBirthday(y, b) => hasBirthday(x, b)

Non-injective mapping support will be 2 ((x, b) = (Elvis, DATE1) | (Priscilla, DATE2)) Injective mapping support will be 0 (there are no bindings satisfying x != y)

Does it make sense?

The edge cases you described are interesting considerations and shows part of the different things to be aware of.

Nonetheless, I couldn't find any place where EQUALS would be added to a rule (although I've found the check for EQUALS in the head atom in Rule.injectiveOf).

Since I'm interested in removing some mined rules, I will probably take some time to go over the code related to injectiveness and report if I find something.

Regards, Antonio.

Hi Antonio,

We used EQUALS at some point to mine keys https://github.com/lgalarra/vickey/blob/master/src/amie/keys/assistant/KeyMinerMiningAssistant.java

A key is a combination of attributes that identifies entities uniquely and can be formulated as Horn rules as follows:

r(x, z1) r(y, z1) .... r'(x, zk) r'(y, zk) => equals(x, y)

e.g., lastName(?x, n1), lastName(?y, n1), birthDate(?x, b1), birthDate(?y, b1) => ?x == ?y

If this rule has confidence 100%, we say {lastName, birthDate} defines a key.

Luis