dig-team / amie

Mavenized AMIE+Typing
Other
109 stars 19 forks source link

Improve "exploit max length" for Dangling Operator #43

Open falcaopetri opened 4 years ago

falcaopetri commented 4 years ago

Hi.

Maximum rule length is an interesting pruning technique introduced in AMIE+. In "Fast rule mining in ontological knowledge bases with AMIE+", we can read:

Maximum rule length: [...] for a not-yet-closed rule of length maxLen−1, AMIE+ will not apply the add-dangling-atom operator O_D, because this results in a non-closed rule, which will be neither output nor refined. In the same spirit, if the same rule contains more than two non-closed variables (see Sect. 3.3), AMIE+ will skip the application of the add-closing-atom operator O_C . This happens because an application of the operator O C can close at most two variables with one atom. This reasoning also applies to the instantiation operator O_I: rules with more than one non-closed variable are not refined with instantiated atoms, because the addition of an instantiated atom can close at most one variable.

Summing up, we have that:

I guess that dangling operator can be safely modified to consider if there are more than one non-closed variable, just as the instantiation operator.

Indeed, the dangling operator will:

Therefore, it can close at most one variable. The newly added variable may be closed by instantiation operator, but the previously opened variables cannot.

Best, Antonio.

falcaopetri commented 4 years ago

Hi.

I think the following logic may be related to "max length exploitation" for the Dangling Operator. I also feel that this may be a bug that incorrectly filters out candidate rules. The problem seems to occur only when mining constants, so it's hard to spot.

https://github.com/lajus/amie/blob/651455d0438e11ec68d83981c2c0dba309e4b984/mining/src/main/java/amie/mining/AMIE.java#L433-L448

The above code's idea seems to be that "the last atom added to a rule should not be generated by the dangling operator since the resulting rule will definitely be not-closed". Nonetheless, this ignores the possibility of applying the instantiation operator and make the rule closed.

Indeed, the code above seems to contradict the check done by the dangling operator itself:

https://github.com/lajus/amie/blob/651455d0438e11ec68d83981c2c0dba309e4b984/mining/src/main/java/amie/mining/assistant/DefaultMiningAssistant.java#L249-L257

falcaopetri commented 4 years ago

0k, now I get it.

The above code's idea seems to be that "the last atom added to a rule should not be generated by the dangling operator since the resulting rule will definitely be not-closed". Nonetheless, this ignores the possibility of applying the instantiation operator and make the rule closed.

The highlighted afirmation is wrong. Instantiation operator has already been applied to the dangling's output and is at temporalOutputMap.get("instantiated"). The code I suggested to remove is just avoiding adding a rule which has max size and is surely unclosed (because of the newly added dangling variable). If added, the rule will be shortly filtered by AMIE's output check, and the final output will be the same.

I will drop PR #44 since it has already incomporated this wrong understanding.

Nonetheless, the modification to the dangling's exploitMaxLengthOption is still valid, since it:

falcaopetri commented 4 years ago

I think I've finally found the real culprit for the performance boost shown in #46. This is MiningAssistant.getInstantiatedAtoms:

https://github.com/lajus/amie/blob/651455d0438e11ec68d83981c2c0dba309e4b984/mining/src/main/java/amie/mining/assistant/MiningAssistant.java#L793-L796

Note that if exploitMaxLengthOption is set, then... max length is not exploited (because the condition checked will be true).

Therefore, my proposed fix to this issue is: the current PR #46 + fixing the code above. The PR is still useful since it discards rules that would be unnecessarily enqueued.

lgalarra commented 3 years ago

Hi Antonio,

Thanks for this nice inquiry. I agree with your analysis and I vote for merging your PR into AMIE's main branch. I am waiting for @lajus approval.

Best, Luis