This adds logic to walk the SDD structure returned by Problog and derive a proof depth for each example. On a few hand-chosen examples, the computed depths matched the current ruletaker depth semantics.
The proof itself is the SDD structure, returned (mostly) as is with light formatting.
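To illustrate the depth notion being measured, here is a minimal sketch over a hypothetical proof tree. `ProofNode` and `proof_depth` are illustrative names, not part of this PR or of Problog, and the sketch assumes that a fact has depth 0 and each rule application adds 1 to its deepest premise. Converting the SDD into such a tree is what the PR itself implements and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProofNode:
    """Hypothetical proof node: a fact (no premises) or a rule application."""
    label: str
    premises: List["ProofNode"] = field(default_factory=list)


def proof_depth(node: ProofNode) -> int:
    """Return 0 for a fact, otherwise 1 plus the depth of the deepest premise."""
    if not node.premises:
        return 0
    return 1 + max(proof_depth(premise) for premise in node.premises)


# Two facts, one conclusion derived from a fact, and one conclusion
# derived from that earlier conclusion plus another fact.
kind = ProofNode("kind(bob)")
green = ProofNode("green(bob)")
nice = ProofNode("nice(bob)", [kind])
happy = ProofNode("happy(bob)", [nice, green])

print(proof_depth(kind), proof_depth(nice), proof_depth(happy))
```

Taking the maximum over premises is an assumption of this sketch; it corresponds to measuring the longest chain of rule applications in a single proof.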
Also included are a couple of other changes:
- Grammar changes in the type 2 theories to prohibit negated standalone facts and ungrounded negated facts in rule antecedents. This could have been done as post-processing, but changing the grammar was much simpler. I believe the original grammar probabilities are preserved; please comment if you see any discrepancies.
- A `min_num_positive_examples` hyperparameter in the config (currently set to half the total number of examples) to ensure that at least half the generated examples have positive labels. I added this because the grammar changes produced many more negatives than positives.
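One way a positive-example floor like this can be enforced is by capping how many negatives are kept while generating. The sketch below is only an assumption about the approach, not the code in this PR: `collect_examples`, `toy_generator`, and `max_attempts` are hypothetical names, and the toy generator simply labels about 20% of its examples positive.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, bool]  # (question text, label)


def collect_examples(
    generate_example: Callable[[], Example],
    num_examples: int,
    min_num_positive_examples: int,
    max_attempts: int = 100_000,
) -> List[Example]:
    """Keep generating until num_examples are collected, dropping surplus negatives."""
    max_negatives = num_examples - min_num_positive_examples
    positives: List[Example] = []
    negatives: List[Example] = []
    attempts = 0
    while len(positives) + len(negatives) < num_examples:
        if attempts >= max_attempts:
            raise RuntimeError("could not reach the positive-example quota")
        attempts += 1
        example = generate_example()
        if example[1]:
            positives.append(example)
        elif len(negatives) < max_negatives:
            negatives.append(example)
        # Negatives beyond the cap are discarded.
    return positives + negatives


rng = random.Random(0)


def toy_generator() -> Example:
    """Stand-in generator: roughly one in five examples is positive."""
    return ("toy question", rng.random() < 0.2)


examples = collect_examples(toy_generator, num_examples=10, min_num_positive_examples=5)
```

Because negatives are capped at `num_examples - min_num_positive_examples`, the returned list always contains at least the requested number of positives.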
Also note that it is currently hard to generate theories with larger proof depths. For instance, in the theory2 data I generated with the latest code, I have:
Total Examples: 1000
No. of Negative Examples: 500
No. of Positive Examples: 500
No. with proof depth 0: 114
No. with proof depth 1: 362
No. with proof depth 2: 20
No. with proof depth 3: 4
Without forward chaining, the best we can do is to massively over-generate to get more higher-depth examples. This is not great.
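As a rough back-of-the-envelope check on how much over-generation this implies, the sketch below scales the counts from the run above. It assumes the per-depth rates stay the same as the run grows, which is not verified; `required_generation` is a hypothetical helper, not part of this PR.

```python
import math
from typing import Dict

# Counts taken from the 1000-example run reported above.
total_generated = 1000
depth_counts = {0: 114, 1: 362, 2: 20, 3: 4}


def required_generation(target_per_depth: int) -> Dict[int, int]:
    """Estimate how many examples must be generated to get the target at each depth."""
    return {
        depth: math.ceil(target_per_depth * total_generated / count)
        for depth, count in depth_counts.items()
    }


estimate = required_generation(100)
print(estimate)
```

Under that assumption, collecting 100 depth-3 examples would take about 25,000 generated examples, compared with about 5,000 for 100 depth-2 examples.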
NOTE: We had discussed possibly removing probabilities from the theories and relying on Problog simply processing them with the Prolog engine, since the probabilistic semantics are subtly different. However, without probabilities Problog does not give an output SDD, which we use to determine proof depths. So the probabilities are still kept (all with a value of 1.0, of course).
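For illustration, keeping every clause at probability 1.0 could look like the following sketch. It uses the `p::clause.` annotation syntax; the helper `with_unit_probabilities` and the two-clause theory are hypothetical and are not the theory format generated by this PR.

```python
from typing import List


def with_unit_probabilities(clauses: List[str]) -> List[str]:
    """Prefix each clause with 1.0:: so it stays certain but is still annotated."""
    return [f"1.0::{clause.strip()}" for clause in clauses]


# Hypothetical theory: one fact and one rule.
theory = ["kind(bob).", "nice(X) :- kind(X)."]
program = "\n".join(with_unit_probabilities(theory) + ["query(nice(bob))."])
print(program)
```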