Open cmungall opened 3 years ago
I can confirm it's not avoiding any collapses, because if I reduce the ptable to omit 1,
i.e.
A:1 B:1 0.0 0.0 0.95 0.05
B:1 C:1 0.0 0.0 0.95 0.05
A:1 C:1 0.99 0.0 0.01 0.0
then it correctly finds
B:1 EquivalentTo C:1 (most probable) 0.95
A:1 EquivalentTo B:1 (most probable) 0.95
I think the issue here is the high number of "windows" requested (100). Input rows are sorted according to their best probability, then the list of rows is chunked into the given number of windows. Across independent runs, shuffling occurs within each window, but the windows stay in the same overall order, so it will always add A ProperSubClassOf C first. If you use a window value of 1, the rows are completely randomized and it is able to find the best solution.
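To make the windowing behaviour concrete, here is a minimal Python sketch of the ordering described above. It is not boomer's actual code: the names `make_windows` and `run_order` are hypothetical, and keeping rows with the same best probability in the same bin is an assumption chosen to match the bin sizes in the log output quoted in this thread.

```python
import random
from itertools import groupby

# Rows from the triangle ptable: (left, right, probabilities)
ROWS = [
    ("A:1", "B:1", (0.0, 0.0, 0.95, 0.05)),
    ("B:1", "C:1", (0.0, 0.0, 0.95, 0.05)),
    ("A:1", "C:1", (0.99, 0.0, 0.01, 0.0)),
]

def best(row):
    """Best (highest) probability among a row's alternatives."""
    return max(row[2])

def make_windows(rows, num_windows):
    """Sort rows by best probability, then chunk them into at most num_windows bins.

    Assumption: rows tied on best probability always share a bin.
    """
    ranked = sorted(rows, key=best, reverse=True)
    groups = [list(g) for _, g in groupby(ranked, key=best)]
    k = max(1, min(num_windows, len(groups)))
    size = -(-len(groups) // k)  # ceiling division
    return [
        [row for group in groups[i:i + size] for row in group]
        for i in range(0, len(groups), size)
    ]

def run_order(rows, num_windows, rng):
    """Order in which one run visits the rows: shuffle inside each window only."""
    ordered = []
    for window in make_windows(rows, num_windows):
        window = list(window)
        rng.shuffle(window)
        ordered.extend(window)
    return ordered

if __name__ == "__main__":
    for w in (100, 1):
        firsts = {run_order(ROWS, w, random.Random(s))[0][:2] for s in range(50)}
        print(f"windows={w}: rows seen in first position: {sorted(firsts)}")
```

With 100 windows the A:1/C:1 row is alone in the first bin, so every run starts with it; with a single window any of the three rows can come first.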
See the logging at the beginning of a run (with 100 windows requested):
2021.02.05 14:32:54:070 [zio-def...] [INFO ] org.monarchinitiative.boomer.Boom.evaluate:30 - Bin size: 1; Most probable: 0.99
2021.02.05 14:32:54:091 [zio-def...] [INFO ] org.monarchinitiative.boomer.Boom.evaluate:30 - Bin size: 2; Most probable: 0.95
2021.02.05 14:32:54:095 [zio-def...] [INFO ] org.monarchinitiative.boomer.Boom.evaluate:33 - Max possible joint probability: -0.11263692462860261
The axioms in the first bin will always be added before proceeding to the next bin. Different runs will just shuffle the order of the two items in the second bin.
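The "Max possible joint probability" in the log looks like a sum of natural logs of each row's best probability. That reading is inferred from the numbers in this thread, not from boomer's source, and the variable names below are only illustrative.

```python
import math

# Best probability of each of the three ptable rows (A-C, A-B, B-C)
best_probabilities = [0.99, 0.95, 0.95]

# Sum of natural logs, i.e. the log of the product of the best probabilities
max_log_joint = sum(math.log(p) for p in best_probabilities)

# The same quantity as a plain probability
max_joint = math.exp(max_log_joint)

if __name__ == "__main__":
    print(f"log joint: {max_log_joint:.6f}")
    print(f"joint:     {max_joint:.6f}")
```

Under that reading the logged value is an upper bound that ignores logical consistency between rows: it is only reachable if every row can take its most probable relation at the same time.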
my ticket is in error... more later
I think we cleared this up. "Windows" may not be as obvious as they ought to be, but I think the UI will continue to evolve.
Still an issue. With this ptable:
A:1 B:1 0.0 0.0 0.95 0.05
B:1 C:1 0.0 0.0 0.95 0.05
A:1 C:1 0.99 0.0 0.01 0.0
running
boomer -t triangle.ptable.tsv -a triangle.owl -p prefixes.yaml -r 500 -w 1 -e 200 --output-internal-axioms true
yields
## SINGLETONS
Method: singletons
Score: -0.05129329438755058
Estimated probability: 1.0
Confidence: 1.0
Subsequent scores (max 10):
- [B:1](http://purl.obolibrary.org/obo/B_1) EquivalentTo [C:1](http://purl.obolibrary.org/obo/C_1) (most probable) 0.95
and an incoherent output.ofn.
For text files see #157.
Given:
(in each case, the only other possibility is siblingOf)
Note: each class is in a separate prefix space, so there is no penalty for equivalence between any pair of them.
Solutions:
boomer generally selects {1}, depending on params, but never the optimal solution.
I am pretty sure I have not made a typo - I put each class in its own ID space, so it is not avoiding 2 or 3 (which would happen if A/B/C were in the same ID space)