Closed by gregcaporaso 7 years ago
I was able to reproduce this, but the case where it would occur in the algorithm is fairly pathological. Specifically, the Unknown would need to end up with no assigned sequences, which is nearly impossible. Here is an example:
from sourcetracker._sourcetracker import ConditionalProbability
import numpy as np
cp = ConditionalProbability(.1, 0, 10, np.arange(10).reshape(2,5))
cp.set_n(100)
cp.precalculate()
cp.calculate_cp_slice(3, 0, 0, np.array([np.arange(10).sum() - 1, 0, 0]))
# array([ 0.12358804, 0.01768752, nan])
I don't think we realistically have to worry about this.
This is now resolved, since alpha2 is required to be non-zero. As long as that holds, there will always be some probability mass in the Unknown.
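To illustrate why a non-zero alpha2 avoids the `nan` seen in the example above, here is a minimal sketch. It assumes the Unknown term has the general shape of a smoothed ratio, `(count + alpha2 * n) / (total + alpha2 * n * num_features)`. The function `unknown_term` and its parameters are hypothetical and are not taken from `sourcetracker.py`, so treat this as a model of the failure mode rather than the library's actual code.

```python
import numpy as np


def unknown_term(taxon_count, unknown_total, alpha2, n, num_features):
    """Hypothetical smoothed ratio for the Unknown (illustrative only)."""
    # Prior pseudo-count contributed by alpha2; zero when alpha2 == 0.
    prior = alpha2 * n
    numerator = np.float64(taxon_count + prior)
    denominator = np.float64(unknown_total + prior * num_features)
    # NumPy returns nan for 0/0 (with a RuntimeWarning) instead of raising
    # ZeroDivisionError; silence the warning so the result can be inspected.
    with np.errstate(divide="ignore", invalid="ignore"):
        return numerator / denominator


# Unknown has no assigned sequences and alpha2 is 0: 0/0 yields nan.
empty_no_prior = unknown_term(0, 0, alpha2=0.0, n=100, num_features=5)

# Same empty Unknown, but alpha2 > 0 keeps the denominator positive.
empty_with_prior = unknown_term(0, 0, alpha2=0.001, n=100, num_features=5)

print(empty_no_prior, empty_with_prior)
```

Under these assumptions, the empty Unknown only produces `nan` when the prior term is also zero; any positive alpha2 leaves a strictly positive denominator.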
From @wdwvt1 on January 30, 2016 0:15
Unsure if this is a bug - but if the number of unknown sequences reaches 0 and alpha2 is set to 0, then lines 370-372 of sourcetracker.py might raise a zero division error.
We have not verified this in tests (can't get the unknown that low) but it deserves investigation.
_Copied from original issue: biota/sourcetracker2internal#9_