caporaso-lab / sourcetracker2

SourceTracker2
BSD 3-Clause "New" or "Revised" License

possible bug #24

Closed gregcaporaso closed 7 years ago

gregcaporaso commented 8 years ago

From @wdwvt1 on January 30, 2016 0:15

Unsure if this is a bug, but if the number of unknown sequences reaches 0 and alpha2 is set to 0, then lines 370-372 of sourcetracker.py might raise a zero division error.

We have not verified this in tests (can't get the unknown that low) but it deserves investigation.

_Copied from original issue: biota/sourcetracker2internal#9_

wdwvt1 commented 8 years ago

I have been able to get this to happen, but the case where it would occur in the algorithm would have to be pretty pathological. Specifically, you'd have to get an Unknown to have no assigned sequences, which is nearly impossible. Here is an example:

from sourcetracker._sourcetracker import ConditionalProbability
import numpy as np
cp = ConditionalProbability(.1, 0, 10, np.arange(10).reshape(2,5))
cp.set_n(100)
cp.precalculate()
cp.calculate_cp_slice(3, 0, 0, np.array([np.arange(10).sum() - 1, 0, 0]))
# array([ 0.12358804,  0.01768752,         nan])

I don't think we realistically have to worry about this.

wdwvt1 commented 7 years ago

This is resolved now that alpha2 is required to be non-zero. As long as that is true, there will always be some probability mass in the Unknown, so the division can never be 0/0.
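
For anyone reading later, here is a minimal sketch of why a non-zero alpha2 rules out the nan. It is not the code from sourcetracker.py: the function `unknown_term` and its parameters `m_xi_unknown`, `m_unknown`, `n`, and `tau` are hypothetical names, and the smoothed-ratio form is an assumption made for illustration. It also shows that with NumPy floats a 0/0 produces nan rather than raising ZeroDivisionError, which matches the nan in the example output above.

```python
import numpy as np


def unknown_term(m_xi_unknown, m_unknown, alpha2, n, tau):
    """Hypothetical smoothed ratio for the Unknown source.

    Assumed form (for illustration only, not the sourcetracker2 code):
        (count of this feature in Unknown + alpha2 * n)
        / (total count in Unknown + alpha2 * n * tau)
    """
    numerator = np.float64(m_xi_unknown + alpha2 * n)
    denominator = np.float64(m_unknown + alpha2 * n * tau)
    # NumPy floats return nan for 0/0 and emit a RuntimeWarning instead of
    # raising ZeroDivisionError; the warning is silenced here.
    with np.errstate(divide="ignore", invalid="ignore"):
        return numerator / denominator


# Pathological case: the Unknown holds no sequences and alpha2 is 0.
empty_unknown_no_prior = unknown_term(0, 0, alpha2=0.0, n=100, tau=5)

# Same empty Unknown, but a small non-zero alpha2 keeps the denominator positive.
empty_unknown_with_prior = unknown_term(0, 0, alpha2=0.001, n=100, tau=5)

print(empty_unknown_no_prior, empty_unknown_with_prior)
```

Under these assumptions, the first call is the 0/0 case and the second is finite, because the alpha2 term contributes to both numerator and denominator even when the Unknown has no assigned sequences.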