AmpersandTV / pymc3-hmm

Hidden Markov models in PyMC3
Other
94 stars 13 forks source link

test_PoissonZeroProcess_random fails intermittently #72

Closed fanshi118 closed 3 years ago

fanshi118 commented 3 years ago

Description of your problem or feature request

One of the tests has failed at one run in CI. Doubt if this is a consistent error, which can possibly be fixed by setting a seed for that test?

Please provide a minimal, self-contained, and reproducible example.

pytest tests --cov=pymc3_hmm --cov-report=xml:./coverage.xml

Please provide the full traceback of any errors.

    def test_PoissonZeroProcess_random():
        test_states = np.r_[0, 0, 1, 1, 0, 1]
        test_dist = PoissonZeroProcess.dist(10.0, test_states)
        assert np.array_equal(test_dist.shape, test_states.shape)
        test_sample = test_dist.random()
        assert test_sample.shape == (test_states.shape[0],)
        assert np.all(test_sample[test_states > 0] > 0)

        test_sample = test_dist.random(size=5)
        assert np.array_equal(test_sample.shape, (5,) + test_states.shape)
        assert np.all(test_sample[..., test_states > 0] > 0)

        test_states = np.r_[0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0]
        test_dist = PoissonZeroProcess.dist(100.0, test_states)
        assert np.array_equal(test_dist.shape, test_states.shape)
>       test_sample = test_dist.random(size=1)

tests/test_distributions.py:379: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pymc3_hmm.distributions.PoissonZeroProcess object at 0x7f9395608c50>
point = None, size = 1

    def random(self, point=None, size=None):
        """Sample from this distribution conditional on a given set of values.

        Parameters
        ----------
        point: dict, optional
            Dict of variable values on which random values are to be
            conditioned (uses default point if not specified).
        size: int, optional
            Desired size of random sample (returns one sample if not
            specified).

        Returns
        -------
        array
        """
        with _DrawValuesContext():

            (states,) = draw_values([self.states], point=point, size=size)

            if size:
                # `draw_values` will not honor the `size` parameter if its arguments
                # don't contain random variables, so, when our `self.states` are
                # constants, we have to broadcast `states` so that it matches `size +
                # self.shape`.
                # Unfortunately, this means that our sampler relies on
                # `self.shape`, which is bad for other, arguable more important
                # reasons (e.g. when this `Distribution`'s symbolic inputs change
                # shape, we now have to manually update `Distribution.shape` so
                # that the sampler is consistent)

                states = np.broadcast_to(
                    states, tuple(np.atleast_1d(size)) + tuple(self.shape)
                )

            samples = np.empty(states.shape)

            for i, dist in enumerate(self.comp_dists):
                # We want to sample from only the parts of our component
                # distributions that are active given the states.
                # This is only really relevant when the component distributions
                # change over the state space (e.g. Poisson means that change
                # over time).
                # We could always sample such components over the entire space
                # (e.g. time), but, for spaces with large dimension, that would
                # be extremely costly and wasteful.
                i_idx = np.where(states == i)
                i_size = len(i_idx[0])
                if i_size > 0:
                    subset_args = distribution_subset_args(
                        dist, states.shape, i_idx, point=point
                    )
>                   samples[i_idx] = dist.dist(*subset_args).random(point=point)
E                   ValueError: shape mismatch: value array of shape (15,)  could not be broadcast to indexing result of shape (3,)

pymc3_hmm/distributions.py:253: ValueError

Please provide any additional information below.

Versions and main components

brandonwillard commented 3 years ago

I was able to reproduce the error locally by rerunning the test numerous times with debug mode enabled. To rerun the test, I simply added the following annotation: @pytest.mark.parametrize("x", list(range(1000))).

Here are the relevant values:

>>> point
None
>>> size
1
>>> states
np.array([[0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0]], dtype=np.int32)
>>> samples
np.array([[0.e+000, 0.e+000, 5.e-324, 5.e-324, 0.e+000, 5.e-324, 0.e+000,
        0.e+000, 0.e+000, 0.e+000, 0.e+000]])
>>> i
1
>>> dist
<pymc3.distributions.discrete.Poisson at 0x7fe758572290>
>>> dist.mu
TensorConstant{100.0}
>>> i_idx
(np.array([0, 0, 0]), np.array([2, 3, 5]))
>>> i_size
3
>>> subset_args
[AdvancedSubtensor.0]
>>> theano.printing.debugprint(subset_args)
AdvancedSubtensor [id A] ''   
 |Elemwise{mul,no_inplace} [id B] ''   
 | |InplaceDimShuffle{x,x} [id C] ''   
 | | |TensorConstant{100.0} [id D]
 | |Alloc [id E] ''   
 |   |TensorConstant{1.0} [id F]
 |   |TensorConstant{1} [id G]
 |   |TensorConstant{11} [id H]
 |TensorConstant{(3,) of 0} [id I]
 |TensorConstant{[2 3 5]} [id J]
>>> subset_args[0].eval()
np.array([100., 100., 100.])

It looks like there's an inconsistency appearing here:

>>> dist.dist(subset_args[0].eval()).random()
np.array([ 94,  99, 107])
>>> dist.dist(*subset_args).random()
array([ 7,  3, 12, 18,  8, 14, 15, 12,  4])
brandonwillard commented 3 years ago

~The problem is caused by the _DrawValuesContext; it contains a value for the TensorVariable in subset_args[0], and that value has the wrong dimensions. This causes Distribution.random to use that incorrect value somewhere down the line.~

~It looks like we may need to use _DrawValuesContextBlocker to prevent _DrawValuesContext from picking up the sampled values of the subset_args entries.~

brandonwillard commented 3 years ago

This is caused by an issue in pymc3.memoize.memoize. See https://github.com/pymc-devs/pymc3/issues/4506.

fanshi118 commented 3 years ago

Reinstalled the pymc3 library with the most updated version in the GitHub repo. Ran test_DiscreteMarkovChain_random locally through 1000 runs (w/ randomized seed values) and didn't encounter this error. Looks like the issue has been fixed. How should we update the pymc3 version? By pointing directly to git+https://github.com/pymc-devs/pymc3?

brandonwillard commented 3 years ago

We'll have to wait for a new release, but we can close this issue.