bhmm / legacy-bhmm-force-spectroscopy-manuscript

Bayesian hidden Markov models for analysis of single-molecule trajectory data
GNU Lesser General Public License v3.0
2 stars 3 forks source link

Bugfixes after C implementation merge #15

Closed jchodera closed 9 years ago

jchodera commented 9 years ago

Working my way through issues with the build and packaging chain after merging C implementation from @franknoe.

jchodera commented 9 years ago

OK, the conda package builds now, but fails at the testing stage with doctest failures:

Doctest: bhmm.bhmm_class.BHMM ... FAIL
Doctest: bhmm.bhmm_class.BHMM._sampleHiddenStateTrajectory ... FAIL
Doctest: bhmm.bhmm_class.BHMM.sample ... FAIL

Starting to work through these...

franknoe commented 9 years ago

Sorry, I have run the tests package but not the doctests - these are probably out of date. Thanks for catching this.

Am 22/03/15 um 23:06 schrieb John Chodera:

OK, the conda package builds now, but fails at the testing stage with doctest failures:

Doctest: bhmm.bhmm_class.BHMM ... FAIL Doctest: bhmm.bhmm_class.BHMM._sampleHiddenStateTrajectory ... FAIL Doctest: bhmm.bhmm_class.BHMM.sample ... FAIL

Starting to work through these...

— Reply to this email directly or view it on GitHub https://github.com/choderalab/bhmm/pull/15#issuecomment-84712694.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

jchodera commented 9 years ago

This is why we use travis-ci to do automated testing of everything for each PR. :)

The best way to use this is to open a PR for any new addition and commit frequently after each logical change. Travis will automatically check your commits for you and tell you if there are errors.

Even better is to run (from a different directory):

nosetests bhmm --nocapture --verbosity=2 --with-doctest

before actually committing code that is broken!

Making some progress on doctests...

franknoe commented 9 years ago

OK, I was not aware of the doctest option, will use it in the future.

Thanks Frank

Am 22/03/15 um 23:10 schrieb John Chodera:

This is why we use travis-ci to do automated testing of everything for each PR. :)

The best way to use this is to open a PR for any new addition and commit frequently after each logical change. Travis will automatically check your commits for you and tell you if there are errors.

Even better is to run (from a different directory):

nosetests bhmm --nocapture --verbosity=2 --with-doctest

before actually committing code that is broken!

Making some progress on doctests...

— Reply to this email directly or view it on GitHub https://github.com/choderalab/bhmm/pull/15#issuecomment-84713582.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

marscher commented 9 years ago

Am 22.03.2015 um 23:10 schrieb John Chodera:

This is why we use travis-ci to do automated testing of everything for each PR. :) We also do this on emma btw :)

The best way to use this is to open a PR for any new addition and commit frequently after each logical change. Travis will automatically check your commits for you and tell you if there are errors.

Even better is to run (from a different directory):

nosetests bhmm --nocapture --verbosity=2 --with-doctest

before actually committing code that is broken!

Making some progress on doctests...

— Reply to this email directly or view it on GitHub https://github.com/choderalab/bhmm/pull/15#issuecomment-84713582.

jchodera commented 9 years ago

Some of these issues seem to be just forgetting to add verbose flags and are easy to clean up.

One question for @franknoe: What does this comment in bhmm/init/api.py mean?

    TODO
    ----
    * Replace this with EM or MLHMM procedure from Matlab code.
franknoe commented 9 years ago

No idea, this is your comment :)

Am 22/03/15 um 23:15 schrieb John Chodera:

Some of these issues seem to be just forgetting to add |verbose| flags and are easy to clean up.

One question for @franknoe https://github.com/franknoe: What does this comment in |bhmm/init/api.py| mean?

TODO
 * Replace this with EM or MLHMM procedure from Matlab code.

— Reply to this email directly or view it on GitHub https://github.com/choderalab/bhmm/pull/15#issuecomment-84714850.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

jchodera commented 9 years ago

No idea, this is your comment :)

Ah! This is probably out of date since you implemented this feature. :) Thanks!

jchodera commented 9 years ago

There's a more confusing issue with initial_model_gaussian1d:

----------------------------------------------------------------------
File "/Users/choderaj/anaconda/lib/python2.7/site-packages/bhmm-0.1.0-py2.7-macosx-10.5-x86_64.egg/bhmm/bhmm_class.py", line 38, in bhmm.bhmm_class.BHMM
Failed example:
    bhmm = BHMM(data, nstates)
Exception raised:
    Traceback (most recent call last):
      File "/Users/choderaj/anaconda/lib/python2.7/doctest.py", line 1315, in __run
        compileflags, 1) in test.globs
      File "<doctest bhmm.bhmm_class.BHMM[5]>", line 1, in <module>
        bhmm = BHMM(data, nstates)
      File "/Users/choderaj/anaconda/lib/python2.7/site-packages/bhmm-0.1.0-py2.7-macosx-10.5-x86_64.egg/bhmm/bhmm_class.py", line 101, in __init__
        self.model = self._generateInitialModel(output_model_type)
      File "/Users/choderaj/anaconda/lib/python2.7/site-packages/bhmm-0.1.0-py2.7-macosx-10.5-x86_64.egg/bhmm/bhmm_class.py", line 307, in _generateInitialModel
        mlhmm = MLHMM(self.observations, self.nstates, reversible=self.reversible, output_model_type=output_model_type)
      File "/Users/choderaj/anaconda/lib/python2.7/site-packages/bhmm-0.1.0-py2.7-macosx-10.5-x86_64.egg/bhmm/mlhmm.py", line 98, in __init__
        self.model = init.generate_initial_model(observations, nstates, output_model_type)
      File "/Users/choderaj/anaconda/lib/python2.7/site-packages/bhmm-0.1.0-py2.7-macosx-10.5-x86_64.egg/bhmm/init/api.py", line 28, in generate_initial_model
        return bhmm.init.gaussian.initial_model_gaussian1d(observations, nstates, reversible=True, verbose=verbose)
      File "/Users/choderaj/anaconda/lib/python2.7/site-packages/bhmm-0.1.0-py2.7-macosx-10.5-x86_64.egg/bhmm/init/gaussian.py", line 29, in initial_model_gaussian1d
        collected_observations = np.append(collected_observations, o_t, axis=0)
      File "/Users/choderaj/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3884, in append
        return concatenate((arr, values), axis=axis)
    ValueError: all the input arrays must have same number of dimensions
franknoe commented 9 years ago

I don't completely understand the logic of the hmm-object functions. It looks like

model.generate_synthetic_observation_trajectories(ntrajectories=10, length=10000)

does not only create an observation trajectory but rather a list of two lists containing hidden states and observations, of the overall shape

 (2, 10, 10000)

Instead of a list of 10 1D-arrays of length 10000 each. I suppose one should only pass one of these lists to the BHMM constructor.

Best Frank.

Am 22/03/15 um 23:38 schrieb John Chodera:

There's a more confusing issue with |initial_model_gaussian1d|:

---------------------------------------------------------------------- File "/Users/choderaj/anaconda/lib/python2.7/site-packages/bhmm-0.1.0-py2.7-macosx-10.5-x86_64.egg/bhmm/bhmm_class.py", line 38, in bhmm.bhmm_class.BHMM Failed example: bhmm = BHMM(data, nstates) Exception raised: Traceback (most recent call last): File "/Users/choderaj/anaconda/lib/python2.7/doctest.py", line 1315, in run compileflags, 1) in test.globs File "<doctest bhmm.bhmm_class.BHMM[5]>", line 1, in bhmm = BHMM(data, nstates) File "/Users/choderaj/anaconda/lib/python2.7/site-packages/bhmm-0.1.0-py2.7-macosx-10.5-x86_64.egg/bhmm/bhmm_class.py", line 101, in __init self.model = self._generateInitialModel(output_model_type) File "/Users/choderaj/anaconda/lib/python2.7/site-packages/bhmm-0.1.0-py2.7-macosx-10.5-x86_64.egg/bhmm/bhmm_class.py", line 307, in _generateInitialModel mlhmm = MLHMM(self.observations, self.nstates, reversible=self.reversible, output_model_type=output_model_type) File "/Users/choderaj/anaconda/lib/python2.7/site-packages/bhmm-0.1.0-py2.7-macosx-10.5-x86_64.egg/bhmm/mlhmm.py", line 98, in init self.model = init.generate_initial_model(observations, nstates, output_model_type) File "/Users/choderaj/anaconda/lib/python2.7/site-packages/bhmm-0.1.0-py2.7-macosx-10.5-x86_64.egg/bhmm/init/api.py", line 28, in generate_initial_model return bhmm.init.gaussian.initial_model_gaussian1d(observations, nstates, reversible=True, verbose=verbose) File "/Users/choderaj/anaconda/lib/python2.7/site-packages/bhmm-0.1.0-py2.7-macosx-10.5-x86_64.egg/bhmm/init/gaussian.py", line 29, in initial_model_gaussian1d collected_observations = np.append(collected_observations, o_t, axis=0) File "/Users/choderaj/anaconda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3884, in append return concatenate((arr, values), axis=axis) ValueError: all the input arrays must have same number of dimensions

— Reply to this email directly or view it on GitHub https://github.com/choderalab/bhmm/pull/15#issuecomment-84721423.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

jchodera commented 9 years ago

Thanks for catching this! That was indeed the bug---it returns both the synthetic observations and the hidden states, but I was only catching it as one argument. I should make returning the synthetic observations optional in another PR.

jchodera commented 9 years ago

One question, @franknoe: Looks like all of the nosetests were converted from methods to subclasses of unittest.TestCase, but I can't figure out why. Can you comment on why you did this?

franknoe commented 9 years ago

The main advantage is that you can use the setup-method to create objects that several tests will use (e.g. reference solutions). Useful for the more expensive tests. A minor advantage is that this way the tests also run when you just execute the file.

Did this create a problem?

Am 23/03/15 um 01:10 schrieb John Chodera:

One question, @franknoe https://github.com/franknoe: Looks like all of the nosetests were converted from methods to subclasses of |unittest.TestCase|, but I can't figure out why. Can you comment on why you did this?

— Reply to this email directly or view it on GitHub https://github.com/choderalab/bhmm/pull/15#issuecomment-84735534.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

jchodera commented 9 years ago

The main advantage is that you can use the setup-method to create objects that several tests will use (e.g. reference solutions). Useful for the more expensive tests. A minor advantage is that this way the tests also run when you just execute the file.

Thanks! I just wanted to understand what the advantage was---I'm happy to keep it like this!

jchodera commented 9 years ago

OK, I think I'm very close to getting the tests to pass again, with one exception. I can't get the discrete output model to work for a 4-state model with 2-symbol output. Not sure why this doesn't work, but it's something buried in the init module. I'm going to punt this to @franknoe, so I commented out the test.

franknoe commented 9 years ago

Sure, I'll handle it. Are you referring to a specific test?

Am 23/03/15 um 03:42 schrieb John Chodera:

OK, I think I'm very close to getting the tests to pass again, with one exception. I can't get the discrete output model to work for a 4-state model with 2-symbol output. Not sure why this doesn't work, but it's something buried in the |init| module. I'm going to punt this to @franknoe https://github.com/franknoe, so I commented out the test.

— Reply to this email directly or view it on GitHub https://github.com/choderalab/bhmm/pull/15#issuecomment-84761625.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

jchodera commented 9 years ago

It's the test_init.py test test_discrete_4_2 that I had trouble with. I rewrote it to test 4 states, 2 symbols (which I thought may have been the purpose), but couldn't get it to work.

Travis is failing on the plots---I'll fix that now.

franknoe commented 9 years ago

OK, will do it tomorrow

Am 23/03/15 um 04:10 schrieb John Chodera:

It's the |test_init.py| test |test_discrete_4_2| that I had trouble with. I rewrote it to test 4 states, 2 symbols (which I thought may have been the purpose), but couldn't get it to work.

Travis is failing on the plots---I'll fix that now.

— Reply to this email directly or view it on GitHub https://github.com/choderalab/bhmm/pull/15#issuecomment-84771088.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany

jchodera commented 9 years ago

Hooray! All tests pass. Merging.

jchodera commented 9 years ago

Or, rather, you should feel free to review and comment and merge yourself before continuing to work on the code tomorrow!

franknoe commented 9 years ago

Hi,

I reverted that test and made it more robust. I actually had something else in mind, namely using an HMM to actually generate an approximation for the dominant kinetics for a Markov chain like in our JCP 2013 paper. However it made sense to do what you tried - just that this ran into another problem.

Merging this now...

Thanks for sorting my issues out. Next time will be smoother! Frank.

Am 23/03/15 um 04:10 schrieb John Chodera:

It's the |test_init.py| test |test_discrete_4_2| that I had trouble with. I rewrote it to test 4 states, 2 symbols (which I thought may have been the purpose), but couldn't get it to work.

Travis is failing on the plots---I'll fix that now.

— Reply to this email directly or view it on GitHub https://github.com/choderalab/bhmm/pull/15#issuecomment-84771088.


Prof. Dr. Frank Noe Head of Computational Molecular Biology group Freie Universitaet Berlin

Phone: (+49) (0)30 838 75354 Web: research.franknoe.de

Mail: Arnimallee 6, 14195 Berlin, Germany