don't recompute likelihoods when data division changes

mattjj commented 10 years ago

With a fixed library and a fixed overall set of data, we wastefully recompute the iid likelihoods every time the data is partitioned differently across engines (i.e. across add_data calls). We should fix that!

Unfortunately, we may have to do this outside add_data, since we need to know where the slices we're passing to add_data come from. So we need a global function in library_models and/or library_subhmm_models:

data_slices, aBls = split_data(bigdataarray, model, num_pieces)
...
for data, aBl in zip(data_slices, aBls):
    model.add_data_parallel(data=data, frozen_aBl=aBl, left_censoring=True)

or something like that...
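A minimal sketch of the idea, assuming diagonal-Gaussian observation distributions. The helpers `compute_aBl` and `split_with_likelihoods` are hypothetical names for illustration and are not part of pyhsmm; the real split_data may look quite different. The per-frame log likelihood matrix is computed once for the whole array, and each data slice is paired with the matching rows of that matrix, so changing the number of pieces does not trigger a recomputation.

```python
import numpy as np

def compute_aBl(data, means, variances):
    """Per-frame log likelihoods under K diagonal Gaussians.

    data: (T, D); means, variances: (K, D); returns (T, K).
    Hypothetical stand-in for whatever the model's likelihood code does.
    """
    diff = data[:, None, :] - means[None, :, :]  # (T, K, D)
    per_dim = diff**2 / variances + np.log(2 * np.pi * variances)
    return -0.5 * per_dim.sum(axis=2)

def split_with_likelihoods(data, aBl, num_pieces):
    """Split data and its precomputed likelihoods at the same row boundaries."""
    return np.array_split(data, num_pieces), np.array_split(aBl, num_pieces)

rng = np.random.default_rng(0)
data = rng.normal(size=(10, 2))
means = rng.normal(size=(3, 2))
variances = np.ones((3, 2))

# The expensive step happens once, independent of how the data is divided.
big_aBl = compute_aBl(data, means, variances)

# Re-partitioning only slices the cached matrix.
splits = {n: split_with_likelihoods(data, big_aBl, n) for n in (2, 3)}
```

Because the likelihoods are iid across frames, each sliced block equals what would have been computed from the corresponding data slice alone, which is what makes the caching safe.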

alexbw commented 10 years ago

This approach works for library subHMMs, but not for library HSMMs.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-20-ccb7c38c9502> in <module>()
      6                                                    parallel_profile='lhsmm',
      7                                                    **params)
----> 8 lhsmm.fit(nonwhite_data)

/home/alexbw/Code/pyhsmm/mixins.py in fit(self, X, y)
    231         self._pre_fit()
    232         self._setup()
--> 233         self._data_setup()
    234         for itr in self.ixrange(self.n_iter):
    235             self._pre_resample(itr)

/home/alexbw/Code/pyhsmm/mixins.py in _data_setup(self)
    512     def _data_setup(self):
    513         if self.parallel:
--> 514             all_data, all_aBl = split_data(self.X, self, self.n_engines)
    515             if self.y == None:
    516                 split_labels = [None]*self.n_engines

/home/alexbw/Code/pyhsmm_library_models/util.pyc in split_data(big_data_array, model, num_parts)
     71     else:
     72         model.add_data(data=big_data_array)
---> 73         big_aBl_array = model.states_list.pop().aBls[0]
     74         with open(filepath,'w') as outfile:
     75             cPickle.dump(big_aBl_array,outfile,protocol=-1)

AttributeError: 'LibraryHSMMStatesIntegerNegativeBinomialVariant' object has no attribute 'aBls'

LibraryHSMMStatesIntegerNegativeBinomialVariant is a subclass of the LHSMM.
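The traceback suggests that split_data assumes every states object exposes a list-valued `aBls` attribute, which apparently holds for the subHMM states but not for the HSMM states. One hypothetical way to tolerate both, assuming the HSMM states object stores a single array under `aBl` (an assumption; the actual fix is not shown in this thread), is a small accessor that tries both names. The classes below are stand-ins, not the real states classes.

```python
import numpy as np

class SubHMMStatesStandIn:
    """Stand-in: keeps a list of likelihood arrays under `aBls`."""
    def __init__(self, aBl):
        self.aBls = [aBl]

class HSMMStatesStandIn:
    """Stand-in: keeps a single likelihood array under `aBl` (assumed name)."""
    def __init__(self, aBl):
        self.aBl = aBl

def get_big_aBl(states):
    """Return the likelihood array regardless of which attribute holds it."""
    if hasattr(states, "aBls"):
        return states.aBls[0]
    return states.aBl

example = np.zeros((4, 2))
from_subhmm = get_big_aBl(SubHMMStatesStandIn(example))
from_hsmm = get_big_aBl(HSMMStatesStandIn(example))
```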

alexbw commented 10 years ago

Very small change in 41c4e6a67648b3d778857fe4d4a215dded09bec3 fixes it.

mattjj commented 10 years ago

Close again?

alexbw commented 10 years ago

Close

On Fri, Dec 20, 2013 at 2:39 PM, Matthew Johnson notifications@github.com wrote:

Close again?


dattalab / pyhsmm-library-models

don't recompute likelihoods when data division changes #51