alexbw closed this issue 10 years ago.
The error above occurs during dv.wait, which indicates that it originates in the engines.
Also, I'm running in serial now. In serial mode the memory creeps up more slowly, which is another indication that the allocation is happening in the engines. Here's where the serial run was when I interrupted it:
^C^C^C^C---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
/home/alexbw/anaconda/lib/python2.7/site-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
202 else:
203 filename = fname
--> 204 __builtin__.execfile(filename, *where)
/home/alexbw/Code/pyhsmm_library_models/real_data_plots/parallel-library-subhmms.py in <module>()
143 print "About to enter resample_model_parallel"
144 # model.resample_model_parallel()
--> 145 model.resample_model()
146 print "Resampled model, now getting likelihoods"
147 loglike = model.log_likelihood()/len(training_data)
/home/alexbw/Code/pyhsmm_library_models/pyhsmm/models.pyc in resample_model(self, **kwargs)
469 def resample_model(self,**kwargs):
470 self.resample_dur_distns()
--> 471 super(HSMM,self).resample_model(**kwargs)
472
473 def resample_dur_distns(self):
/home/alexbw/Code/pyhsmm_library_models/pyhsmm/models.pyc in resample_model(self, temp)
120 self.resample_trans_distn()
121 self.resample_init_state_distn()
--> 122 self.resample_states(temp=temp)
123
124 def resample_obs_distns(self):
/home/alexbw/Code/pyhsmm_library_models/pyhsmm/models.pyc in resample_states(self, temp)
138 def resample_states(self,temp=None):
139 for s in self.states_list:
--> 140 s.resample(temp=temp)
141
142 def copy_sample(self):
/home/alexbw/Code/pyhsmm_library_models/pyhsmm/internals/states.pyc in resample(self, temp)
1344 # TODO something with temperature
1345 self._remove_substates_from_subHMMs()
-> 1346 alphan = self.messages_forwards_normalized()
1347 self.sample_backwards_normalized(alphan)
1348
/home/alexbw/Code/pyhsmm_library_models/pyhsmm/internals/states.pyc in messages_forwards_normalized(self)
1324 self.rs,self.ps,
1325 self.subhmm_trans_matrices,self.subhmm_pi_0s,
-> 1326 self.aBls,self._alphan)
1327
1328 return self._alphan
KeyboardInterrupt:
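One way to confirm that memory really creeps up across iterations in serial mode is to log the process's peak resident set size after each resampling step. This is an illustrative sketch, not part of pyhsmm: `track_memory` and `peak_rss_mb` are hypothetical helpers built on the standard `resource` module (Unix only), and `step` would be something like `model.resample_model`.

```python
import resource
import sys


def _stderr_log(msg):
    # Write and flush immediately so each reading is visible even if the
    # process later hangs or is killed.
    sys.stderr.write(msg + "\n")
    sys.stderr.flush()


def peak_rss_mb():
    """Peak resident set size of this process, in megabytes.

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024.0 * 1024.0)
    return peak / 1024.0


def track_memory(step, n_iters, log=_stderr_log):
    """Call step() n_iters times and log the peak RSS after each call.

    Returns the readings. Peak RSS never decreases, so a value that keeps
    climbing iteration after iteration means memory is still growing.
    """
    readings = []
    for i in range(n_iters):
        step()
        mb = peak_rss_mb()
        readings.append(mb)
        log("iter %d: peak RSS %.1f MB" % (i, mb))
    return readings
```

Something like `track_memory(model.resample_model, 20)` would print one line per iteration; a peak that keeps rising well after the first few iterations points to growing buffers or a leak rather than a one-time allocation.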
The evidence points to this line: https://github.com/dattalab/pyhsmm/blob/subhmms/internals/subhmm_messages.cpp#L216
I added logging around all of the Python, Cython, and C++ code. The last log statement that prints is the one just before that line, so that's where it hangs. I'm now double-checking the inner loop and letting things run a bit further.
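The Python side of that bracketing-log approach can be sketched as follows. `traced` and `last_unfinished` are hypothetical helpers, not pyhsmm functions; the important detail is flushing each line immediately so the last message survives a hang or a Ctrl-C. The C++ side would need the same enter/exit pattern written with `fprintf(stderr, ...)`, which this sketch does not cover.

```python
import sys
import time
from contextlib import contextmanager


def _stderr_log(msg):
    # Flush every line: buffered output is lost if the process hangs or is
    # killed, which would make the last visible message misleading.
    sys.stderr.write(msg + "\n")
    sys.stderr.flush()


@contextmanager
def traced(label, log=_stderr_log):
    """Log 'enter <label>' before the block and 'exit <label>' after it.

    If the block hangs or raises, no exit line is written, so the innermost
    unmatched 'enter' marks where execution stopped.
    """
    start = time.time()
    log("enter " + label)
    yield
    log("exit " + label + " (%.3fs)" % (time.time() - start))


def last_unfinished(lines):
    """Return the label of the innermost block entered but never exited.

    Assumes blocks are properly nested, so each exit closes the most
    recently opened block.
    """
    open_blocks = []
    for line in lines:
        if line.startswith("enter "):
            open_blocks.append(line[len("enter "):])
        elif line.startswith("exit ") and open_blocks:
            open_blocks.pop()
    return open_blocks[-1] if open_blocks else None
```

Wrapping, say, the state-resampling loop and the forward-messages call in nested `traced(...)` blocks and running `last_unfinished` over the captured stderr lines names the innermost call that never returned.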
We got past this; it's about as memory-lean as it can be.
After investigating, everything is working properly; these models are just huge. Closing.
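A back-of-envelope estimate shows how quickly models like these get large. This sketch assumes the forward messages are a dense float64 array with one row per frame and one column per substate across all sub-HMMs; that layout is an assumption, and the actual buffers in subhmm_messages.cpp may differ. The function names are hypothetical.

```python
def total_substates(substates_per_subhmm):
    """Number of columns when every sub-HMM's substates are stacked side by side."""
    return sum(substates_per_subhmm)


def dense_messages_bytes(n_frames, substates_per_subhmm, itemsize=8):
    """Bytes for one dense (n_frames x total substates) message array.

    itemsize=8 corresponds to float64. This covers only that one array;
    any other buffers the model keeps come on top of it.
    """
    return n_frames * total_substates(substates_per_subhmm) * itemsize


def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / float(1024 ** 3)
```

For illustration only (these sizes are made up, not taken from this run): 50 sub-HMMs with 100 substates each give 5,000 columns, and 100,000 frames then need 4,000,000,000 bytes, about 3.7 GiB, for a single message array.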
I hit Ctrl-C in the parallel code; the hang happens within state resampling. I'm going to break inside the serial version now. The problem is that if I don't hit Ctrl-C fast enough, the machine locks up.
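To keep a runaway allocation from locking up the machine before Ctrl-C lands, one option is to cap the process's address space so that allocations fail instead of driving the system into swap. This is a generic Unix technique, not something used in this thread; `cap_address_space` is a hypothetical helper built on the standard `resource` module. With IPython parallel, each engine is a separate process and would need to apply the cap itself.

```python
import resource


def cap_address_space(max_bytes):
    """Lower this process's soft RLIMIT_AS to max_bytes (Unix only).

    RLIMIT_AS limits virtual address space, which can be well above the
    resident set size, so leave headroom. The soft limit may not exceed the
    hard limit, so the request is clipped to it. Returns the previous
    (soft, hard) pair so the caller can restore it with setrlimit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard == resource.RLIM_INFINITY:
        new_soft = max_bytes
    else:
        new_soft = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    return soft, hard
```

For example, calling `cap_address_space(8 * 1024 ** 3)` near the top of the script (8 GiB is an arbitrary illustrative figure) makes oversized allocations fail: pure-Python allocations raise MemoryError, and C++ `new` throws std::bad_alloc.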