ag-csw / LDStreamHMMLearn


Use extra trajectories as additional runs #28

Closed greenTara closed 7 years ago

greenTara commented 7 years ago

We are not using our simulated data to the fullest extent. For instance, when computing the eta heatmap we only use half of the trajectories, because num_trajectories is the mid value. We could repeat the calculation on the other half of the trajectories and treat it as another run with the same model. This should help our expectations converge, with little extra time required.

alexlafleur commented 7 years ago

We had the idea to simulate the complete data set and use a slice of it for each run, but we never finished this, so the code is incomplete and throws errors. What we have is this:

if j == 0:
    self.mm1_0_0 = self.mmf1_0.sample()[0]
    self.simulated_data = simulate_and_store(qmm1_0_0=self.mm1_0_0)
    simulated_data_ndarray = np.asarray(self.simulated_data)

num_trajs = np.shape(simulated_data_ndarray)[0]
simulated_data_slice = np.split(simulated_data_ndarray[j].flatten(), num_trajs)  # <--- throws "array split does not result in equal division"

simulated_data_ndarray has shape (16, ~4000). If we take simulated_data_ndarray[j], that is already a 1-d array of length ~4000, so flattening it achieves nothing at that point; the split then fails because the trajectory length is not an exact multiple of num_trajs.

What we actually need is a 2-d array with num_trajectories rows (because we generate the dataslice0 in error_bayes).
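For reference, a small standalone sketch of the split semantics (the shapes are stand-ins for the real simulated data, not project code): splitting along axis 0 keeps each slice 2-d, and np.array_split tolerates sizes that do not divide evenly, which is exactly what the "array split does not result in equal division" error is about.

```python
import numpy as np

# Stand-in for simulated_data_ndarray: 16 trajectories of length 4001
# (deliberately not divisible by 16, like the ~4000 case above).
data = np.arange(16 * 4001).reshape(16, 4001)

row = data[0].flatten()            # already 1-d; flatten is a no-op here
try:
    np.split(row, 16)              # 4001 % 16 != 0
except ValueError:
    pass                           # "array split does not result in equal division"

# np.array_split tolerates an unequal division:
pieces = np.array_split(row, 16)   # piece lengths 251 and 250

# splitting along axis 0 keeps each slice a 2-d block of trajectories:
halves = np.split(data, 2)         # two (8, 4001) arrays
```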


Another Idea:

We simulate with max_num_trajectories but only use mid_num_trajectories in the evaluation. To use the data to its fullest extent, we could do something like this:

if j == 0:
    self.mm1_0_0 = self.mmf1_0.sample()[0]
    self.simulated_data = simulate_and_store(qmm1_0_0=self.mm1_0_0)
    simulated_data_ndarray = np.asarray(self.simulated_data)

# since mid_num_trajectories is half of max_num_trajectories, we can
# split the trajectories (axis 0) into two slices; flattening first
# would destroy the 2-d (num_trajectories, length) structure
simulated_data_slices = np.split(simulated_data_ndarray, 2)
for data_slice in simulated_data_slices:
    # calculate the errors with the first and the second slice
    evaluate.test_taumeta_eta(mm1_0_0=self.mm1_0_0, simulated_data=data_slice)
...
# average the errors over these two slices
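As a rough end-to-end sketch of that averaging idea, under invented assumptions: compute_eta_error below is a hypothetical placeholder for the real evaluate.test_taumeta_eta, and the array shape stands in for the actual simulated data.

```python
import numpy as np

def compute_eta_error(trajectories):
    # hypothetical placeholder for evaluate.test_taumeta_eta:
    # returns one scalar error per data slice (here just its mean)
    return float(np.mean(trajectories))

rng = np.random.default_rng(42)
simulated = rng.random((16, 4000))   # max_num_trajectories = 16

# split into two runs of mid_num_trajectories = 8 trajectories each,
# evaluate each run separately, then average the errors
runs = np.split(simulated, 2)
errors = [compute_eta_error(run) for run in runs]
avg_error = float(np.mean(errors))
```

Since both runs hold the same number of values, the average of the two per-run errors here equals the error over the full data set, so the extra run costs only the second evaluation.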