githubpsyche closed this issue 4 years ago
There are quite a few "options" for this one, but the real ones (i.e., the ones with substantial associated code) are `compute_likelihood`, `count_particles`, and `get_curve_xy_vals`.
Where are they used?
`compute_likelihood` is called 3 times in the main function. `count_particles` occurs in both `analyze_outputs` and `common_to_all_curves` (within `auto_generate`, used for data simulation). `get_curve_xy_vals` is used in `common_to_all_curves`'s `draw_bcm_curve` option and in `simulate_data`. `draw_bcm_curve` doesn't seem to ever actually get used in the codebase, though. So as long as I validate both `simulate_data` and `importance_sampler`, I'll get `family_of_curves` along the way.
Start time 6/22 21:26
********** START OF MESSAGES **********
0 trials are dropped since they are regarded as outliers
********** END OF MESSAGES **********
Betas: 0, 1
EM Iteration: 0
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-2bc4507d9022> in <module>
1 # run tests only when is main file!
2 if __name__ == '__main__':
----> 3 test_importance_sampler()
<ipython-input-37-060729857a3d> in test_importance_sampler()
18
19 # generate output
---> 20 importance_sampler(python_data, python_analysis_settings)
21 eng.importance_sampler(matlab_data, matlab_analysis_settings, nargout=0)
22
<ipython-input-36-ca37d7803efb> in importance_sampler(***failed resolving arguments***)
73 output_struct = family_of_curves(ana_opt['curve_type'], 'compute_likelihood', ana_opt['net_effect_clusters'], ana_opt['ptl_chunk_idx'][ptl_idx, 2],
74 param[int(ana_opt['ptl_chunk_idx'][ptl_idx, 0]):int(ana_opt['ptl_chunk_idx'][ptl_idx, 1]), :], hold_betas, preprocessed_data,
---> 75 ana_opt['distribution'], ana_opt['dist_specific_params'], ana_opt['data_matrix_columns'])
76
77 w[ana_opt['ptl_chunk_idx'][ptl_idx, 0]:ana_opt['ptl_chunk_idx'][ptl_idx, 1]] = output_struct['w'] # Gather weights
~\Documents\GitHub\PCITpy\pcitpy\family_of_curves.py in family_of_curves(curve_type, get_info, *varargin)
20 def family_of_curves(curve_type, get_info, *varargin):
21 if curve_type is 'horz_indpnt':
---> 22 return horz_indpnt_curve(get_info, varargin)
23 else:
24 raise ValueError('Invalid curve!')
~\Documents\GitHub\PCITpy\pcitpy\family_of_curves.py in horz_indpnt_curve(***failed resolving arguments***)
78 for i in range(len(net_effect_clusters)):
79 cluster_idx = np.where(data[:,net_effect_clusters_column == net_effect_clusters[i]])
---> 80 X = np.zeros((len(cluster_idx), particles))
81 for j in range(length(cluster_idx)):
82 if np.isnan(data[cluster_idx[j], predictor_var_column]):
TypeError: 'numpy.float64' object cannot be interpreted as an integer
`particles` is probably a float.
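That would explain the `TypeError`: `np.zeros` rejects float dimensions. A minimal reproduction and the cast that fixes it (the values here are hypothetical stand-ins, not PCITpy's actual ones):

```python
import numpy as np

# Values pulled out of a MATLAB struct often arrive as np.float64,
# even when they are conceptually counts.
particles = np.float64(50000.0)

# np.zeros((3, particles)) would raise:
#   TypeError: 'numpy.float64' object cannot be interpreted as an integer

# Casting to int at the boundary fixes it.
X = np.zeros((3, int(particles)))
print(X.shape)  # (3, 50000)
```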
`compute_likelihood` is definitely plenty buggy, but I have no easy way to test it; it requires too many inputs. The best I can do is write a function that does everything up to its first call in `importance_sampler`. But don't stochastic things happen before then? Yup. Maybe I should focus on its use in another context? It's not used in any other context. Ugh.
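One way to make "everything up to the first call" comparable across runs is to pin the RNG seed before the stochastic steps. A sketch with a hypothetical stand-in for the particle sampling (the function name and shapes are mine, not PCITpy's):

```python
import numpy as np

def run_until_first_likelihood_call(seed=0):
    # Hypothetical sketch: fix the RNG so the particle draws preceding
    # the first compute_likelihood call are reproducible across runs.
    np.random.seed(seed)
    param = np.random.normal(size=(10, 6))  # stand-in for the sampled particle matrix
    return param

a = run_until_first_likelihood_call(seed=42)
b = run_until_first_likelihood_call(seed=42)
print(np.array_equal(a, b))  # True
```

MATLAB's `rng(seed)` would play the same role on the other side, though the two generators won't produce identical draws, which is why saved intermediate values are still needed.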
Any other function options like this? Not in `family_of_curves`. `family_of_distributions` does have `fminunc_bernoulli_both`, which requires 4 parameters, though. `fminunc_normal_both` is similarly parameterized. `importance_sampler` itself also has `compute_weights`.
I have to save some prospective parameters (preferably from the MATLAB version rather than the Python one), then load them in the testing function to see whether the Python and MATLAB outputs are identical.
This might not work: the Python params are unlikely to match the MATLAB ones, due in part to the indexing problem. Ughhh.
I have to generate Python analogues where appropriate and compare output against those!
Everything that looks 1-indexed is in question.
I think we have a testing framework!
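A sketch of what that comparison helper could look like, assuming the MATLAB session saves its variables with `save(...)` and we read them back with `scipy.io.loadmat`. The file name and key are hypothetical, and a stand-in `.mat` file is written locally so the demo is self-contained:

```python
import numpy as np
from scipy.io import loadmat, savemat

def compare_to_matlab(python_output, mat_path, key="w", atol=1e-8):
    """Load a variable saved from the MATLAB run and compare elementwise.

    squeeze() strips the singleton dimensions loadmat adds, since MATLAB
    stores everything as at least 2-D.
    """
    matlab_output = np.squeeze(loadmat(mat_path)[key])
    return np.allclose(np.squeeze(python_output), matlab_output, atol=atol)

# Demo with a stand-in file; in practice the .mat comes from the MATLAB session.
w = np.linspace(0, 1, 5)
savemat("demo_w.mat", {"w": w})
print(compare_to_matlab(w, "demo_w.mat"))  # True
```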
C:\ProgramData\Miniconda3\lib\site-packages\ipykernel_launcher.py:66: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-ac99c68fe86b> in <module>
1 # run tests only when is main file!
2 if __name__ == '__main__':
----> 3 python_output = test_compute_likelihood()
<ipython-input-2-10d9aaf953d6> in test_compute_likelihood()
36 ptl_chunk_idx[ptl_idx, 2], np.asarray(param)[int(ptl_chunk_idx[ptl_idx, 0]):int(ptl_chunk_idx[ptl_idx, 1]), :],
37 hold_betas, np.asarray(preprocessed_data), ana_opt['distribution'], ana_opt['dist_specific_params'],
---> 38 data_matrix_columns)
39
40 return python_output
<ipython-input-1-19dc1a138491> in family_of_curves(curve_type, get_info, *varargin)
4 def family_of_curves(curve_type, get_info, *varargin):
5 if curve_type == 'horz_indpnt':
----> 6 return horz_indpnt_curve(get_info, varargin)
7 else:
8 raise ValueError('Invalid curve!')
<ipython-input-1-19dc1a138491> in horz_indpnt_curve(***failed resolving arguments***)
68 else:
69 # If an activation is falling in the third segment of the curve then get the associated y val
---> 70 ix3 = data[cluster_idx[j], predictor_var_column] > x2
71 X[j, ix3] = (np.multiply(np.divide(y4[ix3]-y3[ix3], 1-x2[ix3]),data[cluster_idx[j], predictor_var_column]-1)) + y4[ix3]
72
ValueError: operands could not be broadcast together with shapes (0,) (50000,)
First bug.
if isnan(data(cluster_idx(j), predictor_var_column))
x(i, :) = 0;
I wonder if this should be zero here rather than something like -1, or left as `NaN`!
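A likely contributor to the `(0,)` vs `(50000,)` broadcast error: `np.where` with a single argument returns a *tuple* of index arrays, so `len(cluster_idx)` and `cluster_idx[j]` don't mean what the MATLAB `find` translation intends. A toy illustration:

```python
import numpy as np

col = np.array([1, 2, 1, 3, 1])

# np.where with one argument returns a TUPLE of index arrays,
# so len() gives the number of dimensions, not the number of matches.
hits = np.where(col == 1)
print(len(hits))     # 1  (a 1-tuple)
print(len(hits[0]))  # 3  (the actual row indices)

# Taking the first element gives a plain index array to iterate over.
cluster_idx = np.where(col == 1)[0]
print(cluster_idx)   # [0 2 4]
```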
All done now except for the family_of_distributions call.
family_of_distributions(distribution, 'compute_densities', z, y, dist_specific_params)
Do I need to save variable values here, too? I already have MATLAB's `w`, the output.
Let's make sure I have the inputs right. `y` is still wrong. `x` is good, and so is `z`, probably. That leaves... `y` and `w`.
`y` seems dealt with, leaving `w`! Do I need a separate test function for `compute_densities`, or, since it's used exclusively in this context, can I test it here? This one test seems sufficient, yeah.
There's still a problem with how `w` is generated. It seems to return a single value; I probably need to add an axis parameter somewhere.
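If `w` comes from reducing a densities matrix over data rows, omitting the `axis` argument would collapse everything to a scalar, which matches the symptom. A toy illustration (the shapes here are hypothetical):

```python
import numpy as np

densities = np.full((4, 3), 0.5)  # hypothetical: 4 data rows x 3 particles

# Without axis, NumPy reduces over the flattened array: one scalar.
print(np.prod(densities))          # 0.000244140625

# With axis=0, the reduction runs down the rows: one weight per particle.
print(np.prod(densities, axis=0))  # [0.0625 0.0625 0.0625]
```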
The test is hard to finish but I'm at least confident now that the code works.
moving discussion to #30
Must develop and pass a test for each option.