githubpsyche closed this issue 4 years ago
There are quite a few "options" for this one, but the real ones (i.e., the ones with substantial associated code) are `compute_likelihood`, `count_particles`, and `get_curve_xy_vals`.
Where are they used?
`compute_likelihood` is called 3 times in the main function. `count_particles` occurs in both `analyze_outputs` and `common_to_all_curves` (within `auto_generate`, used for data simulation). `get_curve_xy_vals` is used in `common_to_all_curves`'s `draw_bcm_curve` option and in `simulate_data`. `draw_bcm_curve` doesn't seem to ever actually get used in the codebase, though. So as long as I validate both `simulate_data` and `importance_sampler`, I'll get `family_of_curves` along the way.
Start time 6/22 21:26
********** START OF MESSAGES **********
0 trials are dropped since they are regarded as outliers
********** END OF MESSAGES **********
Betas: 0, 1
EM Iteration: 0
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-2bc4507d9022> in <module>
1 # run tests only when is main file!
2 if __name__ == '__main__':
----> 3 test_importance_sampler()
<ipython-input-37-060729857a3d> in test_importance_sampler()
18
19 # generate output
---> 20 importance_sampler(python_data, python_analysis_settings)
21 eng.importance_sampler(matlab_data, matlab_analysis_settings, nargout=0)
22
<ipython-input-36-ca37d7803efb> in importance_sampler(***failed resolving arguments***)
73 output_struct = family_of_curves(ana_opt['curve_type'], 'compute_likelihood', ana_opt['net_effect_clusters'], ana_opt['ptl_chunk_idx'][ptl_idx, 2],
74 param[int(ana_opt['ptl_chunk_idx'][ptl_idx, 0]):int(ana_opt['ptl_chunk_idx'][ptl_idx, 1]), :], hold_betas, preprocessed_data,
---> 75 ana_opt['distribution'], ana_opt['dist_specific_params'], ana_opt['data_matrix_columns'])
76
77 w[ana_opt['ptl_chunk_idx'][ptl_idx, 0]:ana_opt['ptl_chunk_idx'][ptl_idx, 1]] = output_struct['w'] # Gather weights
~\Documents\GitHub\PCITpy\pcitpy\family_of_curves.py in family_of_curves(curve_type, get_info, *varargin)
20 def family_of_curves(curve_type, get_info, *varargin):
21 if curve_type is 'horz_indpnt':
---> 22 return horz_indpnt_curve(get_info, varargin)
23 else:
24 raise ValueError('Invalid curve!')
~\Documents\GitHub\PCITpy\pcitpy\family_of_curves.py in horz_indpnt_curve(***failed resolving arguments***)
78 for i in range(len(net_effect_clusters)):
79 cluster_idx = np.where(data[:,net_effect_clusters_column == net_effect_clusters[i]])
---> 80 X = np.zeros((len(cluster_idx), particles))
81 for j in range(length(cluster_idx)):
82 if np.isnan(data[cluster_idx[j], predictor_var_column]):
TypeError: 'numpy.float64' object cannot be interpreted as an integer
`particles` is probably a float.
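That would explain the `TypeError`: `np.zeros` rejects float dimensions. A minimal reproduction and the cast that fixes it (the values here are hypothetical stand-ins, not PCITpy's actual ones):

```python
import numpy as np

# Values pulled out of a MATLAB struct often arrive as np.float64,
# even when they are conceptually counts.
particles = np.float64(50000.0)

# np.zeros((3, particles)) would raise:
#   TypeError: 'numpy.float64' object cannot be interpreted as an integer

# Casting to int at the boundary fixes it.
X = np.zeros((3, int(particles)))
print(X.shape)  # (3, 50000)
```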
`compute_likelihood` is definitely plenty buggy, but I have no easy way to test it; it requires too many inputs. The best I can do is write a function that does everything up to its first call in `importance_sampler`. But don't stochastic things happen before then? Yup. Maybe I should focus on its use in another context? It's not used in any other context. Ugh.
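One way to make "everything up to the first call" comparable across runs is to pin the RNG seed before the stochastic steps. A sketch with a hypothetical stand-in for the particle sampling (the function name and shapes are mine, not PCITpy's):

```python
import numpy as np

def run_until_first_likelihood_call(seed=0):
    # Hypothetical sketch: fix the RNG so the particle draws preceding
    # the first compute_likelihood call are reproducible across runs.
    np.random.seed(seed)
    param = np.random.normal(size=(10, 6))  # stand-in for the sampled particle matrix
    return param

a = run_until_first_likelihood_call(seed=42)
b = run_until_first_likelihood_call(seed=42)
print(np.array_equal(a, b))  # True
```

MATLAB's `rng(seed)` would play the same role on the other side, though the two generators won't produce identical draws, which is why saved intermediate values are still needed.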
Any other function options like this? Not in `family_of_curves`. `family_of_distributions` does have `fminunc_bernoulli_both`, which requires 4 parameters, though. `fminunc_normal_both` is similarly parameterized. `importance_sampler` itself also has `compute_weights`.
I have to save some prospective parameters (preferably from the MATLAB version rather than the Python one), then load them in the testing function to see whether the Python and MATLAB outputs are identical.
This might not work: the Python params are unlikely to match the MATLAB ones, due in part to the indexing problem. Ughhh.
I have to generate Python analogues where appropriate and compare output against those!
Everything that looks 1-indexed is in question.
I think we have a testing framework!
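A sketch of what that comparison helper could look like, assuming the MATLAB session saves its variables with `save(...)` and we read them back with `scipy.io.loadmat`. The file name and key are hypothetical, and a stand-in `.mat` file is written locally so the demo is self-contained:

```python
import numpy as np
from scipy.io import loadmat, savemat

def compare_to_matlab(python_output, mat_path, key="w", atol=1e-8):
    """Load a variable saved from the MATLAB run and compare elementwise.

    squeeze() strips the singleton dimensions loadmat adds, since MATLAB
    stores everything as at least 2-D.
    """
    matlab_output = np.squeeze(loadmat(mat_path)[key])
    return np.allclose(np.squeeze(python_output), matlab_output, atol=atol)

# Demo with a stand-in file; in practice the .mat comes from the MATLAB session.
w = np.linspace(0, 1, 5)
savemat("demo_w.mat", {"w": w})
print(compare_to_matlab(w, "demo_w.mat"))  # True
```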
C:\ProgramData\Miniconda3\lib\site-packages\ipykernel_launcher.py:66: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-ac99c68fe86b> in <module>
1 # run tests only when is main file!
2 if __name__ == '__main__':
----> 3 python_output = test_compute_likelihood()
<ipython-input-2-10d9aaf953d6> in test_compute_likelihood()
36 ptl_chunk_idx[ptl_idx, 2], np.asarray(param)[int(ptl_chunk_idx[ptl_idx, 0]):int(ptl_chunk_idx[ptl_idx, 1]), :],
37 hold_betas, np.asarray(preprocessed_data), ana_opt['distribution'], ana_opt['dist_specific_params'],
---> 38 data_matrix_columns)
39
40 return python_output
<ipython-input-1-19dc1a138491> in family_of_curves(curve_type, get_info, *varargin)
4 def family_of_curves(curve_type, get_info, *varargin):
5 if curve_type == 'horz_indpnt':
----> 6 return horz_indpnt_curve(get_info, varargin)
7 else:
8 raise ValueError('Invalid curve!')
<ipython-input-1-19dc1a138491> in horz_indpnt_curve(***failed resolving arguments***)
68 else:
69 # If an activation is falling in the third segment of the curve then get the associated y val
---> 70 ix3 = data[cluster_idx[j], predictor_var_column] > x2
71 X[j, ix3] = (np.multiply(np.divide(y4[ix3]-y3[ix3], 1-x2[ix3]),data[cluster_idx[j], predictor_var_column]-1)) + y4[ix3]
72
ValueError: operands could not be broadcast together with shapes (0,) (50000,)
First bug.
if isnan(data(cluster_idx(j), predictor_var_column))
x(i, :) = 0;
I wonder if this should be zero here rather than something like -1, or left as `NaN`!
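A likely contributor to the `(0,)` vs `(50000,)` broadcast error: `np.where` with a single argument returns a *tuple* of index arrays, so `len(cluster_idx)` and `cluster_idx[j]` don't mean what the MATLAB `find` translation intends. A toy illustration:

```python
import numpy as np

col = np.array([1, 2, 1, 3, 1])

# np.where with one argument returns a TUPLE of index arrays,
# so len() gives the number of dimensions, not the number of matches.
hits = np.where(col == 1)
print(len(hits))     # 1  (a 1-tuple)
print(len(hits[0]))  # 3  (the actual row indices)

# Taking the first element gives a plain index array to iterate over.
cluster_idx = np.where(col == 1)[0]
print(cluster_idx)   # [0 2 4]
```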
All done now except for the family_of_distributions call.
family_of_distributions(distribution, 'compute_densities', z, y, dist_specific_params)
Do I need to save variable values here, too? I already have MATLAB's `w`, the output.
Let's make sure I have the inputs right. `y` is still wrong. `x` is good, and so is `z`, probably. That leaves... `y` and `w`.
`y` seems dealt with, leaving `w`! Do I need a separate test function for `compute_densities`, or, since it's used exclusively in this context, can I test it here? This one test seems sufficient, yeah.
There's still a problem with how `w` is generated. It seems to return a single value; I probably need to add an axis parameter somewhere.
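If `w` comes from reducing a densities matrix over data rows, omitting the `axis` argument would collapse everything to a scalar, which matches the symptom. A toy illustration (the shapes here are hypothetical):

```python
import numpy as np

densities = np.full((4, 3), 0.5)  # hypothetical: 4 data rows x 3 particles

# Without axis, NumPy reduces over the flattened array: one scalar.
print(np.prod(densities))          # 0.000244140625

# With axis=0, the reduction runs down the rows: one weight per particle.
print(np.prod(densities, axis=0))  # [0.0625 0.0625 0.0625]
```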
The test is hard to finish but I'm at least confident now that the code works.
moving discussion to #30
Must develop and pass a test for each option.