githubpsyche / pcitpy

The Probabilistic Curve Induction and Testing (P-CIT) toolbox, implemented in Python.
https://githubpsyche.github.io/pcitpy/
Apache License 2.0

importance_sampler testing #7

Closed githubpsyche closed 4 years ago

githubpsyche commented 4 years ago

Must develop and pass a test for each option.

githubpsyche commented 4 years ago
Start time 6/22 20:32
********** START OF MESSAGES **********
0 trials are dropped since they are regarded as outliers
********** END OF MESSAGES **********
Betas: 0, 1
EM Iteration: 0
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-5-2bc4507d9022> in <module>
      1 # run tests only when is main file!
      2 if __name__ == '__main__':
----> 3     test_importance_sampler()

<ipython-input-2-060729857a3d> in test_importance_sampler()
     18 
     19     # generate output
---> 20     importance_sampler(python_data, python_analysis_settings)
     21     eng.importance_sampler(matlab_data, matlab_analysis_settings, nargout=0)
     22 

<ipython-input-1-e544593ccecb> in importance_sampler(***failed resolving arguments***)
     71         #  This is where we use the chunking trick II
     72         for ptl_idx in range(np.shape(ana_opt['ptl_chunk_idx'])[0]):
---> 73             output_struct = family_of_curves(ana_opt['curve_type'], 'compute_likelihood', ana_opt['net_effect_clusters'], ana_opt['ptl_chunk_idx']['ptl_idx, 3'],
     74                                              param[ana_opt['ptl_chunk_idx'][ptl_idx, 1]:ana_opt['ptl_chunk_idx'][ptl_idx, 2], :], hold_betas, preprocessed_data,
     75                                              ana_opt['distribution'], ana_opt['dist_specific_params'], ana_opt['data_matrix_columns'])

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

This was the result of enclosing the paired indices within quotation marks: ['ptl_idx, 3'] instead of [ptl_idx, 3], so NumPy saw a single string index rather than two integer indices.
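A minimal reproduction of the mistake (the array here is a hypothetical stand-in for ana_opt['ptl_chunk_idx']):

```python
import numpy as np

# Hypothetical stand-in for ana_opt['ptl_chunk_idx']; real contents differ
chunk_idx = np.zeros((2, 3))
ptl_idx = 0

value = chunk_idx[ptl_idx, 2]      # correct: a tuple of integer indices

try:
    chunk_idx['ptl_idx, 3']        # broken: one string index, not two ints
except IndexError as err:
    print(type(err).__name__)      # IndexError
```

On a plain (non-structured) array, the quoted form is rejected with exactly the "only integers, slices..." message in the traceback above.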

githubpsyche commented 4 years ago
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-2bc4507d9022> in <module>
      1 # run tests only when is main file!
      2 if __name__ == '__main__':
----> 3     test_importance_sampler()

<ipython-input-7-060729857a3d> in test_importance_sampler()
     18 
     19     # generate output
---> 20     importance_sampler(python_data, python_analysis_settings)
     21     eng.importance_sampler(matlab_data, matlab_analysis_settings, nargout=0)
     22 

<ipython-input-6-f2cf72f6c3a8> in importance_sampler(***failed resolving arguments***)
     71         #  This is where we use the chunking trick II
     72         for ptl_idx in range(np.shape(ana_opt['ptl_chunk_idx'])[0]):
---> 73             output_struct = family_of_curves(ana_opt['curve_type'], 'compute_likelihood', ana_opt['net_effect_clusters'], ana_opt['ptl_chunk_idx'][ptl_idx, 3],
     74                                              param[ana_opt['ptl_chunk_idx'][ptl_idx, 1]:ana_opt['ptl_chunk_idx'][ptl_idx, 2], :], hold_betas, preprocessed_data,
     75                                              ana_opt['distribution'], ana_opt['dist_specific_params'], ana_opt['data_matrix_columns'])

IndexError: index 3 is out of bounds for axis 1 with size 3

ana_opt['ptl_chunk_idx'] has dimensions 2x3, so its valid column indices are 0-2; I haven't adjusted MATLAB's 1-based indexing appropriately. The whole codebase could be a minefield!
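The out-of-bounds error comes from carrying MATLAB's 1-based column numbers into NumPy. A sketch of the shift, with hypothetical chunk-table contents:

```python
import numpy as np

# Hypothetical 2x3 chunk table: one row per chunk. In MATLAB the
# columns are numbered 1..3; in NumPy they are 0..2, so the "third
# column" is index 2, and index 3 is out of bounds for axis 1.
ptl_chunk_idx = np.array([[0, 50000, 50000],
                          [50000, 100000, 50000]])

ptl_idx = 1
chunk_size = ptl_chunk_idx[ptl_idx, 2]
print(chunk_size)  # 50000
```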

githubpsyche commented 4 years ago
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-2bc4507d9022> in <module>
      1 # run tests only when is main file!
      2 if __name__ == '__main__':
----> 3     test_importance_sampler()

<ipython-input-17-060729857a3d> in test_importance_sampler()
     18 
     19     # generate output
---> 20     importance_sampler(python_data, python_analysis_settings)
     21     eng.importance_sampler(matlab_data, matlab_analysis_settings, nargout=0)
     22 

<ipython-input-16-dc0402fd2a11> in importance_sampler(***failed resolving arguments***)
     74         for ptl_idx in range(np.shape(ana_opt['ptl_chunk_idx'])[0]):
     75             output_struct = family_of_curves(ana_opt['curve_type'], 'compute_likelihood', ana_opt['net_effect_clusters'], ana_opt['ptl_chunk_idx'][ptl_idx, 2],
---> 76                                              param[ana_opt['ptl_chunk_idx'][ptl_idx, 0]:ana_opt['ptl_chunk_idx'][ptl_idx, 1], :], hold_betas, preprocessed_data,
     77                                              ana_opt['distribution'], ana_opt['dist_specific_params'], ana_opt['data_matrix_columns'])
     78 

TypeError: slice indices must be integers or None or have an __index__ method

This is definitely an issue with how I'm slicing param. Is it a NumPy array? Yes. The trouble is that my indices are floats instead of integers; I can easily convert them.
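The fix in miniature (array shapes and values are hypothetical): slice bounds drawn from a float-valued array must be cast to ints before slicing.

```python
import numpy as np

param = np.arange(20.0).reshape(10, 2)   # hypothetical parameter matrix
chunk = np.array([2.0, 5.0])             # float-valued [start, stop] bounds

# param[chunk[0]:chunk[1], :] would raise TypeError ("slice indices
# must be integers..."), so cast the bounds to plain ints first.
start, stop = int(chunk[0]), int(chunk[1])
rows = param[start:stop, :]
print(rows.shape)  # (3, 2)
```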

githubpsyche commented 4 years ago
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-45-2bc4507d9022> in <module>
      1 # run tests only when is main file!
      2 if __name__ == '__main__':
----> 3     test_importance_sampler()

<ipython-input-42-060729857a3d> in test_importance_sampler()
     18 
     19     # generate output
---> 20     importance_sampler(python_data, python_analysis_settings)
     21     eng.importance_sampler(matlab_data, matlab_analysis_settings, nargout=0)
     22 

<ipython-input-41-2046446b7791> in importance_sampler(***failed resolving arguments***)
     49         prev_iter_curve_param = np.full((ana_opt['particles'], family_of_curves(ana_opt['curve_type'], 'get_nParams')), np.nan) # Matrix to hold the previous iteration curve parameters
     50         w = np.full((ana_opt['particles']), np.nan) # Vector to hold normalized weights
---> 51         net_effects = np.full((len(ana_opt['net_effect_clusters']), ana_opt['particles']), np.nan) # Matrix to hold the predictor variables (taking net effects if relevant) over all particles
     52         dependent_var = [] # This vector cannot be initialized in advance since we don't know the length of this vector (dropping outliers)
     53 

~\Miniconda3\lib\site-packages\numpy\core\numeric.py in full(shape, fill_value, dtype, order)
    312     if dtype is None:
    313         dtype = array(fill_value).dtype
--> 314     a = empty(shape, dtype, order)
    315     multiarray.copyto(a, fill_value, casting='unsafe')
    316     return a

MemoryError: Unable to allocate 1.36 GiB for an array with shape (1824, 100000) and data type float64

What a pain. Can the shadow handle it?
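The reported allocation size checks out against the array shape in the traceback:

```python
# Sanity check: a (1824, 100000) float64 array needs rows * cols * 8 bytes.
rows, cols = 1824, 100000
gib = rows * cols * 8 / 2**30
print(round(gib, 2))  # 1.36
```

So the failure is simply the machine running out of RAM for one dense matrix, not a bug in the port.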

githubpsyche commented 4 years ago

Shadow apparently can; this requires some documentation, though. First time concretely justifying having shadow!

githubpsyche commented 4 years ago

Start time 6/26 1:42
********** START OF MESSAGES **********
0 trials are dropped since they are regarded as outliers
********** END OF MESSAGES **********
Betas: 0, 1
EM Iteration: 0
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-51-2bc4507d9022> in <module>
      1 # run tests only when is main file!
      2 if __name__ == '__main__':
----> 3     test_importance_sampler()

<ipython-input-48-060729857a3d> in test_importance_sampler()
     18 
     19     # generate output
---> 20     importance_sampler(python_data, python_analysis_settings)
     21     eng.importance_sampler(matlab_data, matlab_analysis_settings, nargout=0)
     22 

<ipython-input-47-1fa42ebdda05> in importance_sampler(***failed resolving arguments***)
     97         optimizing_function = family_of_distributions(ana_opt['distribution'], 'fminunc_both_betas', w, net_effects, dependent_var, ana_opt['dist_specific_params'])
     98 
---> 99         result = optimize.minimize(optimizing_function, hold_betas, options={'disp':True})
    100         hold_betas = result.x
    101         fval = result.fun

~\AppData\Roaming\Python\Python37\site-packages\scipy\optimize\_minimize.py in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    602         return _minimize_cg(fun, x0, args, jac, callback, **options)
    603     elif meth == 'bfgs':
--> 604         return _minimize_bfgs(fun, x0, args, jac, callback, **options)
    605     elif meth == 'newton-cg':
    606         return _minimize_newtoncg(fun, x0, args, jac, hess, hessp, callback,

~\AppData\Roaming\Python\Python37\site-packages\scipy\optimize\optimize.py in _minimize_bfgs(fun, x0, args, jac, callback, gtol, norm, eps, maxiter, disp, return_all, **unknown_options)
   1007     else:
   1008         grad_calls, myfprime = wrap_function(fprime, args)
-> 1009     gfk = myfprime(x0)
   1010     k = 0
   1011     N = len(x0)

~\AppData\Roaming\Python\Python37\site-packages\scipy\optimize\optimize.py in function_wrapper(*wrapper_args)
    325     def function_wrapper(*wrapper_args):
    326         ncalls[0] += 1
--> 327         return function(*(wrapper_args + args))
    328 
    329     return ncalls, function_wrapper

~\AppData\Roaming\Python\Python37\site-packages\scipy\optimize\optimize.py in approx_fprime(xk, f, epsilon, *args)
    763 
    764     """
--> 765     return _approx_fprime_helper(xk, f, epsilon, args=args)
    766 
    767 

~\AppData\Roaming\Python\Python37\site-packages\scipy\optimize\optimize.py in _approx_fprime_helper(xk, f, epsilon, args, f0)
    695         ei[k] = 1.0
    696         d = epsilon * ei
--> 697         df = (f(*((xk + d,) + args)) - f0) / d[k]
    698         if not np.isscalar(df):
    699             try:

TypeError: unsupported operand type(s) for -: 'tuple' and 'tuple'

Incredibly complex error!

githubpsyche commented 4 years ago

First I need to confirm that the input functions are solid. Next I must determine whether the optimization function is equivalent. Then the bug should be obvious.

githubpsyche commented 4 years ago

It does seem equivalent for this case. Apparently the default minimization algorithm for fminunc is quasi-Newton. There's no explicit quasi-Newton option in scipy.optimize, but based on this page it looks like BFGS is the closest match.

This wikipedia article explicitly links quasi-newton with a few implementations:

the SR1 formula (for "symmetric rank-one"), the BHHH method, the widespread BFGS method (suggested independently by Broyden, Fletcher, Goldfarb, and Shanno, in 1970), and its low-memory extension L-BFGS.

So I want minimize(method='BFGS'), and that happens to be the default when no bounds or constraints are supplied; the _minimize_bfgs call in the traceback confirms that's what was dispatched.
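A quick check of the dispatch on a toy quadratic (not the toolbox's objective): with no method, bounds, or constraints given, minimize falls through to BFGS, which is why the traceback above lands in _minimize_bfgs.

```python
from scipy.optimize import minimize

# Toy quadratic with minimum at x = 3. With no method/bounds/constraints,
# minimize dispatches to BFGS, the analogue of fminunc's quasi-Newton default.
result = minimize(lambda x: (x[0] - 3.0) ** 2, x0=[0.0])
print(result.success, round(float(result.x[0]), 3))
```

result.x converges to approximately [3.0] here, so the method choice itself isn't the source of the TypeError.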

githubpsyche commented 4 years ago

It does seem that a gradient is specified - that's the second output of fminunc_bernoulli_both. SciPy doesn't consume that shared output automatically, though minimize does support it when jac=True is passed; otherwise I need a separate function to generate the gradient.
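For reference, SciPy can accept an fminunc-style (value, gradient) return if jac=True is passed to minimize; the objective below is a hypothetical stand-in, not the toolbox's fminunc_bernoulli_both:

```python
import numpy as np
from scipy.optimize import minimize

def objective_with_grad(x):
    """Return (objective value, gradient), MATLAB fminunc style."""
    fval = float(np.sum((x - 2.0) ** 2))
    grad = 2.0 * (x - 2.0)
    return fval, grad

# jac=True tells minimize the callable returns (fval, grad); without it
# SciPy tries to finite-difference the tuple, producing exactly the
# "unsupported operand type(s) for -: 'tuple' and 'tuple'" error above.
result = minimize(objective_with_grad, x0=np.zeros(2), jac=True)
print(np.allclose(result.x, 2.0, atol=1e-4))  # True
```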

githubpsyche commented 4 years ago

Going to close this and start fresh with #29