flatironinstitute / nemos

NEural MOdelS, a statistical modeling framework for neuroscience.
https://nemos.readthedocs.io/
MIT License

GridsearchCV and pipeline: input dimensionality #263

Open FlyingFordAnglia opened 1 week ago

FlyingFordAnglia commented 1 week ago

Hi! I am trying to fit a GLM to spiking data from a population of neurons. My design matrix is the binned spike counts of all neurons, and my 'y' is the spike counts of the neuron I am interested in. Before fitting the GLM, I wanted to run a grid search for hyperparameter tuning. When I run the attached code, I get the following error:

TypeError: Input dimensionality mismatch. This basis evaluation requires 1 inputs, 15 inputs provided instead.

From what I can gather, the fit_transform method that GridSearchCV uses internally expects a design matrix with a single column, not a matrix of shape (n_samples, n_features). How can I get this to work?

# region Hyperparameter tuning
num_bases = 10
print(f'Number of basis functions: {num_bases}')
basis = nemos.basis.RaisedCosineBasisLinear(n_basis_funcs=num_bases, mode="conv", window_size=filter_size)
transformer_basis = basis.to_transformer()
neuron = 15
print(f'{neuron} Neurons considered = {neurons_slice[0:neuron]}')
spike_counts = spike_dat[:][neurons_slice[0:neuron], :time_vec_cut_index].T
train_spike_counts = spike_counts[0:int(len(spike_counts) * 0.7), :]
pipeline = Pipeline(
    [
        (
            "transformerbasis",
            transformer_basis,
        ),
        (
            "glm",
            nemos.glm.GLM(regularizer_strength=0.5, regularizer="Ridge", solver_kwargs={'verbose': True}),
        ),
    ]
)
param_grid = dict(
    glm__regularizer_strength=(0.1, 0.01, 0.001, 1e-5),
    transformerbasis__n_basis_funcs=(5, 10, 15, 20),
)
gridsearch = GridSearchCV(
    pipeline,
    param_grid=param_grid,
    cv=2,
)
gridsearch.fit(train_spike_counts, train_spike_counts[:, glm_neuron_id].flatten())
cvdf = pd.DataFrame(gridsearch.cv_results_)

cvdf_wide = cvdf.pivot(
    index="param_transformerbasis__n_basis_funcs",
    columns="param_glm__regularizer_strength",
    values="mean_test_score",
)
plot_heatmap_cv_results(cvdf_wide)
# best_params = hyper_param_tuning()
sys.exit()
# endregion

My installed versions are nemos 0.1.6 and scikit-learn 1.5.0.

A tangential question: How do I integrate batch gradient descent with this pipeline?

Any help would be appreciated, thanks!

sjvenditto commented 1 week ago

Currently, basis objects assume a single input; support for multi-dimensional inputs is a work in progress. The current work-around is to define a basis in param_grid whose dimensionality matches that of the input; in your case, that is an additive basis with one component (a RaisedCosineBasisLinear basis) per neuron. This will look like:

param_grid = dict(
    glm__regularizer_strength=(0.1, 0.01, 0.001, 1e-5),
    transformerbasis__basis=[basis * neuron],
)

where basis * neuron is shorthand for adding basis to itself neuron times. Unfortunately, this solution will raise another error on both the main and dev branches, related to transformer basis property names (and the shorthand itself does not exist yet). This is being fixed in PR #235; in the meantime you can try it out by using the fix_transformer branch of nemos if you've installed from source. Let me know if this works for you!
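
For completeness, here is a minimal sketch of building that additive basis by hand, assuming only that nemos basis objects support + to combine them into an additive basis; num_bases, filter_size, and neuron refer to the variables in your snippet above:

import operator
from functools import reduce

import nemos

# One raised-cosine component per neuron; `+` on nemos bases combines them
# into a single additive basis whose dimensionality matches the input.
components = [
    nemos.basis.RaisedCosineBasisLinear(
        n_basis_funcs=num_bases, mode="conv", window_size=filter_size
    )
    for _ in range(neuron)
]
additive_basis = reduce(operator.add, components)

param_grid = dict(
    glm__regularizer_strength=(0.1, 0.01, 0.001, 1e-5),
    transformerbasis__basis=[additive_basis],
)

The grid search then swaps the whole additive basis in via the transformerbasis__basis parameter, rather than varying n_basis_funcs directly.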