atlab / cov-est

Covariance estimation library used in "Improved Estimation and Interpretation of Correlations in Neural Circuits."
MIT License

Questions about the code and functional connectivity in general #1

Open kauttoj opened 7 years ago

kauttoj commented 7 years ago

Firstly, thank you for this very useful code. I have a few questions about the code and one more general question about functional connectivity.

As we know, real neural data are noisy and we have no 'ground truth' when estimating connectivity between neurons. With neuron-level data we cannot compare connections between animals (as people do in fMRI/MEG/EEG) and typically must work with a single experimental dataset. The way I see it, the only ways to convince ourselves that our connectivity estimates are valid are to (1) use the best available estimation method, which at the moment is your 'lv-glasso', and (2) check the cross-validated correlation matrices and see whether they agree across folds; any real strong connections and graph metrics (e.g., major hubs) should be stable across folds. What is your opinion on this? Is there anything else one can do to test whether connections are real?
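For concreteness, here is a minimal sketch of the fold-agreement check I have in mind (my own code, not from this repo; `C_folds` is a hypothetical cell array of per-fold correlation matrices):

    % Sketch: agreement of correlation estimates across cross-validation folds.
    % C_folds: cell array of p-by-p correlation matrices, one per fold.
    nFolds = numel(C_folds);
    p = size(C_folds{1}, 1);
    mask = triu(true(p), 1);                    % off-diagonal entries only
    agreement = nan(nFolds);
    for i = 1:nFolds
        for j = i+1:nFolds
            % correlate the vectorized upper triangles of two folds
            agreement(i, j) = corr(C_folds{i}(mask), C_folds{j}(mask));
        end
    end
    meanAgreement = mean(agreement(:), 'omitnan');  % mean pairwise fold agreement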

Here are some questions about the code:

(1) In your code you treat each time bin independently (i.e., as if they were separate conditions). Why is that? At least for dynamic stimuli (e.g., movies) with temporal dependencies, it seems more useful to compute the variance over all bins in order to track timecourses. For example, consider two neurons and their binned responses (e.g., ~200 ms bins) for a single repeat of a movie clip:

    neuron 1: [0,0,3,10,4,0,0,0,9,2,0,0]
    neuron 2: [0,0,2,15,6,1,0,0,5,4,0,0]

There is a strong correlation between the two, and, depending on the deconvolution method, many of the bins are naturally empty. The downside, however, is that if some bins in a trial contain bad data, all bins for that trial must be dropped.
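To make the alternative concrete, here is a sketch using the toy vectors above (my own code):

    % Sketch: correlate the full binned timecourses of the two example neurons.
    n1 = [0,0,3,10,4,0,0,0,9,2,0,0]';
    n2 = [0,0,2,15,6,1,0,0,5,4,0,0]';
    good = ~isnan(n1) & ~isnan(n2);   % drop bins flagged as bad (none here)
    r = corr(n1(good), n2(good));     % Pearson correlation over all bins jointly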

(2) In the hyperparameter optimization, what is the rationale for using this particular loss function for correlation matrices:

    L = (trace(Rp/R) + cove.logDet(R))/p + sum(log(V(:)).*N(:));

Is this more stable than simply computing the correlation or squared error between Rp and R, or the similarity of the thresholded networks (e.g., the top 5% of edges)?
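To make sure I am reading it correctly, here it is written out as a standalone function (my own sketch; I substituted a Cholesky-based log-determinant for cove.logDet):

    % Sketch: the hyperparameter-selection loss as I read it (not the repo's code).
    % Rp: p-by-p validation correlation matrix
    % R : p-by-p regularized correlation estimate (positive definite)
    % V : per-bin variance estimates; N: matching per-bin sample counts
    function L = valLoss(Rp, R, V, N)
        p = size(R, 1);
        logDetR = 2 * sum(log(diag(chol(R))));   % log-determinant via Cholesky
        L = (trace(Rp / R) + logDetR) / p ...    % Gaussian validation loss
            + sum(log(V(:)) .* N(:));            % variance log-likelihood term
    end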

(3) I am only interested in pairwise connections between neurons, so it seems that with 'lv-glasso' I only need to concentrate on this matrix:

     partial_corr_matrix = -corrcov(extras.S);

Is there any reason to instead use the actual (non-sparse) correlation matrix?

(4) Is the version in this repo the latest one? Have you made or considered any improvements/changes to the code or methods since the release?

kauttoj commented 7 years ago

A couple more details/thoughts.

For question (2): I am mostly interested in the third term, involving the logarithm of the variance. The first two terms appear to be standard parts of the loss function (as in, e.g., scikit-learn's GraphLasso), although one typically uses the covariance matrix instead of the correlation matrix.

For question (3): After analyzing more data, it seems that my sparse "extras.S" is often a diagonal matrix without any off-diagonal elements. My optimal hyperparameters also tend to be very small (~10^-3 or ~10^-4). Is this typical, or a sign of bad data?
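This is how I check for off-diagonal structure (a trivial sketch):

    % Sketch: count the nonzero off-diagonal entries of the sparse component.
    S = extras.S;
    offDiag = S - diag(diag(S));                 % zero out the diagonal
    fprintf('off-diagonal nonzeros: %d of %d\n', ...
        nnz(offDiag), numel(S) - size(S, 1));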

dimitri-yatsenko commented 7 years ago

We use the word "connectivity" rather loosely. The temporal scales that we examined (100 ms bins) preclude interpreting the connectivity matrix as a matrix of direct monosynaptic connections. However, structured correlations are a prominent and stable feature of population activity and serve as one of its descriptive statistics. Many similar phenomena and measurements are used in neuroscience. For example, receptive fields are correlations between sensory features and neuronal activity. They have been interpreted to suggest particular patterns of synaptic connections, but such simplified models are far from a complete explanation, and the role of receptive fields in the overall process of perception does not immediately follow from them.

However, if we have multiple statistical descriptions of population activity, which one is better in some sense? My thought was as follows. If C is a statistic of population activity and A is some other, completely independent description of the neuronal population (e.g., geometric positions, cell types, layers, response properties, or even synaptic connectivity), then the best statistic maximizes the mutual information I(C;A). In other words, it has the greatest capacity to differentiate properties of the network that were not already taken into account when computing C. We showed that a well-estimated (regularized) inverse covariance matrix was much better in this regard than the conventional correlation matrix. Some of these results were never published, but in general the new estimate could better categorize pairs of cells by distance, cell type, and sensory response properties.

There may be many proxies for I(C;A), which is difficult to estimate directly. The paper contains a few examples: even though we do not frame them in terms of I(C;A), the significance of the findings is expressed in those terms.
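As one hedged illustration of such a proxy (my own sketch, not the procedure from the paper): a plug-in estimate of the mutual information between a discretized connectivity statistic and a categorical pair label:

    % Sketch: plug-in estimate of I(C;A) between a connectivity statistic c
    % (one value per cell pair) and an integer-coded pair label a (e.g.,
    % 1 = same layer, 2 = different layer). Illustrative only.
    function mi = pairMI(c, a, nBins)
        edges = linspace(min(c), max(c), nBins + 1);
        cBin = discretize(c, edges);             % quantize the statistic
        pxy = accumarray([cBin(:), a(:)], 1);    % joint histogram
        pxy = pxy / sum(pxy(:));                 % joint distribution
        px = sum(pxy, 2);  py = sum(pxy, 1);     % marginals
        nz = pxy > 0;                            % avoid log(0)
        ratio = pxy ./ (px * py);                % departure from independence
        mi = sum(pxy(nz) .* log2(ratio(nz)));    % mutual information in bits
    end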

So this approach is quite different from declaring that the best estimate is the most stable one, i.e. the one that yields the same secondary statistics (such as clusters) across separate folds of the data.

dimitri-yatsenko commented 7 years ago

In our approach we considered noise correlations, i.e. correlated activity across repeated trials of the same stimuli after the average response has been subtracted, and we did not model the temporal dynamics of the population activity. I agree that modeling global dynamics in addition to estimating pairwise connectivity is a promising direction.
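In code terms, by noise correlations I mean something like the following sketch (the array layout is an assumption):

    % Sketch: noise correlations = correlations of trial-to-trial residuals
    % after subtracting the mean response to repeated presentations of the
    % same stimulus. X is assumed to be nBins-by-nRepeats-by-nNeurons.
    mu = mean(X, 2);                          % mean response across repeats
    resid = bsxfun(@minus, X, mu);            % trial-to-trial residuals
    resid = reshape(resid, [], size(X, 3));   % pool bins and repeats
    noiseCorr = corr(resid);                  % nNeurons-by-nNeurons matrix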

dimitri-yatsenko commented 7 years ago

For question 2: the loss function is a refactoring of the validation loss defined in Eq. 10 of the paper. It had to be refactored in order to assume the same correlation matrix across the individual bins of the response while allowing the variances to be conditioned individually on the timing from stimulus onset. This is described in Eqs. 20-25.
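Concretely, the factorization assumed there is one shared correlation matrix with bin-specific variances, along these lines (a sketch, not the repo's code):

    % Sketch: one common p-by-p correlation matrix R across bins; v_t holds
    % the p variances conditioned on bin t's timing from stimulus onset.
    sd = sqrt(v_t(:));                 % per-neuron standard deviations in bin t
    C_t = R .* (sd * sd');             % equals diag(sd) * R * diag(sd)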

dimitri-yatsenko commented 7 years ago

For question 2, the main motivation was not to achieve stability but to approach the ground truth Σ, which is unknown. However, we show in Eqs. 9, 10, and 11 that minimizing the validation loss is equivalent to minimizing the distance to the ground truth under this loss function. This cannot be shown for most other loss functions.

dimitri-yatsenko commented 7 years ago

Question 3: Although we did motivate our method as a separation of joint population activity into latent units plus direct interactions expressed as sparse pairwise coupling terms, in most results we used the pairwise partial correlations, which combine the two effects. Pairwise partial correlations are still fully (linearly) conditioned on the rest of the network, and they were more consistent estimates of connectivity than their sparse components alone.
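As a sketch of what that means for the outputs, the partial correlations come from the full inverse covariance estimate rather than its sparse part alone (iK is a hypothetical name for the assembled precision matrix):

    % Sketch: pairwise partial correlations fully conditioned on the network,
    % from the full inverse covariance estimate iK (assembled from the sparse
    % and low-rank parts as appropriate; iK is a hypothetical name).
    P = -corrcov(iK);                  % off-diagonal: -iK_ij / sqrt(iK_ii * iK_jj)
    P(1:size(P,1)+1:end) = 1;          % set the diagonal to 1 by convention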

dimitri-yatsenko commented 7 years ago

Question 4: This is the final version of the repo, but you may find additional findings and information in my thesis repo, https://github.com/dimitri-yatsenko/dy-thesis/blob/master/thesis.pdf, which I have now made public.