Question about the MATLAB style correlation function in genes.py

https://github.com/alegiac95/Imaging-transcriptomics/blob/46adb1df85c37123c77226d023b6d763edae8aca/imaging_transcriptomics/genes.py#L116C1-L125C36

Hi, I am wondering if there is some thing wrong with the correlation here.

        def correlate(c1, c2):
            """Return the MATLAB style correlation between two vectors."""
            return np.corrcoef(np.hstack((c1, c2)), rowvar=False)[0, 1:]

        _res = pls_regression(gene_exp, imaging_data.reshape(
            imaging_data.shape[0], 1),
                              n_components=self.n_components,
                              n_boot=0, n_perm=0)
        r1 = correlate(_res.get("x_scores"), scan_data.reshape(
            scan_data.shape[0], 1))

In this code: "r1 = correlate(_res.get("x_scores"), scan_data.reshape(scan_data.shape[0], 1))". The _res.get("x_scores") is c1 which is the components from the PLS results, and the scan_data.reshape(scan_data.shape[0], 1)) is c2 which is a imaging data column. And then combined the component columns and the single imaging data column into a combined matrix using np.hstack.

So the last column in this combined matrix here is imaging data and the rest columns are components. And then using np.corrcoef with the rowvar=False we can get a correlation matrix with the element [m, n] represents the correlation coefficient between the column m and column n.

As far as I understand, the r1 represents to the correlation coefficients between the imaging data and different PLS components. However, here in the combined matrix, the imaging data is the last column. And when the code slicing the correlation matrix with the [0, :], it returns the correlation coefficients between the first component (the first column from c1) and the rest of the components as well as the imaging data.

For example, if there are 2 components, the r1 here is [coeffience between component 1 and 2, coeffience between component 1 and imaging data ].

Is that correct?

Hi, you are correct in how you interpret the results, however in the function i return only the elements from 1, so the correlation between component 1 and 2 is not returned

Hi, you are correct in how you interpret the results, however in the function i return only the elements from 1, so the correlation between component 1 and 2 is not returned

Thanks for you reply.

However, in the code:

np.corrcoef(np.hstack((c1, c2)), rowvar=False)

which is actually equivalent to:

np.corrcoef(np.hstack((_res.get("x_scores"), scan_data.reshape(scan_data.shape[0], 1))), rowvar=False)

Since scan_data.reshape(scan_data.shape[0], 1) is passed as the last element in line 124:

r1 = correlate(_res.get("x_scores"), scan_data.reshape(scan_data.shape[0], 1))

Let's assume that _res.get("x_scores") has two columns for two components PC_1 and PC_2. Then the combined matrix of np.hstack((_res.get("x_scores"), scan_data.reshape(scan_data.shape[0], 1))) will be

$$
\begin{bmatrix}
PC_1 & PC_2 & Scan 
\end{bmatrix}
$$

So the correlation matrix returned from np.corrcoef(np.hstack((c1, c2)), rowvar=False) will be (to simplify, I use | to implicate the correlation between two elements):

$$
\begin{bmatrix}
PC_1 | PC_1 & PC_1 | PC_2 & PC_1 | Scan \\
PC_2 | PC_1 & PC_2 | PC_2 & PC_2 | Scan \\
Scan | PC_1 & Scan | PC_2 & Scan | Scan
\end{bmatrix}
$$

And finally, when the function returns, slice the matrix using [0, 1:] to extract the elements in the first row after the second column. Which returns:

$$
r_1 = 
\begin{bmatrix}
PC_1 | PC_2 & PC_1 | Scan
\end{bmatrix}
$$

I am wondering whether the positions of _res.get("x_scores"), scan_data.reshape(scan_data.shape[0], 1) should be reversed to let scan data to be the first element. Like this:

r1 = correlate(scan_data.reshape(scan_data.shape[0], 1), _res.get("x_scores"))

==================================================== Here are some of the example: 191aab0b659f9159fefe246c7a178b3

alegiac95 / Imaging-transcriptomics

Question about the MATLAB style correlation function in genes.py #27