LieberInstitute / spatialDLPFC

spatialDLPFC project involving Visium (n = 30), Visium SPG (n = 4) and snRNA-seq (n = 19) samples
http://research.libd.org/spatialDLPFC/

Test cell type composition across anterior, mid, posterior #131

Closed boyiguo1 closed 1 year ago

boyiguo1 commented 2 years ago

Null hypothesis: the cell type composition does not change among anterior, mid, and posterior positions, after adjusting for tissue layer

Implementation:

  1. For each sample $i$ and spot $j$, the cell composition of the spot after deconvolution is a vector $\boldsymbol y_{ij} \equiv \{\pi_{ij1}, \dots, \pi_{ijk}\} \in [0,1]^k$, where $k$ is the number of cell types (e.g. $k = 13$ for the layer-informed broad cell types).
  2. We first pseudo-bulk by the spatial domain (either the manual annotation, or data-driven $Sp_9D$)
    • I think manual annotation makes more sense because $Sp_9D$ contains layers with few spots
    • Now we have a sample- ($i$) and layer-specific ($j$, with abuse of notation) vector $\boldsymbol y_{ij} \equiv \{\pi_{ij1}, \dots, \pi_{ijk}\} \in [0,1]^k$
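  3. We use a log-ratio transformation, e.g. the centered log-ratio transformation commonly used in microbiome data analysis, to convert $\boldsymbol y_{ij} \in [0,1]^k$ to $\boldsymbol y_{ij}^\prime \in \mathbb{R}^{k-1}$ (the centered log-ratio returns $k$ values that sum to zero, so only $k-1$ of them are free)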
  4. We run a multivariate linear regression model to test whether cell type composition changes across anterior, mid, posterior (denoted as $X_1$ and implemented with reference coding), adjusting for the tissue layer (denoted as $X_2$). The outcome is the log-ratio transformed cell proportions $\boldsymbol y_{ij}^\prime \in \mathbb{R}^{k-1}$, and the test is whether $X_1$ is statistically significant.
    • We can either fit one multivariate normal (MVN) model for each spatial domain, and hence have as many models as spatial domains: for each spatial domain $j$, we fit an MVN model with $E(\boldsymbol y_{ij}^\prime) = \boldsymbol \beta_0 + \boldsymbol \beta_1 \boldsymbol X_1$
    • We can also fit one multivariate normal model for all data by adding the spatial domain and its interaction with coronal section position ($X_1 * X_2$): $E(\boldsymbol y_{ij}^\prime) = \boldsymbol \beta_0 + \boldsymbol \beta_1 \boldsymbol X_1 + \boldsymbol \beta_2 \boldsymbol X_2 + \boldsymbol \beta_3 \boldsymbol X_1 \boldsymbol X_2$
    • We can also add a sample-specific random effect if necessary.
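
The steps above could be sketched roughly as follows in base R. This is only an illustration on simulated data, not the project's actual analysis: the sample IDs, layer names, `k = 4`, the `clr()` helper, and the pseudocount are all assumptions made for the example. Because centered log-ratio values sum to zero within each row, one column is dropped to obtain a full-rank $(k-1)$-dimensional outcome.

```r
# Minimal sketch on SIMULATED data; every name below is hypothetical.
set.seed(131)

k         <- 4                              # number of cell types (13 in the issue)
samples   <- paste0("S", 1:30)              # hypothetical sample IDs
positions <- rep(c("anterior", "mid", "posterior"), each = 10)
layers    <- c("L1", "L2", "L3", "WM")      # hypothetical spatial domains

# One row per sample x spatial domain (the pseudo-bulked unit)
dat <- expand.grid(sample = samples, layer = layers, stringsAsFactors = FALSE)
dat$position <- factor(positions[match(dat$sample, samples)],
    levels = c("anterior", "mid", "posterior"))  # "anterior" is the reference
dat$layer <- factor(dat$layer)

# Simulated compositions: normalize positive draws so each row sums to 1
raw  <- matrix(rgamma(nrow(dat) * k, shape = 5), ncol = k)
prop <- raw / rowSums(raw)
colnames(prop) <- paste0("celltype", seq_len(k))

# Centered log-ratio: log of each part minus the row mean of the logs.
# The pseudocount (an assumption) guards against log(0).
clr <- function(p, pseudo = 1e-6) {
    lp <- log(p + pseudo)
    lp - rowMeans(lp)
}
Y_clr <- clr(prop)

# CLR rows sum to zero, so the k columns are linearly dependent;
# drop one column to get a full-rank (k - 1)-dimensional outcome.
Y <- Y_clr[, -k, drop = FALSE]

# Single model for all data: position, layer, and their interaction
fit_full <- lm(Y ~ position * layer, data = dat)
# Reduced model without any position terms
fit_null <- lm(Y ~ layer, data = dat)

# Multivariate test (Pillai's trace) of all position-related terms
test_position <- anova(fit_null, fit_full, test = "Pillai")
print(test_position)

# One model per spatial domain: position effect within each layer
p_by_layer <- sapply(levels(dat$layer), function(l) {
    idx <- dat$layer == l
    Y_l <- Y[idx, , drop = FALSE]
    pos <- dat$position[idx]
    anova(lm(Y_l ~ 1), lm(Y_l ~ pos), test = "Pillai")[2, "Pr(>F)"]
})
print(p_by_layer)
```

Note that this sketch treats rows from the same sample as independent; the sample-specific random effect mentioned in the last bullet would need a mixed-model approach and is not shown here.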

boyiguo1 commented 2 years ago

@lcolladotor This seems to be related to #126. Let me know if you think they are exactly the same or different so that we can merge the tasks or work on them together.

My intuition is that the proposal above is a statistical test, whereas #126 puts more emphasis on the visualization/descriptive side. So I want to see what you think about this.

boyiguo1 commented 2 years ago

@boyiguo1 If we decide to do this analysis, you can use the collapsed_layer data that Nick created.

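library(here) # here() builds paths relative to the project root

# Resolution of the collapsed deconvolution results
cell_group <- "layer"

# Collapsed results for the IF samples
collapsed_results_path_IF <- here(
    "processed-data", "spot_deconvo", "05-shared_utilities", "IF",
    paste0("results_collapsed_", cell_group, ".csv")
)

# Collapsed results for the nonIF samples
collapsed_results_path_nonIF <- here(
    "processed-data", "spot_deconvo", "05-shared_utilities", "nonIF",
    paste0("results_collapsed_", cell_group, ".csv")
)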

lcolladotor commented 2 years ago

Hi @boyiguo1,

Overall, we think that it's challenging to draw strong conclusions across position (mid/ant/post) in terms of % of the layers represented, and thus also cell types. That's because the 6.5 x 6.5 mm Visium square is too small to capture the full mid/ant/post DLPFC. So we can't tell whether the differences in layer %s (and thus cell types) are due to biological changes across position or if it's a dissection variability issue.

Within a particular spatial domain, if there are cell type composition changes, that might be interesting. But it's likely a bit hard to say more than that.

Does this make sense?

best, Leo

boyiguo1 commented 2 years ago

@lcolladotor This is very helpful. Thank you so much!