MangiolaLaboratory / sccomp

Testing differences in cell type proportions from single-cell data.
https://stemangiola.github.io/sccomp/
GNU General Public License v3.0

question: scope of the statistical model #113

Closed hansvancalster closed 9 months ago

hansvancalster commented 9 months ago

The paper that describes the method mentions single-cell transcriptomics, CyTOF, and microbiome sequencing as applications. I have read the paper and the description of the statistical model, and I am applying the model to environmental DNA (eDNA) data from, e.g., soil samples (for example, the taxonomic group Annelida). IMO the statistical model can also be applied to such data, but I may have overlooked something, so I would like to ask whether you can confirm that the model is suitable for analysing eDNA data. Data from eDNA is identical in nature to data from microbiome studies that use 16S rRNA amplicon sequencing.

stemangiola commented 9 months ago

Sure, in principle it applies to any compositional count data. If you would like to share the scatter plot from sccomp of the mean-variability relationship, we can get a better idea of the properties of the data.
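
For readers unfamiliar with the term, the following is a minimal base R sketch of what a mean-variability relationship looks like descriptively. It is not how sccomp estimates it (sccomp fits the association inside its model); the `counts` matrix is hypothetical toy data, and the summary used here (variance of raw proportions) is an assumption chosen for simplicity.

```r
# Hypothetical toy counts: rows are samples, columns are taxa.
counts <- matrix(
  c(120, 30,  5,
    100, 45,  8,
    140, 25,  2,
     90, 50, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), c("t1", "t2", "t3"))
)

# Convert counts to within-sample proportions (each row sums to 1).
props <- counts / rowSums(counts)

# Per-taxon mean proportion and variance of the proportion across samples.
mean_prop <- colMeans(props)
var_prop  <- apply(props, 2, var)

# One row per taxon: a crude empirical view of abundance versus variability.
mv <- data.frame(
  taxon     = colnames(counts),
  mean_prop = mean_prop,
  var_prop  = var_prop
)
print(mv)
```

Plotting `log(mv$mean_prop)` against `log(mv$var_prop)` gives a rough scatter; a clear trend in such a plot is the kind of association the model-based plot is meant to reveal.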

hansvancalster commented 9 months ago

image

This is from a simplified model where I aggregated taxa to genus-level resolution and used only the 9 most prevalent genera (which accounted for 85% of the total number of reads). The code I used for estimation was (with the latest development version):

s_annelida <- sccomp_estimate(
  .data = sccomp_data_annelida,
  # Fixed effects with their interaction, plus a random intercept per plot
  formula_composition = ~ landuse + soil_depth + landuse:soil_depth + (1 | PlotID),
  .sample = sample_id,
  .cell_group = taxon_id,
  .count = count,
  verbose = TRUE,
  max_sampling_iterations = 2000,
  bimodal_mean_variability_association = FALSE
)
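
The aggregation described before the call (collapse to genus level, then keep only the most prevalent genera) can be sketched in base R as below. This is an illustration on hypothetical toy data, not the code used in the thread: the `reads` data frame, its `genus` and `species` columns, and `n_keep` are assumptions. The thread used 9 genera; the toy example keeps 2.

```r
# Hypothetical long-format reads: one row per sample and species-level taxon.
reads <- data.frame(
  sample_id = rep(c("s1", "s2"), each = 4),
  genus     = rep(c("A", "A", "B", "C"), times = 2),
  species   = rep(c("A1", "A2", "B1", "C1"), times = 2),
  count     = c(50, 30, 15, 5, 40, 20, 30, 10)
)

# Sum counts within each sample and genus.
genus_counts <- aggregate(count ~ sample_id + genus, data = reads, FUN = sum)

# Total reads per genus over all samples, most prevalent first.
genus_totals <- sort(
  sapply(split(genus_counts$count, genus_counts$genus), sum),
  decreasing = TRUE
)

# Keep the n_keep most prevalent genera and report their share of all reads.
n_keep     <- 2
top_genera <- names(genus_totals)[seq_len(n_keep)]
read_share <- sum(genus_totals[top_genera]) / sum(genus_totals)

# Model input: genus-level counts, with the genus used as the group column.
model_input <- genus_counts[genus_counts$genus %in% top_genera, ]
names(model_input)[names(model_input) == "genus"] <- "taxon_id"

print(top_genera)
print(read_share)
```

With the toy data, genera A and B are kept and account for 0.925 of all reads. Note that dropping the remaining genera changes the composition being modelled; whether to drop them or pool them into an "other" group is a design choice this sketch does not settle.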

stemangiola commented 9 months ago

Looks like there might be an association. It seems the model could be beneficial. Also, please post the boxplots; these tell you if the model is descriptively adequate.

hansvancalster commented 9 months ago

For landuse:

image

For soil_depth:

image

Note that landuse and soil_depth interact in the model. See issue #102

stemangiola commented 9 months ago

It looks like the model describes your data well. I see some bimodality in t3 and t6 (green). Make sure you know what those outliers are; maybe you can model them. If you don't know where the bimodality comes from, it might be more parsimonious to use sccomp_remove_outliers().

stemangiola commented 9 months ago

You might also test differential variability. I see there might be some.
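
The two suggestions above (outlier removal and differential variability) might be combined roughly as follows. This is a hedged sketch, not a tested analysis: `fit_annelida_variability` is a hypothetical helper name, the choice of `~ landuse` for `formula_variability` is an assumption about which covariate to test, and the column names are taken from the call posted earlier in the thread. The sccomp functions are only referenced inside the function, so nothing is fitted until it is called with real data.

```r
# Hypothetical helper: wraps the estimate -> remove outliers -> test pipeline.
# `data` is assumed to have the columns sample_id, taxon_id, count,
# landuse, soil_depth, and PlotID, as in the call posted in this thread.
fit_annelida_variability <- function(data) {
  estimate <- sccomp::sccomp_estimate(
    .data = data,
    formula_composition = ~ landuse + soil_depth + landuse:soil_depth + (1 | PlotID),
    # Assumption: test whether variability differs by land use.
    formula_variability = ~ landuse,
    .sample = sample_id,
    .cell_group = taxon_id,
    .count = count,
    bimodal_mean_variability_association = FALSE
  )

  # Refit after excluding observations flagged as outliers.
  estimate <- sccomp::sccomp_remove_outliers(estimate)

  # Hypothesis tests for composition and variability effects.
  sccomp::sccomp_test(estimate)
}
```

Usage would be something like `res <- fit_annelida_variability(sccomp_data_annelida)`, after which the variability-related columns of the result can be inspected alongside the composition effects.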