PavlidisLab / Gemma

Genomics data re-analysis
Apache License 2.0

Other cell-level factors to consider and their implication on subsetting #1218

Closed arteymix closed 2 months ago

arteymix commented 2 months ago

It came to my attention from @rachadele that there are other cell-level factors we could account for besides cell type, such as layer and brain region.

This means that in some cases, we might have to create subsets by combining independent cell-level annotations (e.g. cell type and layer). Fortunately, this is already supported by ExpressionExperimentSubSet, which can have more than one characteristic.
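To make the idea of combining independent annotations concrete, here is a minimal sketch in Python (not Gemma's actual Java code). The function `split_cells` and the dictionary layout are hypothetical; it only assumes that each annotation provides one value per cell, and it groups cell indices by the tuple of values across the chosen factors.

```python
from collections import defaultdict


def split_cells(annotations, factors):
    """Group cell indices by the combination of values of the given factors.

    annotations: dict mapping a factor name to a list with one value per cell
                 (hypothetical layout, for illustration only)
    factors:     the factor names to combine, e.g. ["cell type", "layer"]
    """
    n_cells = len(annotations[factors[0]])
    subsets = defaultdict(list)
    for i in range(n_cells):
        # One subset per distinct combination of characteristic values
        key = tuple(annotations[f][i] for f in factors)
        subsets[key].append(i)
    return dict(subsets)


annotations = {
    "cell type": ["neuron", "neuron", "astrocyte", "neuron"],
    "layer": ["L2/3", "L5", "L2/3", "L2/3"],
}
subsets = split_cells(annotations, ["cell type", "layer"])
```

Each key of `subsets` would correspond to one subset carrying more than one characteristic; passing a single factor reduces to the plain cell type split.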

arteymix commented 2 months ago

At the last journal club, @neerapatadia covered a Perturb-Seq paper where cells were assigned specific perturbations (i.e. gene knockdowns) with CRISPRi. A dataset like this would have a treatment factor at the cell level.

Reference: https://doi.org/10.1093/nar/gkae777

ppavlidis commented 2 months ago

That data set (or perturb-seq in general) might not have replicates in the sense we would want to use, but I also don't think it poses any special data modeling problem, or at least it can be "made to fit". There are multiple cell lines, and multiple genetic manipulations. Cells would be grouped together and pseudobulked based on which manipulation they have (or no manipulation), for each cell line.
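A rough sketch of that grouping, in Python for brevity. The function `pseudobulk`, the cell line and gene names, and the choice to aggregate by summing counts are all assumptions for illustration; the thread does not specify how the aggregation would be done.

```python
def pseudobulk(counts, cell_lines, perturbations):
    """Sum per-cell count vectors within each (cell line, perturbation) group.

    counts:        one list of gene counts per cell
    cell_lines:    cell line label per cell
    perturbations: targeted gene per cell, or None for unperturbed cells
    """
    sums = {}
    for vec, line, pert in zip(counts, cell_lines, perturbations):
        # Unperturbed cells form their own group within each cell line
        key = (line, pert if pert is not None else "control")
        if key not in sums:
            sums[key] = [0] * len(vec)
        sums[key] = [a + b for a, b in zip(sums[key], vec)]
    return sums


# Hypothetical toy data: 4 cells, 2 genes
counts = [[1, 0], [2, 3], [0, 4], [5, 1]]
cell_lines = ["lineA", "lineA", "lineA", "lineB"]
perturbations = ["geneA", "geneA", None, "geneA"]
bulk = pseudobulk(counts, cell_lines, perturbations)
```

Each key of `bulk` then plays the role of one pseudobulked sample, with cell line and treatment as ordinary sample-level factors.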

To the extent that each cell has a different "amount" of treatment (degree of knockdown of the targeted gene), and that "amount" is captured in the data for each cell, then yes, we would lose that information, and Gemma wouldn't be the right setting to examine that.

arteymix commented 2 months ago

To handle these in the future, we can introduce a CellLevelCharacteristics class and make CellTypeAssignment a subclass of it. The purpose would be to keep track of characteristics at the cell level, similar to how we track sample-level characteristics in BioMaterials.

arteymix commented 2 months ago

I've added an interface for cell-level characteristics that is shared by cell type assignments and generic cell-level characteristics. This makes it possible to keep track of treatments, genotypes, layer, etc. at the cell level.

Cell type is just a specific case, but it gets special treatment in the system.
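For readers unfamiliar with the model, the relationship can be sketched roughly as follows. This is an illustrative Python analogue, not the actual Java code in Gemma: `CellLevelCharacteristics` and `CellTypeAssignment` are names from this thread, while `GenericCellLevelCharacteristics`, the `values`/`indices` storage layout, and the method names are assumptions.

```python
from abc import ABC, abstractmethod


class CellLevelCharacteristics(ABC):
    """Shared interface: every cell points at one of a small set of values."""

    def __init__(self, values, indices):
        self.values = list(values)    # distinct characteristic values (assumed layout)
        self.indices = list(indices)  # per cell: position in values, -1 if unassigned

    @abstractmethod
    def category(self):
        """Name of the factor being tracked (e.g. 'cell type', 'layer')."""

    def value_of(self, cell):
        i = self.indices[cell]
        return None if i < 0 else self.values[i]


class CellTypeAssignment(CellLevelCharacteristics):
    """The special case: the category is always the cell type."""

    def category(self):
        return "cell type"


class GenericCellLevelCharacteristics(CellLevelCharacteristics):
    """Any other cell-level factor: treatment, genotype, layer, etc."""

    def __init__(self, category, values, indices):
        super().__init__(values, indices)
        self._category = category

    def category(self):
        return self._category


cell_types = CellTypeAssignment(["neuron", "astrocyte"], [0, 1, -1])
layers = GenericCellLevelCharacteristics("layer", ["L2/3", "L5"], [0, 0, 1])
```

Code written against the shared interface would work for either kind of annotation, which is what leaves room for generalizing beyond cell type later.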

arteymix commented 2 months ago

This is completed. For now, we can only split by cell type assignment, but that code could be generalized to any cell-level characteristics or combination thereof.