dviraran / SingleR

SingleR: Single-cell RNA-seq cell types Recognition (legacy version)
GNU General Public License v3.0

Remove certain celltype from reference data #40

Open ysbioinfo opened 5 years ago

ysbioinfo commented 5 years ago

Hi, thanks for developing such a wonderful tool. I want to use SingleR to identify cell types in my 10X data. The results show a cluster of "Monocyte" cells in my data. However, there should not be any monocytes, because our cells are from liver tissue; I think these cells should be macrophages. Can I exclude monocytes from the ENCODE/Blueprint dataset so that these cells are annotated as macrophages?

Also, SingleR always assigns each cell to a cell type, but there may be some doublets in 10X data; for example, some cells express markers for both T cells and macrophages. Could you give me some advice on how to remove these cells using the SingleR results? Thanks a lot! Yang

dviraran commented 5 years ago

Thanks for the kind words.

Regarding your questions -

  1. It is interesting that SingleR 'preferred' monocytes over macrophages for those cells. Differentiating macrophages and dendritic cells may sometimes be non-trivial, but I've never had issues differentiating monocytes and macrophages.
  2. It is always good to look at the heatmap of scores (SingleR.DrawHeatmap function), and not just the final labels, to see what SingleR was wavering between. This could give you better hints of whether there are multiple populations there, a gradient of differentiation, or whether this cell type is missing from the reference and the cells are similar to several different cell types (see some examples here).
  3. To remove a cell type from the reference, copy the reference (in your case, the object blueprint_encode), find the samples you want to remove (in your case, grepl('Monocytes', blueprint_encode$main_types)), and remove them from data (the columns of the expression matrix), types and main_types. Finally, recreate the variable gene sets using the CreateVariableGeneSet function. You can see some details here on creating a new reference dataset.
  4. There are tools for removing doublets (DoubletFinder, DoubletDecon and others). Using SingleR, you can again use the heatmap to identify cells that have high similarity to two distinct cell types. Hope this helps.
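
For point 3, the steps can be sketched roughly as below. This is only a minimal sketch on a toy list that stands in for the Blueprint/ENCODE reference: the list layout (data, types, main_types, de.genes, de.genes.main), the helper name drop_main_type, and the gene-set sizes 200 and 300 in the commented-out calls are assumptions for illustration, so check them against your installed legacy SingleR before relying on them.

```r
# Toy stand-in for a legacy SingleR reference (hypothetical data, not the real object).
set.seed(1)
expr <- matrix(rexp(40), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("sample", 1:8)))
ref <- list(
  name       = "toy_reference",
  data       = expr,
  types      = c("Monocytes", "Monocytes", "Macrophages", "Macrophages M1",
                 "CD4+ T-cells", "CD8+ T-cells", "Macrophages", "NK cells"),
  main_types = c("Monocytes", "Monocytes", "Macrophages", "Macrophages",
                 "CD4+ T-cells", "CD8+ T-cells", "Macrophages", "NK cells")
)

# Hypothetical helper: return a copy of the reference without the samples
# whose main type matches `pattern`. The original list is left untouched.
drop_main_type <- function(ref, pattern) {
  keep <- !grepl(pattern, ref$main_types)
  out <- ref
  out$data       <- ref$data[, keep, drop = FALSE]  # samples are columns
  out$types      <- ref$types[keep]
  out$main_types <- ref$main_types[keep]
  # Precomputed gene sets (if present) are stale after filtering; clear them.
  out$de.genes      <- NULL
  out$de.genes.main <- NULL
  out
}

ref_no_mono <- drop_main_type(ref, "Monocytes")

# With legacy SingleR loaded, the gene sets would then be rebuilt, for example
# (argument order and sizes are assumptions):
# ref_no_mono$de.genes      <- CreateVariableGeneSet(ref_no_mono$data, ref_no_mono$types, 200)
# ref_no_mono$de.genes.main <- CreateVariableGeneSet(ref_no_mono$data, ref_no_mono$main_types, 300)
```

The filtered copy can then be used in place of the original reference, while the unmodified object stays available for comparison.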

Best, Dvir

eMedData commented 3 years ago

Hi Dvir, thank you for your reply. I was wondering if the example you suggested, with respect to removing cell types from the reference, could be done in an R script? I downloaded the HumanPrimaryCellAtlasData() set using the celldex package and I want to remove the 'Astrocyte' cell type.
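
For the celldex case, a script along these lines may work. It is a sketch, not a confirmed answer from the maintainers: it assumes the object returned by HumanPrimaryCellAtlasData() is a SummarizedExperiment with one column per reference sample and a label.main column annotation, and the helper names keep_mask and hpca_without are made up for illustration.

```r
# Logical mask of the reference samples to keep (base R only).
keep_mask <- function(labels, drop) !(labels %in% drop)

# Hypothetical wrapper: fetch the HPCA reference and drop the given main labels.
# Needs the celldex package and downloads the data on first use.
hpca_without <- function(drop) {
  ref <- celldex::HumanPrimaryCellAtlasData()
  # Samples are columns, so subset on the column dimension.
  ref[, keep_mask(ref$label.main, drop)]
}
```

Usage would then look like `ref_sub <- hpca_without("Astrocyte")`, followed by `SingleR::SingleR(test = sce, ref = ref_sub, labels = ref_sub$label.main)`, where `sce` is a placeholder for your own query dataset. No variable gene set needs to be recreated by hand in this version, as far as I know, because marker genes are computed from the reference passed to SingleR().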