dmcable / spacexr

Spatial-eXpression-R: Cell type identification (including cell type mixtures) and cell type-specific differential expression for spatial transcriptomics
GNU General Public License v3.0
294 stars 72 forks source link

Error when computing differentially expressed genes to deconvolute bulk samples #18

Closed saguilarfer closed 2 years ago

saguilarfer commented 3 years ago

Dear RCTD developers,

I am currently trying to use your package to deconvolute bulk data from TCGA (mixtures) using a reference data obtained at https://www.biorxiv.org/content/10.1101/2020.10.26.354829v1).

Therefore, I have two initial objects: A. seurat_reference (seurat object with the reference count matrix and the cell types in the variable newcelltypes:

> seurat_reference
An object of class Seurat 
87659 features across 10063 samples within 1 assay 
Active assay: RNA (87659 features, 0 variable features)
> seurat_reference@assays$RNA@counts[1:5,1:5]
5 x 5 sparse Matrix of class "dgCMatrix"
       WMC0107359_10 WMC0036606_10 WMC294020_10 WMC0038463_10 WMC287808_10
TSPAN6             .             .            .             .            .
DPM1               1             .            .             1            .
SCYL3              .             .            .             .            .
FGR                .             .            .             .            .
CFH                .             .            .             .            .  

B. mixtures (count matrix of TCGA samples):

> dim(mixtures)
[1] 55309   594

> mixtures[1:5,1:5]
         TCGA.91.6840.01A.11R.1949.07 TCGA.55.6986.01A.11R.1949.07
TSPAN6                           4836                         3432
TNMD                                3                            6
DPM1                             1409                          977
SCYL3                             546                          501
C1orf112                          437                          125
         TCGA.05.4395.01A.01R.1206.07 TCGA.44.7672.01A.11R.2066.07
TSPAN6                           3136                         4259
TNMD                                0                           16
DPM1                             3206                         1832
SCYL3                            1544                          810
C1orf112                          521                          449
         TCGA.44.2662.01B.02R.A277.07
TSPAN6                            668
TNMD                               10
DPM1                              556
SCYL3                             809
C1orf112                          635

whose column names have been rename to: spot_1, spot_2... to facilitate the understanding.

>  colnames(mixtures) <-  paste("spot",1:594, sep = "_")
>  mixtures[1:5,1:5]
         spot_1 spot_2 spot_3 spot_4 spot_5
TSPAN6     4836   3432   3136   4259    668
TNMD          3      6      0     16     10
DPM1       1409    977   3206   1832    556
SCYL3       546    501   1544    810    809
C1orf112    437    125    521    449    635

1. I run RCTD_structure to obtain reference files:

dir.create("{path}/RCTD_data/reference"

sc_ls <- RCTD_structure(sc_obj = seurat_reference,
               clust_vr = "new_cell_types")

readr::write_csv(x = sc_ls[[1]],
                 path = "{path}/RCTD_data/reference/meta_data.csv")

readr::write_csv(x = sc_ls[[2]],
                 path = "{path}/RCTD_data/reference/cell_type_dict.csv")

sc_ls[[3]] %>%
  data.frame() %>%
  tibble::rownames_to_column("gene") %>%
  readr::write_csv(x = ., 
                   path = "{path}/RCTD_data/reference/dge.csv",
                   col_names = TRUE)

2. I obtain the spatial files

dir.create"{path}/RCTD_data/spatial"

tmp <- se_mixtures_external_gene_name %>%
  data.frame() %>% 
  tibble::rownames_to_column("gene") %>% 
  dplyr::mutate(spot_0 = spot_1) %>% # as exposed in a previous thread to obtain the first column 
  dplyr::select(gene, spot_0, everything()) %>% 
  readr::write_csv(x = ,
                   file = "{path}/RCTD_data/spatial/MappedDGEForR.csv",
                   col_names = TRUE)

## Then  I create aritificial coordinates for the synthetic spots
> ncol(mixtures)
[1] 594

## Since we have 594 spots (samples) we can create an array of 20 * 30 matrix`
coord <- expand.grid(1:20, 1:30)
colnames(coord) <- c("xcoord", "ycoord")
df_coord <- data.frame("barcodes" = paste("spot",1:600, sep = "_"), coord)

df_coord <- df_coord[1:594,]

readr::write_csv(x = df_coord,
                 file = "{path}/RCTD_data/spatial/BeadLocationsForR.csv")

3. I obtain puck and reference

reference <- RCTD::dgeToSeurat(refdir = {path}/RCTD_data/reference")
puck <- RCTD::read.SpatialRNA(datadir = {path}/RCTD_data/spatial")

4. I create RCTD

myRCTD <- RCTD::create.RCTD(spatialRNA = puck,
                            reference = reference,
                            max_cores = 8, 
                            test_mode = FALSE)

However here I obtain the following error:

> myRCTD <- RCTD::create.RCTD(spatialRNA = puck,
+                             reference = reference,
+                             max_cores = 8,
+                             test_mode = FALSE)
[1] "Begin: process_cell_type_info"
[1] "init_RCTD: number of cells in reference: 10063"
[1] "init_RCTD: number of genes in reference: 87659"

                             B cell                                 cDC 
                                500                                 225 
         Central memory CD4 T cells               Cytotoxic CD8 T cells 
                                446                                 500 
        Effector memory CD4 T cells         Effector memory CD8 T cells 
                                500                                 189 
     Effector precursor CD4 T cells                             M2 TAMs 
                                500                                 500 
                               MAST                                 mDC 
                                203                                 500 
                          Monocytes                       Naive T cells 
                                500                                 500 
                                 NK                                 pDC 
                                500                                 500 
                           Plasma B           Pre-exhausted CD8 T cells 
                                500                                 500 
Proinflamatory TAMs and neutrophils                  Regulatory T cells 
                                500                                 500 
                          SPP1 TAMs    Terminally exhausted CD8 T cells 
                                500                                 500 
                         Th17 cells                      T helper cells 
                                500                                 500 
[1] "End: process_cell_type_info"
[1] "create.RCTD: getting regression differentially expressed genes: "
[1] "get_de_genes: B cell found DE genes: 0"
[1] "get_de_genes: cDC found DE genes: 0"
[1] "get_de_genes: Central memory CD4 T cells found DE genes: 0"
[1] "get_de_genes: Cytotoxic CD8 T cells found DE genes: 0"
[1] "get_de_genes: Effector memory CD4 T cells found DE genes: 0"
[1] "get_de_genes: Effector memory CD8 T cells found DE genes: 0"
[1] "get_de_genes: Effector precursor CD4 T cells found DE genes: 0"
[1] "get_de_genes: M2 TAMs found DE genes: 0"
[1] "get_de_genes: MAST found DE genes: 0"
[1] "get_de_genes: mDC found DE genes: 0"
[1] "get_de_genes: Monocytes found DE genes: 0"
[1] "get_de_genes: Naive T cells found DE genes: 0"
[1] "get_de_genes: NK found DE genes: 0"
[1] "get_de_genes: pDC found DE genes: 0"
[1] "get_de_genes: Plasma B found DE genes: 0"
[1] "get_de_genes: Pre-exhausted CD8 T cells found DE genes: 0"
[1] "get_de_genes: Proinflamatory TAMs and neutrophils found DE genes: 0"
[1] "get_de_genes: Regulatory T cells found DE genes: 0"
[1] "get_de_genes: SPP1 TAMs found DE genes: 0"
[1] "get_de_genes: Terminally exhausted CD8 T cells found DE genes: 0"
[1] "get_de_genes: Th17 cells found DE genes: 0"
[1] "get_de_genes: T helper cells found DE genes: 0"
[1] "get_de_genes: total DE genes: 0"
Error in RCTD::create.RCTD(spatialRNA = puck, reference = reference,  : 
  create.RCTD: Error: 0 regression differentially expressed genes found

Therefore, I am asking:

In case you need the data, I could send them if you provide me an e-mail address.

Thank you very much in advance,

Regards,

Sergio Aguilar.

dmcable commented 3 years ago

Dear Sergio,

You are correct that RCTD can be used to run on bulk samples. I have to publish the function to be able to do that -- let me get back to you in a little bit.

As far as the DE genes, yes it does seem like a bug that 0 genes were detected. For setting up the RCTD object, we have created a new version of RCTD that should resolve most of the issues. Can you please update to the newest version and try creating the SpatialRNA and Reference objects? In this new version, you do not need to load from files, but rather pass in matrices. The new constructor functions should be able to detect any discrepancies that would later cause errors in being unable to detect DE genes. Hope this helps.

Dylan

dmcable commented 2 years ago

Closing this issue for now

yangjun9095 commented 1 year ago

@dmcable Hi! I'd like to re-open this issue as I'm having a similar problem in computing DE genes. So, I'm trying to run RCTD for a slide-seq dataset, using a scRNA-seq reference (a Zebrafish atlas:https://zebrahub.ds.czbiohub.org, specifically, 1dpf datsaet). However, I got the following error when I ran "myRCTD <- create.RCTD(puck, reference, max_cores = 10)" `Begin: process_cell_type_info process_cell_type_info: number of cells in reference: 12914 process_cell_type_info: number of genes in reference: 32060

           blood island           blood vasculature         cardiac muscle cell 
                    233                         359                         162 
           diencephalon                    endoderm              epidermal cell 
                   1173                         210                         431 
                    fin     floor plate neural tube                         gut 
                    617                          42                         114 
    hatching gland cell             head mesenchyme        hematopoietic system 
                     81                         344                         946 
              hindbrain       lateral line ganglion midbrain hindbrain boundary 
                    493                         194                         811 
         muscle pioneer                     myotome                neural crest 
                    572                         771                         279 
                 neuron                   notochord                   optic cup 
                    810                          46                         929 
           otic vesicle                    periderm         pharyngeal arch 3-7 
                    333                         115                         282 
           pigment cell        primitive heart tube          solid lens vesicle 
                     87                         381                          91 
                 somite                 spinal cord                    tail bud 
                    806                         439                         213 
          telencephalon         trigeminal ganglion 
                    475                          75 

End: process_cell_type_info create.RCTD: getting regression differentially expressed genes: get_de_genes: blood island found DE genes: 0 get_de_genes: blood vasculature found DE genes: 0 get_de_genes: cardiac muscle cell found DE genes: 1 get_de_genes: diencephalon found DE genes: 0 get_de_genes: endoderm found DE genes: 0 get_de_genes: epidermal cell found DE genes: 0 get_de_genes: fin found DE genes: 0 get_de_genes: floor plate neural tube found DE genes: 0 get_de_genes: gut found DE genes: 0 get_de_genes: hatching gland cell found DE genes: 0 get_de_genes: head mesenchyme found DE genes: 0 get_de_genes: hematopoietic system found DE genes: 0 get_de_genes: hindbrain found DE genes: 0 get_de_genes: lateral line ganglion found DE genes: 1 get_de_genes: midbrain hindbrain boundary found DE genes: 0 get_de_genes: muscle pioneer found DE genes: 1 get_de_genes: myotome found DE genes: 0 get_de_genes: neural crest found DE genes: 0 get_de_genes: neuron found DE genes: 0 get_de_genes: notochord found DE genes: 0 get_de_genes: optic cup found DE genes: 0 get_de_genes: otic vesicle found DE genes: 0 get_de_genes: periderm found DE genes: 0 get_de_genes: pharyngeal arch 3-7 found DE genes: 0 get_de_genes: pigment cell found DE genes: 0 get_de_genes: primitive heart tube found DE genes: 0 get_de_genes: solid lens vesicle found DE genes: 0 get_de_genes: somite found DE genes: 0 get_de_genes: spinal cord found DE genes: 0 get_de_genes: tail bud found DE genes: 0 get_de_genes: telencephalon found DE genes: 0 get_de_genes: trigeminal ganglion found DE genes: 0 get_de_genes: total DE genes: 1 create.RCTD: getting platform effect normalization differentially expressed genes: get_de_genes: blood island found DE genes: 0 get_de_genes: blood vasculature found DE genes: 0 get_de_genes: cardiac muscle cell found DE genes: 1 get_de_genes: diencephalon found DE genes: 0 get_de_genes: endoderm found DE genes: 0 get_de_genes: epidermal cell found DE genes: 0 get_de_genes: fin found DE genes: 0 get_de_genes: floor plate neural tube found DE genes: 0 get_de_genes: gut found DE genes: 0 get_de_genes: hatching gland cell found DE genes: 0 get_de_genes: head mesenchyme found DE genes: 0 get_de_genes: hematopoietic system found DE genes: 0 get_de_genes: hindbrain found DE genes: 0 get_de_genes: lateral line ganglion found DE genes: 1 get_de_genes: midbrain hindbrain boundary found DE genes: 0 get_de_genes: muscle pioneer found DE genes: 1 get_de_genes: myotome found DE genes: 1 get_de_genes: neural crest found DE genes: 0 get_de_genes: neuron found DE genes: 0 get_de_genes: notochord found DE genes: 0 get_de_genes: optic cup found DE genes: 0 get_de_genes: otic vesicle found DE genes: 0 get_de_genes: periderm found DE genes: 0 get_de_genes: pharyngeal arch 3-7 found DE genes: 0 get_de_genes: pigment cell found DE genes: 0 get_de_genes: primitive heart tube found DE genes: 0 get_de_genes: solid lens vesicle found DE genes: 1 get_de_genes: somite found DE genes: 0 get_de_genes: spinal cord found DE genes: 0 get_de_genes: tail bud found DE genes: 0 get_de_genes: telencephalon found DE genes: 0 get_de_genes: trigeminal ganglion found DE genes: 0 get_de_genes: total DE genes: 1 Error in (function (cl, name, valueClass) : assignment of an object of class “numeric” is not valid for @‘counts’ in an object of class “SpatialRNA”; is(value, "dgCMatrix") is not TRUE`

I wonder whether it's because the cell-type annotation was too fine, such that the DE gene detection is not working, or if there's a bug/error in what I'm trying to do. Any help would be appreciated. Thanks!

dmcable commented 1 year ago

Sure, please post / email me your objects.

yangjun9095 commented 1 year ago

@dmcable Hi! Thanks for your quick reply! I sent you the link to my datasets via your email, but I'm putting the link here for just in case. Thank you again for your help! link - https://drive.google.com/drive/folders/10x0OOkuW3duXNezt9IxABIy1CDSwf-4_?usp=sharing Note that the puck/reference folders have csv files exported from h5ad/anndata, and I attached a R markdown script I used to run RCTD. Please email me know if you have any questions! (yang-joon.kim@czbiohub.org)

Best, Yang-Joon

dmcable commented 1 year ago

Hi, thanks, I receive your email and the dataset. I've added a new error message to this function so that next time it will catch and explain the error. In your case, this is caused by a mismatch (upper vs. lower case) of the gene names between Reference and SpatialRNA.

Best, Dylan