cozygene / bisque

An R toolkit for estimation of cell composition from bulk expression data

Error in Generating single-cell based reference #20

Open kanekalla opened 3 years ago

kanekalla commented 3 years ago

Hi @brandonjew ,

I am trying to use reference-based decomposition to deconvolute bulk samples:

res <- BisqueRNA::ReferenceBasedDecomposition(bulk.eset, sc.eset, markers=NULL, use.overlap=FALSE)

I am getting the following error:

Decomposing into 7 cell types.
Using 8783 genes in both bulk and single-cell expression.
Converting single-cell counts to CPM and filtering zero variance genes.
Filtered 39 zero variance genes.
Converting bulk counts to CPM and filtering unexpressed genes.
Filtered 0 unexpressed genes.
Generating single-cell based reference from 5000 cells.

Error in sc.props[base::colnames(sc.ref), , drop = F] : 
  subscript out of bounds

Is it because of a cell type naming issue? I checked the column names of the sc reference input:

"10X_P7_9_CACACAAAGTAGGTGC" "10X_P7_9_CACACAAAGTCCGGTC" "10X_P7_9_CACACAATCCGAGCCA" "10X_P7_9_CACACAATCTTTACGT" "10X_P7_9_CACACCTCAGGATCGA" "10X_P7_9_CACACCTGTCTAGAGG" "10X_P7_9_CACACCTTCCAATGGT" "10X_P7_9_CACACTCAGTCGATAA" "10X_P7_9_CACACTCCACCTCGTT"

Thank you for your help !!

brandonjew commented 3 years ago

Hi @kanekalla, thanks for your interest in our method! That could be the issue if you're using SeuratToExpressionSet() with _ as the delimiter. If all of your individuals begin with "10X" then it won't be able to differentiate them. One workaround would be to replace the third underscore with a dash and use that as the delimiter or to manually make the expression set.
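The renaming workaround could look roughly like the sketch below. It uses only base R and the barcodes quoted earlier in this thread. How the new names get written back to the Seurat object is left out, and the variable names (`cell.names`, `fixed.names`, `individual.ids`) are made up for illustration.

```r
# Example cell barcodes copied from the column names quoted in this thread.
cell.names <- c("10X_P7_9_CACACAAAGTAGGTGC",
                "10X_P7_9_CACACAAAGTCCGGTC",
                "10X_P7_9_CACACAATCCGAGCCA")

# Replace only the third underscore with a dash.
# The outer capture group matches everything before the third "_",
# so the first two underscores are left untouched.
fixed.names <- sub("^(([^_]*_){2}[^_]*)_", "\\1-", cell.names)

# Splitting on the new delimiter now yields the individual ID as field 1.
individual.ids <- vapply(strsplit(fixed.names, "-", fixed = TRUE),
                         function(parts) parts[1], character(1))

print(fixed.names)
print(unique(individual.ids))
```

With names in this form, "-" could then be given as the delimiter to SeuratToExpressionSet(), with the individual ID in the first position. The exact argument names are best checked against the package documentation.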

kanekalla commented 3 years ago

Thank you @brandonjew, I fixed the row names. Now I am seeing the following error:

Error in BisqueRNA::ReferenceBasedDecomposition(bulk.eset, sc.eset, markers = NULL, : Zero genes left for decomposition.

I tried to inspect the output of GenerateSCReference(sc.eset, cell.types='cellType'), and the resulting matrix contains only NAs:

        AT2  circMono  club_nec_at1_unknwn  DC_IM  invading_monoc  multiciliated_cells
Xkr4    NA   NA        NA                   NA     NA              NA
Rp1     NA   NA        NA                   NA     NA              NA
Sox17   NA   NA        NA                   NA     NA              NA
Mrpl15  NA   NA        NA                   NA     NA              NA
Lypla1  NA   NA        NA                   NA     NA              NA
Tcea1   NA   NA        NA                   NA     NA              NA
Rgs20   NA   NA        NA                   NA     NA              NA

Could you kindly look into the issue?

brandonjew commented 3 years ago

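Hi @kanekalla, what do the following commands return:

table(sc.eset[["SubjectName"]])
table(sc.eset[["cellType"]])
sum(is.na(sc.eset[["SubjectName"]]))
sum(is.na(sc.eset[["cellType"]]))
head(sc.eset[["cellType"]], 10)

Thanks!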

kanekalla commented 3 years ago

table(sc.eset[["SubjectName"]])

 3-F-56  3-F-57  3-M-5/6  3-M-7/8
    621    1520      784     2524

table(sc.eset[["cellType"]])

AT2  circMono  club_nec_at1_unknwn  DC_IM  invading_monoc  multiciliated_cells
 89       220                   45     87             161                   49

sum(is.na(sc.eset[["SubjectName"]]))
0

sum(is.na(sc.eset[["cellType"]]))
[1] 4798

head(sc.eset[["cellType"]],10)
[1] DC_IM
6 Levels: AT2 circMono club_nec_at1_unknwn DC_IM ... multiciliated_cells

I guess cellType contains NAs. Is that what is causing the problem?

Thanks!
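One way to test that guess, as a sketch only: drop the cells whose cell type label is NA and check that every remaining cell has a label. The toy `pheno` table and `counts` matrix below are hypothetical stand-ins for the single-cell ExpressionSet. Whether removing unlabeled cells resolves the error is an assumption and is not confirmed in this thread.

```r
# Hypothetical toy data standing in for the single-cell ExpressionSet.
set.seed(1)
pheno <- data.frame(
  SubjectName = c("3-F-56", "3-F-57", "3-M-5/6", "3-M-7/8", "3-F-56", "3-F-57"),
  cellType    = factor(c("AT2", NA, "DC_IM", NA, "circMono", "AT2")),
  row.names   = paste0("cell", 1:6)
)
counts <- matrix(rpois(3 * 6, lambda = 5), nrow = 3,
                 dimnames = list(c("Xkr4", "Rp1", "Sox17"), rownames(pheno)))

# Keep only cells with a non-missing cell type label.
keep <- !is.na(pheno$cellType)
pheno.labeled  <- droplevels(pheno[keep, , drop = FALSE])
counts.labeled <- counts[, keep, drop = FALSE]

# Every remaining cell has a label, and the matrix columns
# still line up with the rows of the phenotype table.
stopifnot(!anyNA(pheno.labeled$cellType),
          identical(colnames(counts.labeled), rownames(pheno.labeled)))

print(table(pheno.labeled$cellType))
```

On a real ExpressionSet the same logical index could be applied to the columns, for example sc.eset[, !is.na(sc.eset$cellType)], before rerunning the decomposition.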