Danko-Lab / TED

a fully Bayesian approach to deconvolve tumor microenvironment

run TED on non-tumor data #10

Closed RK1912 closed 3 years ago

RK1912 commented 3 years ago

Hi, I recently came across TED and am trying to use it on some synovial fluid bulk RNA-seq data, but I get the following error when I call run.Ted():

Number of outlier genes filtered= 0
[1] "aligning reference and mixture..."
[1] "No tumor reference is speficied. Reference profiles are treated equally."
[1] "run first sampling"
current sample ID:1  2  3  4  5  6  7  8  9  10
[1] "merge subtypes"
        <NA>
Min.      NA
1st Qu.   NA
Median    NA
Mean     NaN
3rd Qu.   NA
Max.      NA
NA's      10
[1] "pooling information across samples"
Error: $ operator is invalid for atomic vectors
In addition: Warning message:
In mclapply(1:nrow(input.phi), function(idx) { :
  all scheduled cores encountered errors in user code
Execution halted

Info :

  1. I used raw counts for the scRNA-seq and bulk RNA-seq data, without any normalization or transformation.
  2. I have a total of 18 cell types. My gene x cell matrix is about 10k x 2500 in size.
  3. I did not provide a tumor reference profile (or anything similar) in this case.

Could you please let me know whether I can use this tool for other kinds of data and, if so, how I can make this work?

Thanks, RK

tinyi commented 3 years ago

Hi RK,

Please double check your cell.type.labels. I think you might have NA values in it.

Best,

Tinyi

tinyi commented 3 years ago

BayesPrism has been upgraded to v1.2, with new built-in functions to remove ribosomal and mitochondrial genes and genes on sex chromosomes. See the updated vignette and help function for more details. Feel free to check it out.

RK1912 commented 3 years ago

Hi Tinyi! Thanks for your quick response. I checked my labels and everything seems to be there. But maybe there has been a misunderstanding. My current input to run.Ted() looks like this:

ref.dat is a 2627 x 10652 data frame where row names are unique cell IDs and column names are gene names.

X is the bulk data (a 50 x 10652 data frame) where row names are the bulk sample names and column names are the same gene names as in ref.dat.

cell.type.labels holds the cell type names from the metadata file, with each cell type name corresponding to a cell ID in ref.dat.

For example: if the row names (unique cell IDs) are "cell_id1", "cell_id2", "cell_id3", "cell_id4", and the first two belong to cell type "cell_type1" and the last two to "cell_type2", then cell.type.labels = "cell_type1", "cell_type1", "cell_type2", "cell_type2".
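To make the layout described above concrete, here is a minimal sketch of building a label vector that lines up with the row names of ref.dat. The toy matrix, the `meta` table, and its column names `cell_id` and `cell_type` are hypothetical stand-ins, not objects from the shared data.

```r
# Hypothetical toy reference: 4 cells (rows) x 3 genes (columns).
ref.dat <- matrix(
  c(5, 0, 2,
    3, 1, 0,
    0, 7, 4,
    1, 6, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cell_id", 1:4), c("geneA", "geneB", "geneC"))
)

# Hypothetical metadata, one row per cell, deliberately in a different order.
meta <- data.frame(
  cell_id   = c("cell_id3", "cell_id1", "cell_id4", "cell_id2"),
  cell_type = c("cell_type2", "cell_type1", "cell_type2", "cell_type1"),
  stringsAsFactors = FALSE
)

# Look up each row name of ref.dat in the metadata so the labels follow
# the row order of ref.dat rather than the order of the metadata file.
cell.type.labels <- meta$cell_type[match(rownames(ref.dat), meta$cell_id)]

# One label per cell, and no cell left without a label.
stopifnot(length(cell.type.labels) == nrow(ref.dat))
stopifnot(!anyNA(cell.type.labels))
print(cell.type.labels)
```

Using `match()` rather than relying on the metadata file's order means a cell ID that is missing from the metadata shows up as an NA label instead of silently shifting the other labels.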

I am not sure whether I misunderstood cell.type.labels, or whether I should change the row names to cell type names instead of unique IDs.

Please let me know if this is the right implementation.

Thanks , RK

tinyi commented 3 years ago

Hi RK,

Row names of ref.dat can be unique cell barcodes. Could you do table(is.na(cell.type.labels))? Also try converting data.frame to matrix, and see if it works.
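The two checks suggested above can be sketched as follows. The small `ref.dat` and `cell.type.labels` objects are hypothetical stand-ins with one missing label planted on purpose; only base R functions are used.

```r
# Hypothetical stand-ins for the real inputs (cells in rows, genes in columns).
ref.dat <- data.frame(
  geneA = c(5, 3, 0, 1),
  geneB = c(0, 1, 7, 6),
  row.names = paste0("cell_id", 1:4)
)
cell.type.labels <- c("cell_type1", NA, "cell_type2", "cell_type2")

# Count missing labels: any count under TRUE means some cells have no label.
na.counts <- table(is.na(cell.type.labels))
print(na.counts)

# Positions of the unlabeled cells, useful for tracking them down.
na.idx <- which(is.na(cell.type.labels))
print(rownames(ref.dat)[na.idx])

# Convert the data frame to a numeric matrix; row and column names are kept.
ref.mat <- as.matrix(ref.dat)
stopifnot(is.matrix(ref.mat), is.numeric(ref.mat))
```

If `is.numeric(ref.mat)` fails after the conversion, at least one column of the data frame was not numeric (for example a stray character column), which is worth fixing before passing the matrix on.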

Best,

Tinyi

RK1912 commented 3 years ago

Hi, I tried both suggestions and I still get the same error. I have put the data here: https://github.com/RK1912/Deconv_data

Could you please help me figure this out?

Thanks, RK

RK1912 commented 3 years ago

Also, another question: can run.Ted() continue to process the remaining bulk samples even if one sample fails on a core? That way we could still get results for the other samples when one fails.
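The thread does not say whether run.Ted() itself can skip a failing sample. As a generic R pattern only, and not a description of TED's internals, the sketch below shows how a per-sample worker passed to mclapply can be wrapped in tryCatch so that one failure does not discard the other results. The functions `process.sample` and `safe.process` are hypothetical.

```r
library(parallel)

# Hypothetical per-sample worker; it fails on sample 3 to imitate one bad sample.
process.sample <- function(idx) {
  if (idx == 3) stop("simulated failure")
  idx^2
}

# Wrap the worker so an error becomes a NULL entry instead of aborting the run.
safe.process <- function(idx) {
  tryCatch(
    process.sample(idx),
    error = function(e) {
      message("sample ", idx, " failed: ", conditionMessage(e))
      NULL
    }
  )
}

# mc.cores = 1 keeps the sketch portable (mclapply cannot fork on Windows);
# raise it on Linux or macOS to run samples in parallel.
results <- mclapply(1:5, safe.process, mc.cores = 1)

# Indices of the samples that failed and can be inspected or rerun later.
failed <- which(vapply(results, is.null, logical(1)))
print(failed)
```

Applying this to TED would mean running samples in separate calls or modifying the package code, which is an assumption beyond what the thread covers.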

Thanks, RK

tinyi commented 3 years ago

I have tried your data. I did not see any problem in running. Here is the code:

library(TED)
X <- readRDS("X.rds")
ref.dat <- readRDS("sc.rds")
cell_types <- readRDS("cell_types.rds")

tcga.ted <- run.Ted(ref.dat = t(ref.dat), X = t(X), cell.type.labels = cell_types, input.type = "scRNA", n.cores = 10)

console output as follows:

[1] "removing non-numeric genes..."
[1] "removing outlier genes..."
Number of outlier genes filtered= 3
[1] "aligning reference and mixture..."
[1] "No tumor reference is speficied. Reference profiles are treated equally."
[1] "run first sampling"
current sample ID:1 2 3 4 5 6 7 8 9 10
[1] "merge subtypes"
        SC_T1 SC_T4 SC_T3 SC_T6 SC_M3 SC_M2 SC_M1 SC_F4 SC_F3 SC_F2 SC_F1 SC_T2
Min.    0.000 0.000 0.000 0.000 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1st Qu. 0.054 0.001 0.021 0.000 0.004 0.010 0.007 0.002 0.001 0.001 0.104 0.000
Median  0.097 0.008 0.039 0.000 0.012 0.072 0.048 0.033 0.055 0.009 0.126 0.001
Mean    0.126 0.032 0.041 0.019 0.041 0.084 0.058 0.069 0.094 0.037 0.125 0.002
3rd Qu. 0.187 0.039 0.061 0.032 0.049 0.111 0.073 0.136 0.171 0.047 0.164 0.002
Max.    0.341 0.189 0.085 0.097 0.199 0.254 0.188 0.209 0.298 0.166 0.286 0.016
        SC_T5 SC_B4 SC_B2 SC_B1 SC_B3 SC_M4
Min.    0.000 0.001 0.000 0.000 0.001 0.000
1st Qu. 0.009 0.059 0.004 0.001 0.010 0.002
Median  0.019 0.112 0.018 0.009 0.013 0.009
Mean    0.020 0.131 0.031 0.031 0.036 0.025
3rd Qu. 0.023 0.202 0.027 0.051 0.025 0.043
Max.    0.062 0.351 0.109 0.114 0.150 0.080
[1] "pooling information across samples"
[1] "run final sampling"
current sample ID:1 2 3 4 5 6 7 8 9 10
        SC_T1 SC_T4 SC_T3 SC_T6 SC_M3 SC_M2 SC_M1 SC_F4 SC_F3 SC_F2 SC_F1 SC_T2
Min.    0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1st Qu. 0.006 0.000 0.001 0.000 0.000 0.001 0.001 0.000 0.000 0.000 0.002 0.000
Median  0.044 0.000 0.024 0.001 0.000 0.037 0.012 0.001 0.009 0.000 0.082 0.000
Mean    0.108 0.066 0.039 0.034 0.043 0.061 0.059 0.058 0.105 0.049 0.108 0.016
3rd Qu. 0.188 0.002 0.055 0.038 0.008 0.048 0.048 0.102 0.192 0.027 0.181 0.002
Max.    0.439 0.600 0.115 0.212 0.317 0.363 0.267 0.221 0.429 0.378 0.330 0.151
        SC_T5 SC_B4 SC_B2 SC_B1 SC_B3 SC_M4
Min.    0.000 0.000 0.000 0.000 0.000 0.000
1st Qu. 0.000 0.005 0.000 0.000 0.000 0.000
Median  0.011 0.055 0.004 0.001 0.001 0.007
Mean    0.025 0.096 0.058 0.019 0.033 0.024
3rd Qu. 0.026 0.162 0.008 0.018 0.003 0.033

let me know if you cannot reproduce it

RK1912 commented 3 years ago

Hi Tinyi, thanks for your help. I was able to run it previously when I replaced the unique column names in ref.dat with the cell type labels. But I realized I had input.type = "GEP", so maybe that made the difference. I have now run it with your code and I don't see any problems.

Thanks !