Danko-Lab / TED

a fully Bayesian approach to deconvolve tumor microenvironment

Error running TED: Error: $ operator is invalid for atomic vectors #16

Open zroger49 opened 2 years ago

zroger49 commented 2 years ago

Hello. Previously, a member from my lab was able to run your software (this was about 5-6 months ago). I tried to reproduce their results using a different reference dataset.

However, when I try to run my analysis, the following error occurs:

Error: $ operator is invalid for atomic vectors
In addition: Warning message:
In mclapply(1:N, FUN = function(i) { :
  all scheduled cores encountered errors in user code 

When I run the same analysis on a single core, I get:

current sample ID:1  Error in rmultinom(n = 1, size = X.i[g], prob = prob.mat[, g]) :
  invalid second argument 'size'

My bulk data are TPM residuals (after regressing out the effect of multiple covariates with a multiple linear model fitted to the original TPM expression table), while my scRNA-seq data was normalized using NormalizeData(normalization.method = "RC", scale.factor = 100000, margin = 1) and subset to the top 5000 variable genes.

I know this method works best with counts as input, but this setup has worked before and gave us decent results.

tinyi commented 2 years ago

Hi Rogério,

Thank you for your interest in our work. It is a bit difficult to tell where exactly it went wrong. My suggestions are as follows.

1) Make sure the input is a matrix rather than a data frame.
2) Make sure none of the inputs, including cell.type.labels and the colnames/rownames, contain NA values.
3) Make sure there are no negative values in the input (you mentioned that you regressed out some covariates).
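The three checks above can be sketched in base R roughly as follows. This is only an illustration: `check.input` is a hypothetical helper (it is not part of TED), and the toy `bulk` and `labels` objects are made up. The last lines show that `rmultinom()` rejects a negative `size`, which would explain the single-core error if the bulk matrix contains negative values.

```r
# Hypothetical helper (not part of TED): returns a character vector describing
# the problems found in one input; an empty vector means all checks passed.
check.input <- function(x, name = "input") {
  problems <- character(0)
  if (!is.matrix(x)) {
    problems <- c(problems, paste(name, "is not a matrix; try as.matrix()"))
    x <- as.matrix(x)
  }
  if (anyNA(x) || anyNA(rownames(x)) || anyNA(colnames(x))) {
    problems <- c(problems, paste(name, "contains NA values or NA dimnames"))
  }
  if (is.numeric(x) && any(x < 0, na.rm = TRUE)) {
    problems <- c(problems, paste(name, "contains negative values"))
  }
  problems
}

# Toy bulk data (made up): a data frame with one negative value.
bulk <- data.frame(geneA = c(5, -1.2), geneB = c(0, 3),
                   row.names = c("s1", "s2"))
issues <- check.input(bulk, "bulk")
print(issues)

# Label vectors can be checked directly with anyNA().
labels <- c("Tcell", NA, "Bcell")
labels.have.na <- anyNA(labels)

# rmultinom() refuses a negative size; capture the error message.
msg <- tryCatch(rmultinom(n = 1, size = -1, prob = c(0.5, 0.5)),
                error = function(e) conditionMessage(e))
print(msg)
```

Running the helper on both the bulk matrix and the single-cell reference before starting a long job is cheaper than waiting for a worker process to fail.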

If the issue persists, you may send me your data (or a subset of it) as an .RData file.

Best,

Tinyi

zroger49 commented 2 years ago

Hello again. The issue seems to be related to the negative numbers in the expression matrix! I tried to run with the raw counts matrix, but I had the following issue:

...
[1] "pooling information across samples"
Killed
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
  ignoring SIGPIPE signal
Calls: run.Ted ... optimize.psi -> mclapply -> lapply -> FUN -> sendMaster

Could this be related to the memory available on my machine?

tinyi commented 2 years ago

Yes, I think it is related to memory. You may try setting the n.cores.2g argument to a smaller value. Removing unused variables from the workspace and then freeing memory with gc() may also help.
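As a minimal sketch of the clean-up step, assuming a large intermediate object is still sitting in the workspace (`big.tmp` is a made-up name):

```r
# Made-up intermediate object that is no longer needed.
big.tmp <- matrix(0, nrow = 1000, ncol = 1000)
print(object.size(big.tmp), units = "Mb")

# Drop it from the workspace, then ask R to release the memory.
rm(big.tmp)
invisible(gc())

# After this, rerun run.Ted with a smaller n.cores.2g value.
# (The call is not shown here because its full argument list
# depends on the installed TED version.)
```

Lowering the core count trades speed for memory, since each parallel worker needs its own working memory while it runs.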

There are a few ways to deal with the negative values. The easiest would be to exclude genes with negative values. I am not sure how your regression is set up. Using the residuals may result in lots of zeros; if that is the case, you may try adding back the intercept term. You can also change the reference level in the regression to see which direction yields fewer negative values. Some people also regress on log-transformed values; in that case, the values need to be transformed back to the original raw scale by exponentiating them.
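A rough sketch of these options on simulated data, using base R only. Everything here is an assumption made for illustration: the toy matrix is taken to be samples x genes, there is a single `batch` covariate, and the log-scale fit uses a pseudocount of 1. It is not the regression setup used in this thread.

```r
set.seed(1)

# Simulated TPM-like matrix (samples x genes) with a two-level covariate.
n <- 20
batch <- factor(rep(c("a", "b"), each = n / 2))
tpm <- cbind(gene1 = rexp(n, rate = 1 / 50) * ifelse(batch == "b", 2, 1),
             gene2 = rexp(n, rate = 1 / 10))
rownames(tpm) <- paste0("s", seq_len(n))

# Raw-scale regression: residuals are centred on zero, so many are negative.
fit.raw <- lm(tpm ~ batch)
resid.raw <- residuals(fit.raw)

# Add the per-gene intercept back to shift values towards the original scale.
adj.raw <- sweep(resid.raw, 2, coef(fit.raw)["(Intercept)", ], "+")

# Changing the reference level changes the intercept, and with it the
# number of values that remain negative.
fit.b <- lm(tpm ~ relevel(batch, ref = "b"))
adj.raw.b <- sweep(residuals(fit.b), 2, coef(fit.b)["(Intercept)", ], "+")
print(c(ref.a = sum(adj.raw < 0), ref.b = sum(adj.raw.b < 0)))

# Simplest option: drop every gene that still has a negative value.
keep <- colSums(adj.raw < 0) == 0
adj.nonneg <- adj.raw[, keep, drop = FALSE]

# Log-scale regression: exponentiate to return to the raw scale.
# The result is strictly positive (and still includes the pseudocount).
fit.log <- lm(log(tpm + 1) ~ batch)
adj.log <- exp(sweep(residuals(fit.log), 2,
                     coef(fit.log)["(Intercept)", ], "+"))
```

The log-scale route is the only one of the three that guarantees positive values for every gene, at the cost of modelling the covariate effect multiplicatively rather than additively.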

zroger49 commented 2 years ago

Regarding the negative values in the input matrix, I decided to run the analysis on the raw counts and account for the covariates in downstream analysis. The results look good so far.

Also, when I ran the same type of analysis on another dataset (this time TCGA), I got the following error:

[1] "pooling information across samples"
Error in log.fold[i, ] : subscript out of bounds
In addition: Warning message:
In mclapply(1:nrow(input.phi), function(idx) { :
  scheduled cores 3, 8 did not deliver results, all values of the jobs will be affected

It seems that the first round of the analysis completed, though. Is it safe to carry on with these results?

tinyi commented 2 years ago

This also seems to be a memory issue. We are working on improving the memory efficiency and the user interface, but this will take some time, probably ~2 weeks. Would you mind trying our web portal in the meantime?

Yes, you may use the first-round results; they should be unaffected.

Best,

Tinyi

tinyi commented 2 years ago

Hi Rogério,

I have updated the git repository to v1.4, which addresses the memory issue. Please try it and let me know if it helps.

Best,

Tinyi