NabaviLab / SigEMD

R package for differential gene expression analysis in single-cell RNAseq

FunImpute and calculate_single error #4

Closed NTNguyen13 closed 5 years ago

NTNguyen13 commented 5 years ago

Hi, I've recently used SigEMD for DEG analysis. My data:

> typeof(data)  # Large dgCMatrix
[1] "S4"
> dim(data)
[1] 17123   183
> head(data)
6 x 183 sparse Matrix of class "dgCMatrix"
   [[ suppressing 60 column names ‘AAACCCATCGAGTACT_1’, ‘AACAAAGAGCTGGCTC_1’, ‘AACCATGTCCGATGCG_1’ ... ]]

RP11-34P13.7  . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
AL627309.1    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
AP006222.2    . . . . . . . . . . . . . 1 . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
RP4-669L17.10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
RP11-206L10.3 . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . .
RP11-206L10.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

> length(condition)
[1] 183

When performing imputation, I got the following error: Error in unique.default(lasso_input) : unique() applies only to vectors

I also tried calculate_single without Hur_gene by using: results <- calculate_single(data = data,condition = condition,Hur_gene = NULL, binSize=0.2, nperm=5)

but I got this error:

do EMD for Hur genes.
Calculating pairwise emd scores...done.
Calculating emd...done.
Calculating permuted emd #1 of 5...done.
Calculating permuted emd #2 of 5...done.
Calculating permuted emd #3 of 5...done.
Calculating permuted emd #4 of 5...done.
Calculating permuted emd #5 of 5...done.
Calculating q-values...done.
do EMD for nonHur genes.
calculate emd for each gene...
 Error in hist.default(dataA, breaks = bins, plot = FALSE) : 
  invalid number of 'breaks' 

Could you please help me with this problem? Thank you very much.

tianyu-github commented 5 years ago

Hi, did you preprocess the data, at least removing the genes that have zero counts across all cells? The first error occurs because the variable "lasso_input" in the function is not a vector; it is likely to be NULL when the data contains a row of all zeros. The second error is also likely caused by a row (gene) that is all zeros. I suggest you normalize the data first and then use "data <- dataclean(data)" to delete the rows with all zeros. Let me know whether it works.
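To illustrate how an all-zero gene can produce the second error, here is a minimal base R sketch. It assumes that bin edges are built from the observed range with a fixed width; that is an assumption about the general approach, not SigEMD's confirmed code. The helper make_bins and the toy matrix are hypothetical.

```r
# Toy count matrix: genes in rows, cells in columns (hypothetical data).
counts <- matrix(c(0, 0, 0, 0,
                   1, 0, 2, 0,
                   0, 3, 0, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 paste0("cell", 1:4)))

# Assumption: bin edges span the observed range with a fixed width.
make_bins <- function(x, bin_size = 0.2) seq(min(x), max(x), by = bin_size)

# For an all-zero gene the range collapses to a single break value of 0.
bins_zero <- make_bins(counts["geneA", ])

# hist() reads a length-one 'breaks' as a requested number of cells,
# and 0 is not a valid number, so it stops with an error.
hist_error <- tryCatch(
  {
    hist(counts["geneA", ], breaks = bins_zero, plot = FALSE)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

# Dropping all-zero genes beforehand avoids the degenerate case.
keep <- rowSums(counts) > 0
filtered <- counts[keep, , drop = FALSE]
```

In this sketch hist_error holds the message raised for geneA, and filtered keeps only geneB and geneC.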

NTNguyen13 commented 5 years ago

Thank you for your advice, I have done the following:

> data.filter0 <- data[!colSums(data)==0,]
> dim(data.filter0)
[1] 17123   183
> dim(data)
[1] 17123   183

I took colSums and checked whether any of them were equal to zero so that I could filter them out, but none of them are. colSums(data)==0 returned FALSE for every entry.

Then I ran data <- dataclean(data) and saw that about 5k rows were filtered out! I will check the result and let you know if there's any problem.
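The difference between the two filters can be seen on a small example. With genes in rows, colSums gives one total per cell, so it cannot detect an all-zero gene; rowSums gives one total per gene. The sketch below uses a hypothetical dense toy matrix and base R only. For a dgCMatrix, the Matrix package provides rowSums and colSums methods that are used the same way.

```r
# Toy matrix: genes in rows, cells in columns (hypothetical data).
counts <- matrix(c(0, 0, 0,
                   2, 0, 1,
                   0, 4, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("g1", "g2", "g3"),
                                 c("c1", "c2", "c3")))

# Per-cell totals: every cell has some counts, so nothing is flagged.
cell_totals <- colSums(counts)
by_col <- counts[!(cell_totals == 0), ]

# Per-gene totals: g1 is zero in every cell and is removed.
gene_totals <- rowSums(counts)
by_row <- counts[gene_totals > 0, , drop = FALSE]
```

Here by_col still has all three genes, while by_row contains only g2 and g3.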

NTNguyen13 commented 5 years ago

Unfortunately, after data cleaning, I'm still having the same problem in FunImpute:

> dim(data)
[1] 17123   183
> data <- dataclean(data)
Remove genes that all are zeros...
done
> databinary<- databin(data)
> dim(data)
[1] 12146   183
> Hur_gene<- idfyImpgene(data, databinary, condition)
> genes_use<- idfyUsegene(data, databinary, condition,ratio = 0.5) 
> 
> datimp <- FunImpute(object = data, genes_use = (genes_use), genes_fit = (Hur_gene),dcorgene = NULL) 
 Error in unique.default(lasso_input) : unique() applies only to vectors

But I'm able to run calculate_single using Hur_gene = NULL.
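Since the error persists after cleaning, one thing that may be worth ruling out is the input type. This is an assumption, not something confirmed in this thread: unique.default() raises this message for objects that are not vectors, and typeof(data) above reports "S4". The sketch below reproduces the message with a hypothetical S4 class, ToyContainer, which is neither a SigEMD nor a Matrix class.

```r
library(methods)

# Hypothetical stand-in for a non-vector S4 container.
setClass("ToyContainer", representation(values = "numeric"))
obj <- new("ToyContainer", values = c(1, 1, 2))

# unique.default() rejects objects that are not vectors.
msg <- tryCatch(
  {
    unique.default(obj)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

# The same values held in a base vector are handled normally.
ok <- unique(obj@values)
```

If the sparse input did turn out to be the cause, converting it with as.matrix(data) before imputation would be one thing to try. Whether FunImpute accepts sparse input is not stated here.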