IMB-Computational-Genomics-Lab / ascend

R package - Analysis of Single Cell Expression, Normalisation and Differential expression (ascend)

Error: BiocParallel errors #14

Closed: MichaelPeibo closed this issue 6 years ago

MichaelPeibo commented 6 years ago

Hi Ascend team, after normalisation with scranNormalise, I want to regress out the cell-cycle factor with RegressConfoundingFactors. However, when I run this function, I get this error:

Error: BiocParallel errors
  element index: 1, 2, 3, 4, 5, 6, ...
  first error: NA/NaN/Inf in 'x'

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)

Matrix products: default
BLAS: /share/app/cluster/R-3.4.3/lib64/R/lib/libRblas.so
LAPACK: /share/app/cluster/R-3.4.3/lib64/R/lib/libRlapack.so

 version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          4.3
year           2017
month          11
day            30
svn rev        73796
language       R
version.string R version 3.4.3 (2017-11-30)
nickname       Kite-Eating Tree

I did install and configure BiocParallel as you told me. Any suggestions? Thanks!
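The `first error: NA/NaN/Inf in 'x'` line is the message that base R's least-squares fitting (for example `lm.fit`) gives when its input contains non-finite values, so a quick sanity check is to scan the normalised matrix for them. Below is a minimal base-R sketch. It assumes the normalised expression values are available as a plain genes x cells numeric matrix. The helper `find_nonfinite_cells` is hypothetical and not part of ascend, and the sketch does not claim that RegressConfoundingFactors calls `lm.fit` internally:

```r
# Hypothetical helper (not part of ascend): list the cells (columns) of a
# genes x cells matrix that contain NA, NaN, Inf or -Inf values.
find_nonfinite_cells <- function(expr_mat) {
  bad <- !is.finite(expr_mat)          # TRUE wherever a value is NA/NaN/Inf/-Inf
  per_cell <- colSums(bad)             # number of non-finite values per cell
  hits <- which(per_cell > 0)
  cell_ids <- if (is.null(colnames(expr_mat))) hits else colnames(expr_mat)[hits]
  data.frame(cell = cell_ids,
             n_nonfinite = unname(per_cell[hits]),
             stringsAsFactors = FALSE)
}

# Toy 3-gene x 4-cell matrix with one Inf (cell2) and one NaN (cell4)
toy <- matrix(c(1, 2, 3,
                0, Inf, 1,
                5, 5, 5,
                NaN, 1, 2),
              nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
find_nonfinite_cells(toy)

# A single Inf in a predictor is enough to make base R's least-squares fit
# stop with the same message as in the report above
msg <- tryCatch(lm.fit(cbind(1, toy[, "cell2"]), c(1, 2, 3)),
                error = function(e) conditionMessage(e))
print(msg)
```

An empty data frame means the matrix is clean. A non-empty one points to the cells worth inspecting before any regression step.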

asenabouth commented 6 years ago

Hi @MichaelPeibo, could you please email your script and EMSet (saved as an RDS file) to me at a.senabouth @ imb.uq.edu.au so I can investigate this issue for you? Also, are you working in a high-performance computing environment (i.e. PBSPro, SLURM, LSF, etc.)?

MichaelPeibo commented 6 years ago

Hi @asenabouth, I sent you my em.set and script (which may be a little messy, but I believe you can find the key code). I am working on a Linux platform. Here is my system information, in case it helps:

cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
   8 Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz

uname -a
Linux mgt 2.6.32-696.13.2.el6.x86_64 #1 SMP Thu Oct 5 21:22:16 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Thanks for your attention and help.

asenabouth commented 6 years ago

Hi @MichaelPeibo - I've found the cause of your error, and it may have downstream effects. It turns out that scranNormalise converted some of the values in the expression matrix into infinite values, which is definitely not ideal. I will look into this. In the meantime, I recommend using the other normalisation method, NormaliseByRLE. Thank you for your patience.
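RLE stands for relative log expression. The sketch below illustrates the usual median-of-ratios idea behind RLE size factors, as used in DESeq-style normalisation. It is a base-R illustration under that assumption, not the code of ascend's NormaliseByRLE, and the helper name `rle_size_factors` is hypothetical. The computation stays in log space and uses only genes detected in every cell, so finite input counts give finite size factors:

```r
# Sketch of RLE (median-of-ratios) size factors for a genes x cells count
# matrix; an illustration of the idea, not ascend's NormaliseByRLE code.
rle_size_factors <- function(counts) {
  # Only genes detected in every cell have a finite, non-zero geometric mean
  keep <- rowSums(counts > 0) == ncol(counts)
  if (!any(keep)) stop("no gene is expressed in every cell")
  log_counts <- log(counts[keep, , drop = FALSE])
  log_geo_mean <- rowMeans(log_counts)          # per-gene log geometric mean
  # Size factor per cell = median ratio of its counts to the gene geometric
  # means (the vector is recycled down columns, i.e. gene by gene)
  exp(apply(log_counts - log_geo_mean, 2, median))
}

# Toy counts: cell2 is cell1 sequenced twice as deeply, cell3 half as deeply
counts <- matrix(c(10, 20, 30, 40,
                   20, 40, 60, 80,
                    5, 10, 15, 20),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))
sf <- rle_size_factors(counts)
normalised <- sweep(counts, 2, sf, "/")   # divide each cell by its size factor
sf
```

With sparse single-cell counts, few genes may be non-zero in every cell, which is why the sketch stops with an error when none are rather than returning Inf or NaN.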

asenabouth commented 6 years ago

scranNormalise has been fixed, and the RegressConfoundingFactors function now works on your dataset, @MichaelPeibo. Thank you for raising this issue. Please let me know if you have any other issues.

MichaelPeibo commented 6 years ago

@asenabouth Really appreciate your help. I reinstalled the ascend package with install_github and got past the RegressConfoundingFactors step. However, when I ran PCA, I got this strange plot: [image] And when I ran RunCORE, I got this error:

> clustered.set <- RunCORE(pca.set, conservative = TRUE)
[1] "Performing unsupervised clustering..."
Error in RunCORE(pca.set, conservative = TRUE) :
  Your dataset may contain cells that are too distinct from the main population of cells. We recommend you run this function with 'remove_outlier = TRUE' or check the cell-cell normalisation of your dataset.

I did filter with the default settings. Any suggestions?

asenabouth commented 6 years ago

That's a strange result. Your dataset is large enough to have sufficient variance (unless the regression removed it). We don't usually regress out confounding factors in our own datasets (the option is there for those who wish to do this step). Do you get the same result if you skip the confounding-factor regression?

You can also run RunCORE with the 'remove_outlier' option to see what you get. Note, however, that this option removes these outliers and is more time-consuming, as it repeats the dynamic tree cut until all remaining cells can be assigned to a cluster.
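The remove_outlier behaviour described above can be pictured as a loop: cluster, set aside cells that cannot join a large enough cluster, and re-cluster the rest. The sketch below is only an analogue under stated assumptions. Base R `hclust`/`cutree` with a fixed relative cut height stand in for dynamicTreeCut, and `min_size = 20` mirrors the minimum cluster size mentioned in this thread. `cluster_without_outliers` and `cut_frac` are hypothetical names, not ascend's API:

```r
# Sketch only: base R hclust/cutree stand in for dynamicTreeCut. Cells that
# fall into clusters smaller than min_size are set aside as outliers and the
# remaining cells are re-clustered until every cluster is large enough.
cluster_without_outliers <- function(x, min_size = 20, cut_frac = 0.5) {
  kept <- seq_len(nrow(x))      # row indices of cells still being clustered
  removed <- integer(0)         # record of cells set aside as outliers
  repeat {
    if (length(kept) < 2) stop("fewer than two cells left to cluster")
    hc <- hclust(dist(x[kept, , drop = FALSE]), method = "average")
    # Hypothetical rule: cut the tree at a fixed fraction of its tallest merge
    labels <- cutree(hc, h = cut_frac * max(hc$height))
    sizes <- table(labels)
    too_small <- labels %in% as.integer(names(sizes)[sizes < min_size])
    if (!any(too_small)) {
      return(list(kept = kept, removed = removed, clusters = unname(labels)))
    }
    removed <- c(removed, kept[too_small])   # keep a record, then drop them
    kept <- kept[!too_small]
  }
}

set.seed(42)
cells <- rbind(matrix(rnorm(60, sd = 0.5), ncol = 2),                            # 30 cells near (0, 0)
               matrix(rnorm(60, sd = 0.5), ncol = 2) + rep(c(10, 0), each = 30), # 30 cells near (10, 0)
               c(5, 50), c(5, -50))                                              # 2 distinct cells
res <- cluster_without_outliers(cells, min_size = 20)
res$removed
table(res$clusters)
```

Each pass removes at least one cell, so the loop always terminates, either with every remaining cell in a large enough cluster or with an error when too few cells are left.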

MichaelPeibo commented 6 years ago

I skipped the confounding-factor regression step and set remove_outlier = TRUE; the PC variance looks 'not that strange': [image]

But when I tried both the most stable and the least stable RunCORE methods, I got the same results.

Besides, does ascend have any option to cluster 'once and for all', or to tune the 'cluster resolution' as Seurat does, rather than clustering repeatedly?

asenabouth commented 6 years ago

I had a look at your data to see if I could shed any more light on the issue. If you generate a PCA plot with PlotPCA, you will see the majority of the points in one location and some distinct data points separated from it. These are the outliers in your dataset.
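One way to put a number on "distinct data points separated from the main location" is to measure each cell's distance from the median cell in the first few principal components and flag cells that lie far beyond the typical spread. This is a generic heuristic (median plus k x MAD on PC distances) swapped in for illustration. It is not what PlotPCA or RunCORE do. `flag_pca_outliers`, `n_pcs` and `n_mads` are hypothetical, and the input is assumed to be a genes x cells numeric matrix:

```r
# Generic PCA-distance heuristic (not ascend's method): flag cells whose
# distance from the median cell in PC space exceeds median + n_mads * MAD.
flag_pca_outliers <- function(expr_mat, n_pcs = 2, n_mads = 5) {
  # prcomp() wants observations in rows, so transpose genes x cells
  scores <- prcomp(t(expr_mat), center = TRUE)$x[, seq_len(n_pcs), drop = FALSE]
  centre <- apply(scores, 2, median)                   # robust "typical cell"
  dist_to_centre <- sqrt(rowSums(sweep(scores, 2, centre)^2))
  cutoff <- median(dist_to_centre) + n_mads * mad(dist_to_centre)
  data.frame(distance = dist_to_centre,
             outlier = dist_to_centre > cutoff,
             row.names = rownames(scores))
}

set.seed(1)
n_genes <- 20
typical <- matrix(rnorm(n_genes * 60, mean = 5), nrow = n_genes)         # 60 ordinary cells
shift <- rep(c(8, -8), length.out = n_genes)                             # made-up offset
distinct <- matrix(rnorm(n_genes * 3, mean = 5), nrow = n_genes) + shift # 3 distinct cells
expr <- cbind(typical, distinct)
colnames(expr) <- paste0("cell", seq_len(ncol(expr)))
res <- flag_pca_outliers(expr)
rownames(res)[res$outlier]
```

The median and MAD are used instead of the mean and standard deviation so that a handful of extreme cells cannot inflate the cutoff that is meant to catch them.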

I also ran RunCORE with remove_outlier set to TRUE. This discarded (but kept a record of) these cells and produced three clusters. There were fewer than 20 outlier cells, which is the minimum cluster size set by dynamicTreeCut, so they were too few to form a cluster of their own.

RunCORE performs clustering at different resolutions and then selects the most stable resolution for you. After running RunCORE, you can view the results at all resolutions with the GetRandMatrix and PlotStabilityDendro functions, so you can decide whether that was the best resolution for you.

The latest update also introduces a "windows" argument for setting the size of these sliding windows (just supply a sequence of numbers between 0 and 1). However, it will still try 40 different resolutions.
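The idea of clustering at several resolutions and keeping the most stable one can be sketched in base R. This is loosely modelled on the description above and is not RunCORE's implementation. The sketch assumes average-linkage `hclust` and treats each value in `windows` as a cut height expressed as a fraction of the tallest merge. It compares neighbouring resolutions with the Rand index (in the spirit of GetRandMatrix, without claiming the same output) and keeps the partition that survives the longest run of windows. The function names and the default of 19 windows are hypothetical:

```r
# Rand index: fraction of cell pairs on which two labelings agree
rand_index <- function(a, b) {
  tab <- table(a, b)
  pairs <- function(v) sum(choose(v, 2))
  total <- choose(length(a), 2)
  both <- pairs(tab)                       # pairs grouped together in both
  (total + 2 * both - pairs(rowSums(tab)) - pairs(colSums(tab))) / total
}

stable_resolution <- function(x, windows = seq(0.05, 0.95, by = 0.05)) {
  if (length(windows) < 2) stop("need at least two windows to compare")
  hc <- hclust(dist(x), method = "average")
  top <- max(hc$height)
  # One partition per resolution: cut the same tree at a fraction of its height
  parts <- lapply(windows, function(w) cutree(hc, h = w * top))
  # Rand index between each resolution and the next one
  ri <- vapply(seq_len(length(parts) - 1),
               function(i) rand_index(parts[[i]], parts[[i + 1]]),
               numeric(1))
  same <- ri > 1 - 1e-12                   # partition unchanged between neighbours
  runs <- rle(same)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  stable <- which(runs$values)
  if (length(stable) == 0) stop("no two neighbouring resolutions agree")
  best <- stable[which.max(runs$lengths[stable])]
  list(window = windows[starts[best]:(ends[best] + 1)],   # windows sharing the partition
       clusters = parts[[starts[best]]],
       rand_between_neighbours = ri)
}

set.seed(7)
pts <- rbind(matrix(rnorm(40, sd = 0.5), ncol = 2),             # 20 cells near (0, 0)
             matrix(rnorm(40, mean = 10, sd = 0.5), ncol = 2))  # 20 cells near (10, 10)
res <- stable_resolution(pts)
res$window
table(res$clusters)
```

Because all cuts come from one dendrogram, the partitions are nested, so a Rand index of exactly 1 between neighbours means the partition did not change between those two windows.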

Hope that helps. Our group is working on a more detailed clustering package for single cell data, but we don't have an ETA for that yet.

MichaelPeibo commented 6 years ago

Hi @asenabouth, is there any update on what I mentioned above, 'clustering once and for all' (with documentation)?

Another point that confused me: in your tutorial and your paper (congrats!), you mention that apoptosis-pathway-related genes are enriched in cluster 2. How do you determine this? Is there any way to determine it automatically?

Thanks!

MichaelPeibo commented 6 years ago

P.S. For example, pathway analysis directly following processing with ascend.

Also, I really like your DE volcano plot, shown here: [image] In these plots, you label only some genes rather than all of them. How do you plot it? (I did not find any tunable parameters in the plot function.)

And what parameter settings do you use to plot the expression of a particular gene on the t-SNE plot? (Sorry for the thousands of questions...)

asenabouth commented 6 years ago

Hi @MichaelPeibo - thanks for your questions! They give us a good idea of how our users are using our package. I'm moving your comments to separate threads so they are easier to track, and so that other users with similar questions can refer to them.