Closed Vivianstats closed 6 years ago
HI @Vivianstats
Thanks for giving Splatter a go, hopefully we can get this issue sorted out for you.
Thanks for providing your dataset, that is really helpful. It looks like the library sizes here aren't log-normal distributed, which would explain why the fitting is failing. You can check this by running:
lib.sizes <- colSums(counts)
plot(density(lib.sizes))
Can you tell me anything more about the dataset (capture protocol, processing etc.)? This might help us work out if it is something specific to your data or if I need to adjust Splatter to account for this.
Thanks for your quick reply!
The data was based on Smart-seq2. Hope this information helps.
As a follow-up question, what type of data (protocols) are better simulated by the current splat method ?
Splatter was mainly developed using UMI datasets so we have a bit of a bias towards those.
I think I can probably adjust the Splat model so that it can choose between log-normal and normal distributions for the library size, I think in your case normal would be a better fit. I am on leave the next week so I'm not sure when I will have time to implement it but should be before the next Bioconductor release in May.
In the meantime have you tried any other the other simulations in Splatter? One of the others might work a bit better for your dataset if it has the features you need.
Sorry I can't be more helpful right now but hopefully I'll have time to look into it a bit more soon.
Hello,
Thanks for your reply. I'm wondering among the other simulation methods in Splatter, which can be completed by just supplying a real dataset for estimation of parameters, and no parameters need to be manually set?
I checked scDDEstimate, but it looks like it needs gene numbers for all types of genes such as DE, DP, etc. This information is not easy to obtain for a real dataset.
I would like to do something like the one in your paper: based on a real dataset, I simulate data using multiple methods with all parameters estimated from the real data. How can I do that? I may have missed it, but do you have example codes demonstrating the simulations in your paper?
Thanks!
All of the simulations in Splatter let you estimate parameters from a real dataset (except for the Lun simulation). Some of them require a bit of extra information about the dataset like groups or batches that are present. The code used in the paper is here https://github.com/Oshlack/splatter-paper but it was done using an older version of Splatter so some of it will be out of date. You could have a look at the help pages for the various estimate functions to see what they require (such as ?lun2Estimate
). Try running listSims()
to see a bit about all the different simulations, there are a few more since the paper.
Hi @Vivianstats. I've made a new branch here https://github.com/Oshlack/splatter/tree/libsizes which allows you to use a normal distribution instead of a log-normal for library sizes (and estimates the parameters). I've done a (very quick) test using your data and it seems to work ok.
It will probably make it into the next Bioconductor release but if you want to try it out sooner you can install this version using:
devtools::install_github("Oshlack/splatter@libsizes")
Thank you very much for spending time to add the features!
I am now able to get results from splatEstimate()
and they look pretty good.
However, I was using scDDEstimate()
last week and was able to obtain results without any issues. After I reinstall the package, the following code would lead to errors (realcount
is a matrix of counts):
params = splatter::scDDEstimate(realcount, conditions = sample(1:2, ncol(realcount), replace = TRUE))
Error in validObject(.Object) :
invalid class “SCDDParams” object: SCDat: Must have class 'SingleCellExperiment', but has class 'SummarizedExperiment'
I don't have any experience with 'SingleCellExperiment', and I thought my codes should invoke scDDEstimate.matrix
, not scDDEstimate.SingleCellExperiment
. Could you please give me some help on this issue?
Thanks a lot!
That looks like a problem with the Splatter version and maybe some other packages. Can you run the code that caused the error again and then paste the output of sessionInfo()
here?
Do you think the changes I made are useful? If so I will look at merging them into the development version.
Yes, I do think this version is very helpful!
I have attached the output of session_info()
here:
> session_info()
Session info ----------------------------------------------------------------------------------
setting value
version R version 3.4.2 (2017-09-28)
system x86_64, linux-gnu
ui RStudio (1.1.442)
language en_US
collate en_US.UTF-8
tz America/Los_Angeles
date 2018-04-16
Packages --------------------------------------------------------------------------------------
package * version date source
abind 1.4-5 2016-07-21 CRAN (R 3.4.2)
AnnotationDbi 1.38.2 2017-10-23 Bioconductor
arm 1.10-1 2018-04-13 CRAN (R 3.4.2)
assertthat 0.2.0 2017-04-11 CRAN (R 3.4.2)
backports 1.1.2 2017-12-13 CRAN (R 3.4.2)
base * 3.4.2 2017-09-29 local
beeswarm 0.2.3 2016-04-25 CRAN (R 3.4.2)
bindr 0.1.1 2018-03-13 CRAN (R 3.4.2)
bindrcpp 0.2.2 2018-03-29 CRAN (R 3.4.2)
Biobase * 2.36.2 2017-10-10 Bioconductor
BiocGenerics * 0.22.1 2017-10-10 Bioconductor
BiocParallel 1.10.1 2017-10-10 Bioconductor
biomaRt 2.32.1 2017-10-23 Bioconductor
bit 1.1-12 2014-04-09 CRAN (R 3.4.2)
bit64 0.9-7 2017-05-08 CRAN (R 3.4.2)
bitops 1.0-6 2013-08-17 CRAN (R 3.4.2)
blob 1.1.1 2018-03-25 CRAN (R 3.4.2)
blockmodeling 0.3.0 2018-04-11 CRAN (R 3.4.2)
caTools 1.17.1 2014-09-10 CRAN (R 3.4.2)
checkmate 1.8.5 2017-10-24 CRAN (R 3.4.2)
coda 0.19-1 2016-12-08 CRAN (R 3.4.2)
codetools 0.2-15 2016-10-05 CRAN (R 3.3.1)
colorspace 1.3-2 2016-12-14 CRAN (R 3.4.2)
compiler 3.4.2 2017-09-29 local
data.table 1.10.4-3 2017-10-27 CRAN (R 3.4.2)
datasets * 3.4.2 2017-09-29 local
DBI 0.8 2018-03-02 CRAN (R 3.4.2)
DelayedArray * 0.2.7 2017-10-10 Bioconductor
devtools * 1.13.5 2018-02-18 CRAN (R 3.4.2)
digest 0.6.15 2018-01-28 CRAN (R 3.4.2)
doParallel 1.0.11 2017-09-28 CRAN (R 3.4.2)
doRNG 1.6.6 2017-04-10 CRAN (R 3.4.2)
dotCall64 0.9-5.2 2018-01-11 CRAN (R 3.4.2)
dplyr 0.7.4 2017-09-28 CRAN (R 3.4.2)
DT 0.4 2018-01-30 CRAN (R 3.4.2)
dynamicTreeCut 1.63-1 2016-03-11 CRAN (R 3.4.2)
EBSeq 1.16.0 2018-04-03 Bioconductor
edgeR 3.20.9 2018-04-15 Bioconductor
fields 9.6 2018-01-29 CRAN (R 3.4.2)
FNN 1.1 2013-07-31 CRAN (R 3.4.2)
foreach 1.4.4 2017-12-12 CRAN (R 3.4.2)
gdata 2.18.0 2017-06-06 CRAN (R 3.4.2)
GenomeInfoDb * 1.12.3 2017-10-10 Bioconductor
GenomeInfoDbData 0.99.0 2017-10-10 Bioconductor
GenomicRanges * 1.28.6 2017-10-10 Bioconductor
ggbeeswarm 0.6.0 2017-08-07 CRAN (R 3.4.2)
ggplot2 * 2.2.1 2016-12-30 CRAN (R 3.4.2)
ggsci * 2.8 2017-09-30 CRAN (R 3.4.2)
glue 1.2.0 2017-10-29 cran (@1.2.0)
gplots 3.0.1 2016-03-30 CRAN (R 3.4.2)
graphics * 3.4.2 2017-09-29 local
grDevices * 3.4.2 2017-09-29 local
grid 3.4.2 2017-09-29 local
gridExtra 2.3 2017-09-09 CRAN (R 3.4.2)
gtable 0.2.0 2016-02-26 CRAN (R 3.4.2)
gtools 3.5.0 2015-05-29 CRAN (R 3.4.2)
htmltools 0.3.6 2017-04-28 CRAN (R 3.4.2)
htmlwidgets 1.0 2018-01-20 CRAN (R 3.4.2)
httpuv 1.3.6.2 2018-03-02 CRAN (R 3.4.2)
igraph 1.2.1 2018-03-10 cran (@1.2.1)
IRanges * 2.10.5 2017-10-10 Bioconductor
iterators 1.0.9 2017-12-12 CRAN (R 3.4.2)
KernSmooth 2.23-15 2015-06-29 CRAN (R 3.4.0)
lattice 0.20-35 2017-03-25 CRAN (R 3.3.3)
lazyeval 0.2.1 2017-10-29 CRAN (R 3.4.2)
limma 3.34.9 2018-04-15 Bioconductor
lme4 1.1-17 2018-04-03 CRAN (R 3.4.2)
locfit 1.5-9.1 2013-04-20 CRAN (R 3.4.2)
magrittr 1.5 2014-11-22 CRAN (R 3.4.2)
maps 3.3.0 2018-04-03 CRAN (R 3.4.2)
MASS * 7.3-49 2018-02-23 CRAN (R 3.4.2)
Matrix 1.2-14 2018-04-09 CRAN (R 3.4.2)
matrixStats * 0.53.1 2018-02-11 CRAN (R 3.4.2)
mclust 5.4 2017-11-22 CRAN (R 3.4.2)
memoise 1.1.0 2017-04-21 CRAN (R 3.4.2)
methods * 3.4.2 2017-09-29 local
mime 0.5 2016-07-07 CRAN (R 3.4.2)
minqa 1.2.4 2014-10-09 CRAN (R 3.4.2)
munsell 0.4.3 2016-02-13 CRAN (R 3.4.2)
nlme 3.1-131 2017-02-06 CRAN (R 3.4.0)
nloptr 1.0.4 2017-08-22 CRAN (R 3.4.2)
outliers 0.14 2011-01-24 CRAN (R 3.4.2)
parallel * 3.4.2 2017-09-29 local
pillar 1.2.1 2018-02-27 CRAN (R 3.4.2)
pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.2)
pkgmaker 0.22 2014-05-14 CRAN (R 3.4.2)
plyr 1.8.4 2016-06-08 CRAN (R 3.4.2)
R6 2.2.2 2017-06-17 CRAN (R 3.4.2)
Rcpp 0.12.16 2018-03-13 CRAN (R 3.4.2)
RCurl 1.95-4.10 2018-01-04 CRAN (R 3.4.2)
registry 0.5 2017-12-03 CRAN (R 3.4.2)
reshape2 1.4.3 2017-12-11 CRAN (R 3.4.2)
rhdf5 2.20.0 2018-03-24 Bioconductor
rjson 0.2.15 2014-11-03 CRAN (R 3.4.2)
rlang 0.2.0 2018-02-20 CRAN (R 3.4.2)
rngtools 1.2.4 2014-03-06 CRAN (R 3.4.2)
RSQLite 2.1.0 2018-03-29 CRAN (R 3.4.2)
rstudioapi 0.7 2017-09-07 CRAN (R 3.4.2)
S4Vectors * 0.14.7 2017-10-10 Bioconductor
scales 0.5.0 2017-08-24 CRAN (R 3.4.2)
scater 1.4.0 2018-03-24 Bioconductor
scDD * 1.0.0 2018-04-03 Bioconductor
scran 1.4.5 2018-04-03 Bioconductor
shiny 1.0.5 2017-08-23 CRAN (R 3.4.2)
shinydashboard 0.7.0 2018-03-21 CRAN (R 3.4.2)
SingleCellExperiment * 1.0.0 2018-04-13 Bioconductor
spam 2.1-4 2018-04-12 CRAN (R 3.4.2)
splatter * 1.3.3.9010 2018-04-13 Github (Oshlack/splatter@5380fa9)
splines 3.4.2 2017-09-29 local
statmod 1.4.30 2017-06-18 CRAN (R 3.4.2)
stats * 3.4.2 2017-09-29 local
stats4 * 3.4.2 2017-09-29 local
stringi 1.1.7 2018-03-12 CRAN (R 3.4.2)
stringr 1.3.0 2018-02-19 CRAN (R 3.4.2)
SummarizedExperiment * 1.6.5 2017-10-10 Bioconductor
testthat 2.0.0 2017-12-13 CRAN (R 3.4.2)
tibble 1.4.2 2018-01-22 cran (@1.4.2)
tools 3.4.2 2017-09-29 local
tximport 1.4.0 2018-03-24 Bioconductor
utils * 3.4.2 2017-09-29 local
vipor 0.4.5 2017-03-22 CRAN (R 3.4.2)
viridis 0.5.1 2018-03-29 CRAN (R 3.4.2)
viridisLite 0.3.0 2018-02-01 CRAN (R 3.4.2)
withr 2.1.2 2018-03-15 CRAN (R 3.4.2)
XML 3.98-1.9 2017-06-19 CRAN (R 3.4.0)
xtable 1.8-2 2016-02-05 CRAN (R 3.4.2)
XVector 0.16.0 2017-10-10 Bioconductor
yaml 2.1.18 2018-03-08 CRAN (R 3.4.2)
zlibbioc 1.22.0 2017-10-10 Bioconductor
zoo 1.8-1 2018-01-08 CRAN (R 3.4.2)
Cool, I will look at merging the changes I made into devel then.
It looks like you have an old version of scDD, the current release version is 1.2.0 and the devel version is 1.3.5. I would try updating that to fix the other issue.
If you run biocLite()
without any arguments it should try and update any packages that are out of date.
The development version including this change is now available on Bioconductor https://bioconductor.org/packages/devel/bioc/html/splatter.html. I'm going to close this issue now but if you have any other question/issues feel free to open a new one or comment on the Bioconductor support forum https://support.bioconductor.org/.
Hello,
Thanks for the useful tool.
I would like to simulate a count matrix by estimating all possible parameters from a real dataset. But when I try
splatEstimate(round(realcount))
I got the following error message:
<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data, gr = gradient, ddistnam = ddistname, hessian = TRUE, method = meth, lower = lower, upper = upper, ...): non-finite finite-difference value [2]> Error in fitdistrplus::fitdist(lib.sizes, "lnorm") : the function mle failed to estimate the parameters, with the error code 100
I was able to continue by just using the first 13,000 rows, but the library sizes differ a lot from the real data
Can you provide some suggestions on this application? Thanks!
Here is the real datset I used: count.rds.zip