Closed naddsch closed 5 years ago
Hi Laura!
Nice investigative work, that line was indeed the problem. I've now changed this inside the wrapper, and added an ndim
parameter in case you want to change the number of components.
This is now building on travis, I'll post an update once that's finished.
Slingshot will probably take a while to run on this dataset (estimated 2 hours). Have you also considered other methods, such as PAGA?
Best, Wouter
(slingshot has recently made some speed improvements, see kstreet13/slingshot#31, it might be that the new build is a lot more scalable)
Hi Wouter,
thanks for the quick help and reply!
I think in your commit "add ndim parameter, fix for dynverse/dyno#55" you might have forgotten to replace n = 20 with n = ndim in the call to
pca <- irlba::prcomp_irlba(expression, n = 20)
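For illustration, a minimal sketch of the intended fix, where ndim replaces the hard-coded 20 and is clamped to the data's dimensionality (base-R stats::prcomp stands in for irlba::prcomp_irlba so the snippet runs on its own; reduce_dims is a hypothetical name, not the wrapper's actual function):

```r
# Sketch: pass ndim through instead of a fixed 20, and clamp it so
# datasets with fewer than ndim features still work.
reduce_dims <- function(expression, ndim = 20) {
  ndim <- min(ndim, ncol(expression))   # prcomp_irlba is even stricter:
                                        # it needs n < min(dim(expression))
  stats::prcomp(expression, rank. = ndim)$x   # cells x ndim score matrix
}

expr <- matrix(rnorm(50 * 5), nrow = 50)  # 50 cells, only 5 features
scores <- reduce_dims(expr, ndim = 20)    # silently clamped to 5 components
```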
Actually, I also tried PAGA and PAGA tree, but there I get a different error. I'm still trying to figure it out, and will open another issue for that method :)
Best, Laura
Hi Laura
Thanks :blush:
For PAGA, your error is probably related to the feature filtering that is done at the beginning. It's still an enigma to me why it errors internally. Feel free to open a new issue!
This should be fixed in ti_slingshot 1.0.2 and onwards. You can run it now using:
infer_trajectory(dataset, "dynverse/ti_slingshot:v1.0.2")
Will be included in dynmethods soon-ish once travis stops complaining :crossed_fingers:
Thank you so much! It's skipping dimensionality reduction now. :blush:
Unfortunately I run into another error afterwards:
Error: cannot allocate vector of size 3.1 Gb
Execution halted
I already increased memory.limit() to 16 GB, which is what's installed on my machine, but the error keeps popping up. I'm running 64-bit R on 64-bit Windows 7. Do you know how to eliminate this behaviour?
And there is not much memory used when running infer_trajectory:
> memory.size()
[1] 687.63
Best, Laura
I did a quick investigation and I think this is caused by the pam clustering, which doesn't really scale well with an increasing number of samples. I added a cluster_method parameter where you can change the clustering method to clara, which is closely related to pam but much more scalable.
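To illustrate (my own back-of-the-envelope numbers, not the wrapper's actual code): the full pairwise dissimilarity matrix that pam needs is on its own already an allocation of the size in the error, while clara only ever clusters small subsamples:

```r
library(cluster)  # recommended package shipped with R; provides pam() and clara()

# Lower-triangle dissimilarity matrix pam would build for ~30k cells:
# choose(n, 2) distances, 8 bytes each -- roughly the 3.1 Gb allocation
# the error complains about.
n <- 30000
gib <- round(choose(n, 2) * 8 / 1024^3, 1)   # ~3.4 GiB

# clara clusters repeated subsamples and keeps the best medoids,
# so memory stays flat as the number of cells grows.
x <- matrix(rnorm(2000 * 3), ncol = 3)       # 2000 toy "cells", 3 features
cl <- clara(x, k = 4, samples = 5)
head(cl$clustering)                           # one cluster label per cell
```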
You might have to lower the ndim argument as well, because otherwise you might get some convergence errors (the principal curves algorithm seems to be very sensitive to this).
travis is building the container at the moment (https://travis-ci.org/dynverse/ti_slingshot/builds/551362660); if all goes well, you should be able to run
infer_trajectory(dataset, "dynverse/ti_slingshot:v1.0.3", ndim = 3, cluster_method = "clara", verbose = TRUE)
in a couple of minutes.
Along with the improvements that @rcannood made, this should make slingshot more scalable, although it still takes a few minutes to run on my example with 30k cells and 10 features.
Perfect, this is working like a charm! Thanks a lot, I will now try to figure out how to proceed in the analysis pipeline :smiley:
Have a nice weekend!
Hi there, I am just trying to use your package on flow cytometry data. I read in the User Guide that "Currently, alternative input data such as ATAC-Seq or cytometry data are not yet supported, although it is possible to simply include this data as expression and counts."
I am trying an example dataset of ~30,000 cells and 5 dimensions with a pre-set start_id, because I know how the cells develop along these 5 dimensions. I wrapped this data into the counts and expression of a dataset.
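Concretely, the wrapping looked roughly like this (a sketch with made-up sizes and marker names; wrap_expression and add_prior_information are the dynwrap functions I used, and since cytometry data has no raw counts, the intensities are supplied as both counts and expression, as the user guide suggests):

```r
library(dynwrap)

# Placeholder cytometry matrix: 1000 cells x 5 markers (names made up).
markers <- c("m1", "m2", "m3", "m4", "m5")
intensities <- matrix(
  abs(rnorm(1000 * 5)),
  nrow = 1000,
  dimnames = list(paste0("cell", 1:1000), markers)
)

# Reuse the intensities for both slots, then attach the known start cell.
dataset <- wrap_expression(counts = intensities, expression = intensities)
dataset <- add_prior_information(dataset, start_id = "cell1")
</code>
```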
Unfortunately when running
I get the following error
I think that the following line in ti_slingshot causes the problem, since n = 20 is fixed here and my dataset has fewer than 20 dimensions:
Or am I missing something?
Thanks in advance, Laura