Issue closed by semmrich.
Hi @semmrich!
There is probably not a whole lot we can do: some errors are really method-specific, and only the authors of the software could fix them. Just to make sure: methods like ti_paga, ti_slingshot, etc. work fine?
Robrecht
Hi Robrecht, I understand. So far I have tried 14 Dynmethods selected with the tree-building Dynguideline, so here is a quick recap of how powerful Dynverse is:
My dataset consists of 10X libraries totalling ~18k cells x ~10k genes. So far I have only used a downsampled test set of 1.6k cells x ~10k genes, for performance reasons. The data cover a complete hematopoietic hierarchy of sorted naked mole rat cells, so I expect a multifurcating tree, and I supplied the clustering, a start group and several end groups.
The results fall into 4 major categories:
1) Uninformative results (low-quality clustering, squished cell groups, contracted axes, etc.), for example:
slingshot with default parameters (dimred.slingshot.clara.cosine.pdf)
merlot
URD (takes forever and is a huge effort to plot due to dropouts)
Mpath (drops out >50% of input cells)
Cellrouter (very nice cyclic clustering, but the trajectory is condensed to one point)
SLICER (no trajectory at all?!)
SLICER was especially disappointing, because I had tried the standalone version with the script from the Hemberg lab (). It performs well and gives a nice, valid pseudotime on my test set, but it runs out of memory on a Linux cluster (BlueHive, CIRC, University of Rochester: 372 nodes, 8,972 CPU cores, 44 TB RAM, 420 TeraFLOPS) where I had 250 GB of memory and 12 h of runtime!
2) Memory issues
Container was killed, possibly because it ran out of memory (error code 137)
Error: Error during trajectory inference, see output above ↑↑↑
RaceID_stemID
Sincell
SLICE
Celltree_maptpx
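Exit code 137 follows the shell convention of 128 + signal number, so the process inside the container received signal 9 (SIGKILL). The kernel's out-of-memory killer sends SIGKILL, which is why 137 usually points at memory, but anything else that sends SIGKILL produces the same code, so "possibly" is the right word. A minimal sketch of that decoding (the function name is made up for illustration; the signal numbers are the Linux ones, which is what the containers run):

```python
def describe_exit_code(code):
    """Interpret a shell/Docker exit status.

    By shell convention, a status above 128 means the process was
    terminated by signal number (status - 128).
    """
    # Linux signal numbers for the two signals most relevant to containers.
    names = {9: "SIGKILL", 15: "SIGTERM"}
    if code > 128:
        sig = code - 128
        return f"terminated by signal {sig} ({names.get(sig, 'unknown')})"
    return f"exited with status {code}"


print(describe_exit_code(137))  # terminated by signal 9 (SIGKILL)
print(describe_exit_code(1))    # exited with status 1
```

Whether the limit that was hit belongs to Docker or to the host is not visible from the code alone; in this thread the kills disappeared once the Docker memory setting was raised, as the update further down shows.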
3) Tool-specific errors, probably depending on the make-up of the user's dataset
pcreode
Traceback (most recent call last):
  File "/code/run.py", line 34, in <module>
    pca_reduced_data = data_pca.pca_set_components(min(parameters["n_pca_components"],expression.shape[1]))
  File "/pCreode/pcreode/pcreode.py", line 83, in pca_set_components
    return( self.pca[:,:n_components])
TypeError: slice indices must be integers or None or have an __index__ method
Error: Error during trajectory inference, see output above ↑↑↑
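The pcreode trace fails when `n_components` is used as a slice bound. A plausible reading, not confirmed by the log, is that `parameters["n_pca_components"]` arrives as a float (e.g. `50.0` after passing through R), so `min(...)` returns a float and NumPy refuses it as a slice index. A sketch reproducing that with toy stand-ins (the matrix, the parameter value and `n_genes` are hypothetical):

```python
import numpy as np

# Hypothetical stand-ins: a PCA score matrix (cells x components) and a
# parameter that arrived as a float rather than an int.
pca_scores = np.random.default_rng(0).normal(size=(100, 60))
parameters = {"n_pca_components": 50.0}
n_genes = 60  # plays the role of expression.shape[1]

n_components = min(parameters["n_pca_components"], n_genes)  # 50.0, a float
try:
    pca_scores[:, :n_components]
except TypeError as exc:
    print("fails:", exc)  # same TypeError as in the pcreode trace

# Casting the bound to int before slicing avoids the error.
reduced = pca_scores[:, : int(n_components)]
print(reduced.shape)  # (100, 50)
```

If this is the cause, the fix belongs in the wrapper (`/code/run.py`) rather than in the user's data.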
ElPiGraph
Error in H5File.open(filename, mode, file_create_pl, file_access_pl) : HDF5-API Errors:
error #000: ../../../src/H5F.c in H5Fcreate(): line 491: unable to create file class: HDF5 major: File accessibilty minor: Unable to open file
error #001: ../../../src/H5Fint.c in H5F_open(): line 1111: unable to open file: time = Thu Aug 8 21:49:57 2019 , name = '/ti/output.h5', tent_flags = 13 class: HDF5 major: File accessibilty minor: Unable to open file
error #002: ../../../src/H5FD.c in H5FD_open(): line 812: open failed class: HDF5 major: Virtual File Layer minor: Unable to initialize object
error #003: ../../../src/H5FDsec2.c in H5FD_sec2_open(): line 348: unable to open file: name = '/ti/output.h5', errno = 112, error message = 'Host is down', flags = 13, o_flags = 242 class: HDF5 major: File accessibilty minor: Unable to open file
Calls: %>% ... <Anonymous> -> <Anonymous> -> <Anonymous> -> H5File.open
Execution halted
sh: 0: getcwd() failed: No such file or directory
rm: cannot remove '/tmp2/RtmpYrIFX3': Host is down
Error: Error during trajectory inference, see output above ↑↑↑
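The ElPiGraph log fails while creating `/ti/output.h5` with errno 112 ("Host is down"), and `getcwd()` and `rm` on `/tmp2` fail right after it. That pattern suggests, though it is only an assumption, that the mounted directories became unreachable rather than that ElPiGraph itself misbehaved. A quick way to test a directory independently of any method is to try writing a file there; `probe_writable` below is a hypothetical helper, not part of dynverse:

```python
import errno
import os
import tempfile


def probe_writable(directory):
    """Try to create (and auto-delete) a temporary file in `directory`.

    Returns None on success, otherwise the symbolic errno name of the
    OSError, e.g. 'ENOENT' for a missing directory.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


# On Linux, errno 112 is EHOSTDOWN ("Host is down"), the code in the HDF5 trace.
print(errno.errorcode.get(112))
print(probe_writable(tempfile.gettempdir()))  # None
print(probe_writable(os.path.join(tempfile.gettempdir(), "no_such_subdir")))
```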
SCUBA
Traceback (most recent call last):
  File "/code/run.py", line 56, in <module>
    min_percentage_split = p["min_percentage_split"])
  File "/usr/local/lib/python3.7/site-packages/PySCUBA/SCUBA_core.py", line 105, in initialize_tree
    X = np.compress(condition, data, axis = 0)
  File "/usr/local/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 1896, in compress
    return _wrapfunc(a, 'compress', condition, axis=axis, out=out)
  File "/usr/local/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 56, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: condition must be a 1-d array
Error: Error during trajectory inference, see output above ↑↑↑
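The SCUBA failure comes from `np.compress`, which only accepts a one-dimensional boolean condition. The trace does not show how `condition` is built in `SCUBA_core.py`; assuming it ends up as a column vector of shape (n, 1), the error and the usual fix (flattening the mask) look like this on toy data:

```python
import numpy as np

data = np.arange(12).reshape(6, 2)  # toy data: 6 cells x 2 features
# Hypothetical mask with shape (6, 1) instead of (6,).
condition = np.array([[True], [False], [True], [True], [False], [True]])

try:
    np.compress(condition, data, axis=0)
except ValueError as exc:
    print("fails:", exc)  # ValueError, as in the SCUBA trace

# Flattening the mask to shape (6,) makes it a valid row selector.
kept = np.compress(condition.ravel(), data, axis=0)
print(kept.shape)  # (4, 2)
```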
4) ...and the winners are:
paga (dimred.paga.pdf)
paga_tree (outperforms paga on my data by connecting all clusters with one trajectory)
mst (no branchings, but a quite meaningful curve through all clusters)
monocle_ICA (a big surprise, because in the standalone versions it gave the lowest quality compared with TSCAN, SLICER and co.; see the Hemberg script above)
Overall, I guess it all depends very much on the user's data. One thing I did not like was the container kills. Do they depend on the performance of the user's PC, or is it a Docker limit that users cannot change? If I run my real dataset, which is 10x larger, I would expect some of the "good" Dynmethods to be killed as well, which would be very frustrating, but hope dies last...
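To put the "10x larger" worry into numbers, here is a back-of-the-envelope estimate of one dense, double-precision copy of the expression matrix for the test set (1654 cells x 10441 genes, as reported in the MFA log further down) and for the full ~18k x ~10k dataset. Real methods keep several such copies plus intermediates, so treat these as lower bounds. The cell-by-cell line only applies to methods that build a full pairwise distance matrix, which grows with the square of the cell count:

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value=8):
    """Size in bytes of one dense n_rows x n_cols matrix of 64-bit floats."""
    return n_rows * n_cols * bytes_per_value


test_set = dense_matrix_bytes(1654, 10441)    # downsampled test set
full_set = dense_matrix_bytes(18_000, 10_000) # approximate full dataset
pairwise = dense_matrix_bytes(18_000, 18_000) # cell x cell matrix, if a method builds one

print(f"test set: {test_set / 1e9:.2f} GB")   # test set: 0.14 GB
print(f"full set: {full_set / 1e9:.2f} GB")   # full set: 1.44 GB
print(f"pairwise: {pairwise / 1e9:.2f} GB")   # pairwise: 2.59 GB
```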
Anyway, great work! On a scale from 1 to 10 I give you guys 11!! ;) If you have the time and enjoy the job, just keep adding tools, because the community likes playing with their data...
All the best, Stephan
UPDATE:
Previously I had 4 categories of results for various Dynmethods with my test set of 1.6k cells x ~10k genes. I have since raised the Docker resource settings to:
CPUs: 10
Memory: 44544 MB
Swap: 3072 MB
on a machine with these specs:
Processor: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
RAM: 64 GB
System type: 64-bit Windows 10 Pro v1809
Now there are no more Docker kills! :)
Revised results with the test data set:
Uninformative results (low-quality clustering, squished cell groups, contracted axes, etc.):
slingshot
Merlot
URD (takes forever and is a huge effort to plot due to dropouts)
Mpath (drops out >50% of input cells)
Cellrouter (very nice cyclic clustering, but the trajectory is condensed to one point)
SLICER (no trajectory at all?!)
SLICE (no tree but a curve; projects the trajectory into the space between clusters => totally unstructured dendrogram)
Sincell (finds >50 milestones and fragments the dendrogram into chaotic pieces)
RaceID_StemID (>100 dropouts; plot_dendro fails with: Error in density.default(y, n = nbins, adjust = adjust) : 'x' contains missing values)
Tool-specific errors, probably depending on the make-up of the user's dataset:
pcreode (see previous)
ElPiGraph (see previous)
SCUBA (see previous)
PAGA
Gives a good trajectory and dimred, but fails to plot the dendrogram with:
Error in density.default(y, n = nbins, adjust = adjust) : 'x' contains missing values
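Both RaceID_StemID and PAGA fail in the dendrogram plot because R's `density()` stops when its input contains `NA`. NumPy behaves similarly: `np.histogram` with an automatically detected range raises a `ValueError` on `NaN`. A sketch of the obvious workaround, dropping non-finite values before estimating a density, on toy values (the array is made up). The error does not say which cells carry the missing values, so whether silently dropping them is acceptable is an open question:

```python
import numpy as np

# Toy pseudotime-like values; NaN stands in for cells without a value.
values = np.array([0.1, 0.4, np.nan, 0.7, 0.9, np.nan])

try:
    np.histogram(values, bins=4, density=True)
except ValueError as exc:
    print("fails:", exc)  # the autodetected range is not finite

finite = values[np.isfinite(values)]  # drop NaN/inf before the density estimate
dens, edges = np.histogram(finite, bins=4, density=True)
print(len(finite), len(values))  # 4 6
```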
monocle_DDRTree
Removing 61 outliers
Warning messages:
1: In log(ifelse(y == 0, 1, y/mu)) : NaNs produced
2: step size truncated due to divergence
3: In log(ifelse(y == 0, 1, y/mu)) : NaNs produced
4: step size truncated due to divergence
Error: sort(unique(c(cell_graph$from, cell_graph$to))) not equal to sort(names(to_keep)).
Lengths differ: 123 is not 1654
Execution halted
Error: Error during trajectory inference, see output above ↑↑↑
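The monocle_DDRTree error is a consistency check: the set of cells appearing as `from`/`to` in `cell_graph` must equal the set of cells kept, but only 123 of 1654 cells are in the graph. A small sketch of the same check that also lists which cells are missing, which can help when debugging such output (`missing_from_graph` and the toy cell names are hypothetical):

```python
def missing_from_graph(edges, cell_ids):
    """Return the cells that never appear as an endpoint of any edge."""
    in_graph = {node for edge in edges for node in edge}
    return sorted(set(cell_ids) - in_graph)


# Toy example: 5 cells, but the graph only touches 3 of them.
cells = ["c1", "c2", "c3", "c4", "c5"]
edges = [("c1", "c2"), ("c2", "c3")]
print(missing_from_graph(edges, cells))  # ['c4', 'c5']
```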
MFA
Sampling for 1654 cells and 10441 genes
Error in prcomp.default(y, scale = TRUE) : cannot rescale a constant/zero column to unit variance
Calls: <Anonymous> -> prcomp -> prcomp.default
Execution halted
Error: Error during trajectory inference, see output above ↑↑↑
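`prcomp(y, scale = TRUE)` divides every column by its standard deviation, so a single gene with zero variance in the input aborts MFA. It is a guess that such genes come from downsampling to 1.6k cells (genes expressed only in cells that were left out become all-zero), but removing zero-variance genes before scaling is a common workaround. A sketch with a toy matrix:

```python
import numpy as np

# Toy expression matrix: 4 cells x 3 genes; gene 3 is zero in every cell.
expr = np.array([[1.0, 5.0, 0.0],
                 [2.0, 3.0, 0.0],
                 [0.0, 4.0, 0.0],
                 [3.0, 6.0, 0.0]])

# NumPy's default std uses ddof=0 while R's sd() uses n - 1; the
# "is it zero?" test gives the same answer either way.
sd = expr.std(axis=0)
keep = sd > 0  # genes with non-zero variance

# Center and scale only the informative genes, as prcomp(scale = TRUE) would.
scaled = (expr[:, keep] - expr[:, keep].mean(axis=0)) / sd[keep]
print(keep.tolist())  # [True, True, False]
print(scaled.shape)   # (4, 2)
```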
GPfates
/usr/local/lib/python3.7/site-packages/GPfates/GPfates.py:60: FutureWarning:Method .as_matrix will be removed in a future version. Use .values instead.
/usr/local/lib/python3.7/site-packages/GPfates/GPfates.py:34: FutureWarning:Method .as_matrix will be removed in a future version. Use .values instead.
/usr/local/lib/python3.7/site-packages/GPfates/GPfates.py:77: FutureWarning:Method .as_matrix will be removed in a future version. Use .values instead.
Traceback (most recent call last):
  File "/code/run.py", line 45, in <module>
    m.model_fates(C=end_n)
  File "/usr/local/lib/python3.7/site-packages/GPfates/GPfates.py", line 77, in model_fates
    self.fate_model = OMGP(self.s[[t]].as_matrix(), self.s[X].as_matrix(), K=C, prior_Z='DP')
  File "/usr/local/lib/python3.7/site-packages/paramz/parameterized.py", line 53, in __call__
    self = super(ParametersChangedMeta, self).__call__(*args, **kw)
  File "/usr/local/lib/python3.7/site-packages/GPclust/OMGP.py", line 24, in __init__
    for i in range(K):
TypeError: 'float' object cannot be interpreted as an integer
Error: Error during trajectory inference, see output above ↑↑↑
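In the GPfates trace, `end_n` is passed as `C` and ends up as `K` in `range(K)`, which rejects floats. Where `end_n` is computed is not shown, so it is an assumption that it arrives as a whole-number float such as `3.0`. Rather than a blind `int()`, a small validating conversion (the helper `as_count` is hypothetical) keeps a genuinely fractional value from being truncated silently:

```python
def as_count(value):
    """Coerce a whole number given as float (e.g. 3.0) to int; reject 2.5."""
    if float(value).is_integer():
        return int(value)
    raise ValueError(f"expected a whole number, got {value!r}")


end_n = 3.0  # hypothetical: number of end states arriving as a float
try:
    list(range(end_n))
except TypeError as exc:
    print("fails:", exc)  # same TypeError as in the GPfates trace

K = as_count(end_n)
print(list(range(K)))  # [0, 1, 2]
```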
So happy that there are many options to choose from!
More edits to come for the full-dataset runs.
Hi Dynverse Team,
First and foremost, you have created something formidable here; I am a huge fan of your platform!! So far I have been able to resolve a couple of issues myself, but this one is beyond me:
Here is my dataset:
Thanks in advance for any suggestions!