dynverse / dynmethods

A collection of 50+ trajectory inference methods within a common interface 📥📤
https://dynverse.org

System command error for several methods #104

Closed cemalley closed 5 years ago

cemalley commented 6 years ago

Hi dynverse devs, wow! I am impressed with this project and the effort to make it easy to run many trajectory methods from the same prepared task. I have been testing dyno with an scRNA-seq dataset of 1385 cells and 13953 genes. I notice that several methods give a vague error:

Error: Error during trajectory inference System command error

Methods that I tested and worked (finished the infer_trajectory):

Methods that gave the system command error:

The failure that "hurts most" is Monocle DDRTree. Could you at least provide more verbose errors for the failure? If you would like, I can post the full code I'm running. The only parameter that changed across the tests was the method name. Thank you, I know dyno is in active development.

rcannood commented 6 years ago

Hello Claire,

Thanks for trying out dyno :) Thanks for the nice words!

Edit: I saw @zouter responded simultaneously to this issue. I'm removing my previous suggestions to avoid conflicting advice.

Kind regards, Robrecht

zouter commented 6 years ago

Hi @cemalley

Thanks for testing out the package. I hope it helps you with finding good trajectories in your data!

We made some large changes last week that broke almost every Docker container. The containers that worked for you were probably downloaded after this change. We also pushed a fix today for the PAGA wrapper on datasets with more than 500 cells.

So the most probable cause of the error is that the Docker containers are out of sync. Could you try running docker pull dynverse/monocle_ddrtree on the command line (or, alternatively, dynwrap::pull_docker_ti_method("dynverse/monocle_ddrtree") in R) and see whether you can then run the Monocle DDRTree method on the dataset?
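As a rough sketch of the refresh step above, driven from R: the helper name pull_method_images and its dry_run flag are hypothetical (they are not part of dynwrap), and the only image name used is the one mentioned in this thread. The helper simply shells out to docker pull through base R's system2.

```r
# Hypothetical helper, not part of dynwrap: re-pull method images so that
# local containers match the versions the R packages expect.
pull_method_images <- function(images, dry_run = FALSE) {
  vapply(images, function(image) {
    command <- paste("docker pull", image)
    if (!dry_run) {
      # system2() returns the exit status of the docker CLI (0 on success).
      status <- system2("docker", c("pull", image))
      if (status != 0) warning("docker pull failed for ", image)
    }
    command
  }, character(1), USE.NAMES = FALSE)
}

# Dry run: only report the commands that would be executed.
commands <- pull_method_images("dynverse/monocle_ddrtree", dry_run = TRUE)
print(commands)
```

Calling the helper with dry_run = FALSE requires the docker CLI on the PATH and a running Docker daemon.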

If you want more verbose output, you can set verbose=TRUE in infer_trajectory. We turned this off by default because most methods are very verbose. I'm working on a fix to return the output/error of the method when it errors.
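When looping over many methods, it can help to record each failure message rather than stop at the first one. The sketch below uses a hypothetical helper, capture_inference, and a stand-in function that always fails, so it runs without dyno or Docker. The commented line shows how the real call might look, assuming infer_trajectory accepts the task, the method name, and verbose as described above.

```r
# Hypothetical helper: run an inference function and keep either the
# trajectory or the error message, instead of aborting the whole script.
capture_inference <- function(infer_fn, ...) {
  tryCatch(
    list(trajectory = infer_fn(...), error = NULL),
    error = function(e) list(trajectory = NULL, error = conditionMessage(e))
  )
}

# Stand-in for a failing method so the sketch is self-contained.
failing_method <- function(task, method, verbose = FALSE) {
  stop("Error during trajectory inference: System command error")
}

outcome <- capture_inference(
  failing_method,
  task = NULL, method = "monocle_ddrtree", verbose = TRUE
)
print(outcome$error)

# With dyno installed and a prepared `task`, the call might look like:
# outcome <- capture_inference(dyno::infer_trajectory, task,
#                              "monocle_ddrtree", verbose = TRUE)
```

Collecting the error element per method gives a quick table of which methods failed and with what message.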

Thanks!

Wouter

rcannood commented 6 years ago

@zouter and I just discussed this issue. We will make dynmethods automatically update dockers if an old version is being used. I'll keep you posted.

cemalley commented 6 years ago

Hi @zouter and @rcannood thank you for the quick replies. I'm using the latest version of docker, dyno, and the monocle method.

Docker: 18.06.0-ce-mac70 (26399)
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Dyno: 0.1.0 (forced reinstall)
Status: Image is up to date for dynverse/monocle_ddrtree:latest

When I run with verbose=T, the output gives no further clue to the cause:

Loading required package (many)
Removing 435 outliers
Killed
Error: Error during trajectory inference System command error

For projected_paga, the errors relate to the HDF5 API. Here is the first error:

HDF5-API Errors:
error #000: ../../src/hdf5-1.10.0-1/src/H5A.c in H5Acreate2(): line 281: unable to create attribute
class: HDF5
major: Attribute
minor: Unable to initialize object

Maybe it is a memory issue, though I can run standalone monocle with no issues. Thanks for your help.

zouter commented 6 years ago

Hi @cemalley

First, thank you for your patience. You're one of the first people using dyno for real use cases, so there are still some issues in there.

For Monocle: Indeed, this is probably a memory issue. When I ran Monocle on a similarly sized dataset, it also used quite a bit of memory (at times 6GB). I had a look, and this is probably due to differences in feature filtering. Our wrapper did not do any feature filtering (it was originally used for evaluating the methods, where, to be fair to all methods, we filtered the genes prior to trajectory inference). I guess you selected some features for ordering, as suggested in the tutorial?

In any case, I now updated all monocle wrappers to include feature filtering by default, using the one suggested in the monocle documentation:

  disp_table <- dispersionTable(cds)
  ordering_genes <- subset(disp_table, mean_expression >= params$filter_features_mean_expression)
  cds <- setOrderingFilter(cds, ordering_genes)

You can tune this using the parameters filter_features (logical) and filter_features_mean_expression (numeric).
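To get a feel for how the mean-expression threshold changes the number of ordering genes, here is a minimal base R sketch on simulated counts. It does not call Monocle: raw per-gene means stand in for the mean_expression column of dispersionTable, and the function name genes_kept, the simulated data, and the thresholds are all illustrative assumptions.

```r
set.seed(1)
n_cells <- 200
n_genes <- 1000

# Simulated counts, cells in rows and genes in columns; per-gene means span
# several orders of magnitude.
gene_means <- rlnorm(n_genes, meanlog = -2, sdlog = 1.5)
counts <- matrix(
  rpois(n_cells * n_genes, lambda = rep(gene_means, each = n_cells)),
  nrow = n_cells,
  dimnames = list(NULL, paste0("gene", seq_len(n_genes)))
)

# Names of genes whose mean count reaches the threshold (raw means are a
# stand-in for Monocle's normalised mean_expression).
genes_kept <- function(counts, threshold) {
  colnames(counts)[colMeans(counts) >= threshold]
}

thresholds <- c(0, 0.1, 0.5, 1)
n_kept <- vapply(
  thresholds,
  function(t) length(genes_kept(counts, t)),
  integer(1)
)
print(data.frame(threshold = thresholds, genes_kept = n_kept))
```

A higher threshold leaves fewer genes for ordering, which in turn shrinks the input to the dimensionality reduction step.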

For PAGA, this is an HDF5 issue that we fixed recently.

We recently made some updates to dynwrap and dynmethods to make sure the latest versions of the containers are always used. So I highly suggest updating dynmethods (dynwrap will also be updated): devtools::install_github("dynverse/dynmethods"). Running monocle or PAGA again should then automatically update the Docker image for you, and the errors should be fixed :crossed_fingers:

cemalley commented 6 years ago

Hi @zouter, I am up to date with the devel branch of dyno and dynmethods. I tried both with and without cell_info in the task, but monocle still fails at the dimensionality reduction step. This is definitely the most intensive step in monocle. I was going to try to run this on NIH Biowulf with near-unlimited memory, but Docker is incompatible with the system. I made a markdown file of the local run, see this file I uploaded: http://htmlpreview.github.io/?https://github.com/cemalley/scRNASeq-bulkRNASeq/blob/master/dyno-example.htm I hope you can see the page. The data are a day 0 to 7 iPSC differentiation into neuroprogenitor cells, so the cell_info is day labeling. Thank you.

Also, my machine is 64 bit with 32GB memory. I read that RStudio will use as much memory as it wants in this configuration, these days.
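One thing that may be worth ruling out: on macOS, Docker runs containers inside a Linux virtual machine with its own memory limit, set in Docker's preferences, which can be well below the host's RAM. A bare "Killed" in the log is consistent with a process hitting such a limit, though nothing in this thread confirms that this is the cause here. Below is a minimal sketch for checking the limit from R. The helper names bytes_to_gib and docker_memory_gib are hypothetical, and the sketch assumes that docker info --format '{{.MemTotal}}' prints the memory available to Docker in bytes.

```r
# Convert a byte count (numeric or character) into GiB.
bytes_to_gib <- function(bytes) as.numeric(bytes) / 1024^3

# Hypothetical helper: memory visible to the Docker daemon, in GiB.
# Returns NA if docker is not installed or the daemon does not answer.
docker_memory_gib <- function() {
  if (!nzchar(Sys.which("docker"))) return(NA_real_)
  out <- tryCatch(
    suppressWarnings(system2(
      "docker",
      c("info", "--format", shQuote("{{.MemTotal}}")),
      stdout = TRUE, stderr = FALSE
    )),
    error = function(e) character(0)
  )
  if (length(out) == 0) return(NA_real_)
  # Non-numeric output (for example an error message) becomes NA.
  suppressWarnings(bytes_to_gib(out[1]))
}

print(docker_memory_gib())
```

If the reported value is far below the host's 32GB, raising the limit in Docker's settings would be a reasonable first experiment.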

rcannood commented 5 years ago

Sorry, I lost track of this issue for a while. I'm not quite sure what the problem here is. Can you replicate the same problem on a different computer? Alternatively, you could remove the gene and cell names from your dataset and send it to us so we could have a look.

cemalley commented 5 years ago

Hi, I found that the Monocle DDRTree method worked for up to 8200 genes in the input dataset (with the cell count held constant at 1385). What is your email address?
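A workaround along these lines is to cap the number of genes before wrapping the data. The sketch below keeps the most variable genes of a cells-by-genes matrix in base R. The function name keep_top_variable_genes is hypothetical, variance is only one possible ranking criterion, and the gene count that fits in memory will depend on the machine.

```r
# Hypothetical helper: keep the n_genes most variable columns of a
# cells x genes matrix, preserving the original column order.
keep_top_variable_genes <- function(expression, n_genes) {
  if (ncol(expression) <= n_genes) return(expression)
  gene_var <- apply(expression, 2, var)
  top <- order(gene_var, decreasing = TRUE)[seq_len(n_genes)]
  expression[, sort(top), drop = FALSE]
}

# Toy matrix: 50 cells x 20 genes, where the variance grows with the
# gene index, so the last genes are the most variable.
expression <- outer(seq_len(50), seq_len(20))
colnames(expression) <- paste0("gene", seq_len(20))

reduced <- keep_top_variable_genes(expression, n_genes = 5)
print(dim(reduced))
print(colnames(reduced))
```

On the real dataset the same call with n_genes set somewhat below the reported 8200 limit would be the natural first try.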

rcannood commented 5 years ago

My email address is @gmail.com :)

rcannood commented 5 years ago

Interestingly, that sort of matches our results as well... On the left side, you can see a dataset sub- and super-sampled at various sizes. On a computer with 16GB of memory, it runs out of memory (orange) at around 10 features and ~9000 cells (bottom right), or at ~13000 features and 10 cells (top left).

rcannood commented 5 years ago

What. I'm a bit confused about what I wrote earlier. Before the 'at' symbol, I meant to write 'rcannood'. Just giving you the ending of an email address is a bit silly.

cemalley commented 5 years ago

Hi there, I don't think I can share my data at this time, but I will let you know if I get permission. Thank you. I am still watching the progress of this project and eagerly await any publication updates.

zouter commented 5 years ago

Closing this for now, but feel free to open a new issue if you still have issues!