dynverse / dyno

Inferring, interpreting and visualising trajectories using a streamlined set of packages 🦕
https://dynverse.github.io/dyno

Error: Error during trajectory inference any(duplicated(c(cell_ids, milestone_ids))) isn't false. #28

Closed givison closed 5 years ago

givison commented 6 years ago

I'm trying to run projected_paga on my dataset and get this error:

Error: Error during trajectory inference any(duplicated(c(cell_ids, milestone_ids))) isn't false.

I can't figure out where milestone_ids comes from.

wrapped_data <- wrap_expression( counts = param_ion_counts, expression = param_asinh_expression )

any(duplicated(wrapped_data$cell_ids))

evaluates to FALSE

I can't share the dataset publicly, but if you can't reproduce this error I can probably email the dataset to you.

Best, Geoff

rcannood commented 6 years ago

Hello Geoff!

Thanks for bringing this up. In each of the methods that we wrapped, names are assigned to the milestones at some point. In some scripts these names are milestone1, milestone2, milestone3; in others they are M1, M2, etc.; but in the projected paga script they are simply called 1, 2, 3, etc. I suspect your cells are also identified by plain numbers, so the cell ids and the milestone ids overlap, which is what the check complains about.

Either Wouter or I will push a fix for this in the coming days. Could you verify whether the following at least solves your problem, for now?

rownames(param_ion_counts) <- rownames(param_asinh_expression) <- paste0("Cell", seq_len(nrow(param_ion_counts)))

Kind regards, Robrecht
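
To illustrate the clash described above, here is a minimal base-R sketch. The ids and the toy matrix are made-up stand-ins (not taken from the actual dataset); the check itself is the expression quoted in the error message.

```r
# Hypothetical stand-ins: cells identified by plain numbers, and milestones
# named 1, 2, 3 as in the projected paga script.
cell_ids <- as.character(1:5)
milestone_ids <- as.character(1:3)

# The condition from the error message: TRUE means at least one id is shared.
clash_before <- any(duplicated(c(cell_ids, milestone_ids)))

# The suggested workaround: give every cell a prefixed name.
counts <- matrix(0, nrow = 5, ncol = 2)  # toy matrix, cells in rows
rownames(counts) <- paste0("Cell", seq_len(nrow(counts)))
clash_after <- any(duplicated(c(rownames(counts), milestone_ids)))

print(c(before = clash_before, after = clash_after))
```

With prefixed cell names, no cell id can coincide with a purely numeric milestone name, so the check passes.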

givison commented 6 years ago

That does fix this issue, thank you!

New problems trying to continue my analysis:

Running model %<>% add_dimred(dyndimred::dimred_mds, expression_source = wrapped_data$expression) I get Error: vector memory exhausted (limit reached?)

The model is about 8 MB; might this be the problem? I'm trying to analyze CyTOF data, so I have approx. 19k cells and 18 parameters of interest. It works fine when I use dyndimred::dimred_pca, though the projection is unfortunately not too informative...

The plotting window also seems to choke when I try running plot_heatmap(), the window just goes white and the heatmap doesn't appear, even after waiting several minutes. I also had this problem with the tutorial, when trying to create heatmaps past the first.

I've also tried running projected_gng and I get the error

Error: all(method_ids %in% ti_methods$id | grepl("/", method_ids)) isn't true.

Sorry if I'm overloading this issue, would it be more helpful to create separate issues?

Thanks again! Geoff

zouter commented 6 years ago

Hi Geoff

The dimensionality reduction problem is probably due to normal MDS not being able to handle a large number of cells (although 18k isn't that many, of course). MDS first calculates a distance matrix between all pairs of cells, which means 18k x 18k values, or around 2.6 GB of RAM at 8 bytes per value. For your data, we would suggest using dyndimred::dimred_landmark_mds. It uses some heuristics so that it only calculates distances between a set of landmark cells and all cells.
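
As a rough back-of-the-envelope sketch of that estimate, the snippet below computes the memory for a dense matrix of 8-byte doubles. The helper name and the landmark count of 1000 are assumptions chosen only for illustration; they are not dyndimred defaults.

```r
# Memory (in GB) for a dense matrix of doubles with the given dimensions.
# dense_matrix_gb is a hypothetical helper, not part of any package.
dense_matrix_gb <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value / 1e9
}

n_cells <- 18000
n_landmarks <- 1000  # hypothetical value, chosen only for illustration

full_gb <- dense_matrix_gb(n_cells, n_cells)          # all pairs of cells
landmark_gb <- dense_matrix_gb(n_landmarks, n_cells)  # landmarks vs. all cells

print(round(c(full = full_gb, landmark = landmark_gb), 3))
```

In the add_dimred call from the earlier comment, this amounts to passing dyndimred::dimred_landmark_mds instead of dyndimred::dimred_mds.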

For the heatmap, this is unfortunately a bug in patchwork (https://github.com/thomasp85/patchwork)... Assuming you use RStudio, the heatmap will show up fine if you click on "zoom". Or you could save the heatmap to a pdf. As far as I know, this only affects the plotting inside RStudio itself.
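
One way to save a plot to a PDF from R, as a sketch: open a PDF device, draw, and close the device again. The helper name save_plot_pdf and the toy image are made up for illustration; with dyno, the drawing function would presumably print the object returned by plot_heatmap() instead.

```r
# Hypothetical helper: run a drawing function with a PDF device open.
save_plot_pdf <- function(draw, file, width = 10, height = 8) {
  grDevices::pdf(file, width = width, height = height)
  on.exit(grDevices::dev.off())  # close the device even if draw() fails
  draw()
  invisible(file)
}

# Toy stand-in for the heatmap; with dyno this might be something like
# function() print(plot_heatmap(model, ...)), which is an assumption here.
out_file <- save_plot_pdf(
  function() graphics::image(matrix(1:12, nrow = 3)),
  tempfile(fileext = ".pdf")
)
```

Because the plot is written straight to a file, the RStudio plotting pane is not involved at all.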

Let me know if these things help! For me, it is fine to continue in this issue btw :+1:

Wouter

givison commented 6 years ago

Thank you for the tips, both were helpful! I realized the heatmap also shows up when I run it as a code chunk in a markdown file. Seems like projected_paga isn't a great choice for this dataset, good to know for sure. Been trying to run slingshot but it's taking hours to run, I might try running it overnight...

Any idea what might be causing the issue I saw above with projected_gng?

Thanks! Geoff

rcannood commented 6 years ago

The problem with "projected_gng" is easy to solve! We renamed it to simply "gng" at some point, my bad. Try running it with method = "gng" or method = ti_gng().
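
To show why the old name trips the check, here is a small base-R sketch that evaluates the condition quoted in the error message. The three-row ti_methods table and the helper name are stand-ins made up for illustration; the real table has many more entries.

```r
# Hypothetical stand-in for the table of known methods referenced in the error.
ti_methods <- data.frame(
  id = c("gng", "paga", "slingshot"),
  stringsAsFactors = FALSE
)

# The condition from the error message, wrapped in a helper (hypothetical name).
# An id passes if it is listed in ti_methods$id or contains a slash.
method_ids_ok <- function(method_ids) {
  all(method_ids %in% ti_methods$id | grepl("/", method_ids))
}

old_ok <- method_ids_ok("projected_gng")  # old name, not in the table
new_ok <- method_ids_ok("gng")            # renamed id, present in the table

print(c(projected_gng = old_ok, gng = new_ok))
```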

Slingshot, right now, indeed does not scale very well with respect to the number of cells. Although, if I try to interpret the colour scale, it seems like it should only take about 10^3 seconds to run.

givison commented 6 years ago

Yep, method = "gng" works! The guidelines GUI gave me "projected_gng" still so I'll leave this issue open for now.

I did manage to run slingshot overnight, but it wasn't able to pick up the bifurcation in my dataset. Would be very interesting to see benchmarking on CyTOF data!

zouter commented 5 years ago

This is now fixed in the latest dynguidelines, which also includes the latest results from the benchmark! From now on, we will always let the guidelines app depend on the correct version of the methods package, so as to avoid problems like the one you had. Thanks for the feedback, and feel free to try it out!

It would indeed be great to do some benchmarking on CyTOF data as well, perhaps for a future version of the benchmark :smile:. In the scalability analysis we did include some datasets of the same order of magnitude as CyTOF data, though (around 100 features and 100k cells).

I guess it's ok to close this now.

givison commented 5 years ago

Thanks for the fixes!

I wonder if the noise characteristics are different enough in CyTOF data that the benchmarking doesn't predict the best algorithms as effectively? Or perhaps I just had bad luck with the ones I ended up trying out. In any case, an excellent package and very helpful in organizing the complicated world of trajectory inference.

zouter commented 5 years ago

Hi Geoff

It's kind of difficult to say because there could be many factors involved. Most of the methods you tried use the raw counts and process/normalise them in some way, usually in a different way than would be typical for cytometry data.

There are some methods which should also work on CyTOF data (wishbone, for example). When wrapping each method, we kind of ignored this, as our goal in the evaluation was to work on RNA-seq data. Perhaps in the future we could adapt these wrappers to also work with other data types. I created an issue for this at dynverse/dynmethods#145