dynverse / dyno

Inferring, interpreting and visualising trajectories using a streamlined set of packages 🦕
https://dynverse.github.io/dyno

Error: "cell_id" %in% colnames(end_state_probabilities) isn't true. #30

Closed liuyifang closed 5 years ago

liuyifang commented 5 years ago

Hi,

When I use the sample data and try the grandprix method, I get this error:

library(dyno)
library(tidyverse)
data("fibroblast_reprogramming_treutlein")
dataset <- wrap_expression(
  counts = fibroblast_reprogramming_treutlein$counts,
  expression = fibroblast_reprogramming_treutlein$expression
) %>% add_prior_information(
  end_n = 1
)
model <- infer_trajectory(dataset = dataset, method = ti_grandprix(), verbose = TRUE)

Executing 'grandprix' on '20181104_034735__data_wrapper__dwq1Aondcn'
With parameters: list()
And inputs: expression, end_n
Input saved to /tmp/Rtmp1ghIhj/file10af56aa4118/ti/input: 
    end_n.json
    expression.csv
    params.json
Running /usr/bin/docker run -e 'TMPDIR=/tmp2' --workdir /ti/workspace -v \
    '/tmp/Rtmp1ghIhj/file10af56aa4118/ti:/ti' -v \
    '/tmp/Rtmp1ghIhj/file10af8de69b4/tmp:/tmp2' dynverse/ti_grandprix
2018-11-04 03:47:44.038317: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
output saved in /tmp/Rtmp1ghIhj/file10af56aa4118/ti/output: 
    cell_ids.csv
    end_state_probabilities.csv
    pseudotime.csv
    timings.json
Error traceback:
2: "cell_id" %in% colnames(end_state_probabilities) isn't true.
1: Error: Error during trajectory inference
"cell_id" %in% colnames(end_state_probabilities) isn't true.

I also have two other questions:

1. Could you please explain the parameter end_n in more detail?
2. I want to add timecourse information to the grandprix method, but I don't know what the timecourse table should look like. Could you please generate a sample timecourse table?

Thanks.

Yifang

rcannood commented 5 years ago

Hello Yifang,

Thanks for using the dyno package! :)

The error you are getting is related to your other question. Similar to STEMNET, FateID, MFA, GPfates, and SCOUP, GrandPrix performs a pseudotime estimation in which it assumes that there are multiple end states. It tries to group cells according to the end state they are heading towards. However, you need to tell the method how many end states (end_n) there are. For instance:

model <- infer_trajectory(dataset = dataset %>% add_prior_information(end_n = 2), method = ti_grandprix(), verbose = TRUE)
dynplot::plot_graph(model)

[Image: endn2]

You are getting this error because GrandPrix requires end_n to be larger than one. I will revise the docker wrapper to provide a more useful error message.
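Until the wrapper reports this more clearly, a check along these lines could catch the problem before the container is started. This is only a minimal sketch in base R: `check_end_n` is a hypothetical helper, not part of dyno, and the minimum of two end states is taken from the explanation above.

```r
# Hypothetical helper (not part of dyno): validate end_n before calling
# infer_trajectory() with a method that expects multiple end states.
check_end_n <- function(end_n, min_end_states = 2) {
  if (!is.numeric(end_n) || length(end_n) != 1 || is.na(end_n)) {
    stop("end_n must be a single non-missing number")
  }
  if (end_n != round(end_n)) {
    stop("end_n must be a whole number")
  }
  if (end_n < min_end_states) {
    stop("end_n must be at least ", min_end_states, ", got ", end_n)
  }
  invisible(as.integer(end_n))
}

check_end_n(2)                              # passes silently
res <- try(check_end_n(1), silent = TRUE)   # the value from the report above
inherits(res, "try-error")                  # TRUE
```

Calling the helper right before `add_prior_information()` would turn the opaque `"cell_id" %in% colnames(...)` failure into a message that names the actual cause.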

I also notice that when I pass end_n = 3 to GrandPrix, it returns more than 3 end states. @zouter or I will have a look at this.

To pass along timecourse information, provide it as a named vector with one value per cell, named by cell ID. In this example, I just generated random timecourse data:

library(dyno)
library(tidyverse)
data("fibroblast_reprogramming_treutlein")
dataset <- wrap_expression(
  counts = fibroblast_reprogramming_treutlein$counts,
  expression = fibroblast_reprogramming_treutlein$expression
) %>% add_prior_information(
  end_n = 2, 
  timecourse_continuous = set_names(runif(nrow(fibroblast_reprogramming_treutlein$counts)), fibroblast_reprogramming_treutlein$cell_ids)
)

model <- infer_trajectory(
  dataset = dataset,
  method = ti_grandprix(),
  verbose = TRUE, 
  give_priors = c("end_n", "timecourse_continuous")
)

You can see that the extra data has been passed to GrandPrix:

Executing 'grandprix' on '20181104_144529__data_wrapper__eM1PWe5OLs'
With parameters: list()
And inputs: expression, end_n, timecourse_continuous
Input saved to /tmp/RtmpJYTEhm/file39883cf5b0d7/ti/input: 
    end_n.json
    expression.csv
    params.json
    timecourse_continuous.json
Running /usr/bin/docker run -e 'TMPDIR=/tmp2' --workdir /ti/workspace -v '/tmp/RtmpJYTEhm/file39883cf5b0d7/ti:/ti' -v '/tmp/RtmpJYTEhm/file3988313e9755/tmp:/tmp2' dynverse/ti_grandprix
2018-11-04 13:45:45.126849: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
output saved in /tmp/RtmpJYTEhm/file39883cf5b0d7/ti/output: 
    cell_ids.csv
    end_state_probabilities.csv
    pseudotime.csv
    timings.json
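
To make the expected shape of the timecourse concrete: it is a plain named numeric vector, with the cell IDs as names. The sketch below builds one in base R from a small table. The cell IDs and time points are made up for illustration, and `sample_table` is a hypothetical stand-in for your own metadata.

```r
# Made-up metadata: one row per cell, with the time point (e.g. in days)
# at which the cell was sampled. IDs and values are illustrative only.
sample_table <- data.frame(
  cell_id = c("cell_1", "cell_2", "cell_3", "cell_4"),
  day     = c(0, 0, 2, 5),
  stringsAsFactors = FALSE
)

# Named numeric vector: values are time points, names are cell IDs.
timecourse <- setNames(sample_table$day, sample_table$cell_id)

# Sanity checks before handing it to add_prior_information():
# the vector is numeric, has no missing values, and no cell ID is repeated.
stopifnot(
  is.numeric(timecourse),
  !anyNA(timecourse),
  !anyDuplicated(names(timecourse))
)

print(timecourse)
```

A vector built this way can take the place of the random `runif()` values passed as `timecourse_continuous`, provided its names match the cell IDs of the expression matrix.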

liuyifang commented 5 years ago

Hi Robrecht,

Thank you very much for the explanation of end_n and timecourse. That is clear now. I have another question related to timecourse. I read your preprint, and Table 1 says that SCUBA also supports timecourse information. But according to the code below, timecourse is not listed as an optional input for scuba:

library(dyno)
library(tidyverse)
data("methods", package = "dynmethods")
methods %>% select(name, input, output)
methods2 <- 
  methods %>% 
  mutate(
    required = map_chr(input, ~paste0(.$required, collapse = ", ")),
    optional = map_chr(input, ~paste0(.$optional, collapse = ", "))
  ) %>% 
  select(id, name, required, optional)
methods2 %>% filter(grepl("scuba", id))
# A tibble: 1 x 4
  id    name  required   optional
  <chr> <chr> <chr>      <chr>   
1 scuba SCUBA expression ""      

So, my question is whether SCUBA supports timecourse information when it is run through the dyno package. Thanks.

Yifang

zouter commented 5 years ago

Hi,

At the moment this is not yet included in the wrapper. We might add it in the future, but it is low priority right now. Sorry!

Wouter