When and where to add biological data?

majorkazer commented 6 years ago

Given that several of the methods available through dyno require root cells, or can make use of time point data, it is unclear at what step you add these to the model. Should these always be added after running infer_trajectories? Is there a key to know which methods require what data a priori?

rcannood commented 6 years ago

Hello @majorkazer,

I recommend updating dyno and all its dependencies. I made a few changes to dynwrap so that it would print which prior information is being used.

devtools::install_github("dynverse/dyno", force = TRUE, dependencies = TRUE)

Do the explanations below answer your questions?

Kind regards, Robrecht

List each method's prior inputs

Is there a key to know which methods require what data a priori?

Yes, there is, although it could be presented more nicely. The dynmethods::methods object contains information on all TI methods currently implemented in dyno. However, the input and output columns are list columns, so they need to be processed first to make them legible.

library(dyno)
library(tidyverse)

data("methods", package = "dynmethods")
methods %>% select(name, input, output)

# A tibble: 57 x 3
   name                 input      output    
   <chr>                <list>     <list>    
 1 Angle                <list [2]> <list [2]>
 2 CALISTA              <list [2]> <list [2]>
 3 CellRouter           <list [2]> <list [2]>
 4 CellTrails           <list [2]> <list [2]>
 5 cellTree with gibbs  <list [3]> <list [2]>
 6 cellTree with maptpx <list [3]> <list [2]>
 7 cellTree with vem    <list [3]> <list [2]>
 8 Component 1          <list [2]> <list [2]>
 9 DPT                  <list [3]> <list [2]>
10 ElPiGraph cycle      <list [2]> <list [2]>
# ... with 47 more rows

A method can mark prior information either as 'required' or as 'optional'. I extract this information as follows:


methods2 <- 
  methods %>% 
  mutate(
    required = map_chr(input, ~paste0(.$required, collapse = ", ")),
    optional = map_chr(input, ~paste0(.$optional, collapse = ", "))
  ) %>% 
  select(id, name, required, optional)
methods2

# A tibble: 57 x 4
   id              name                 required         optional             
   <chr>           <chr>                <chr>            <chr>                
 1 angle           Angle                expression       ""                   
 2 calista         CALISTA              expression       ""                   
 3 cellrouter      CellRouter           counts, start_id ""                   
 4 celltrails      CellTrails           expression       ""                   
 5 celltree_gibbs  cellTree with gibbs  expression       start_id, groups_id  
 6 celltree_maptpx cellTree with maptpx expression       start_id, groups_id  
 7 celltree_vem    cellTree with vem    expression       start_id, groups_id  
 8 comp1           Component 1          expression       ""                   
 9 dpt             DPT                  expression       start_id, features_id
10 elpicycle       ElPiGraph cycle      expression       ""
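To make the extraction step above easier to follow, here is a minimal sketch on a toy tibble. `toy_methods` is a made-up stand-in for `dynmethods::methods` (an assumption for illustration, holding only three entries copied from the tables in this thread); the real object has many more columns.

```r
library(dplyr)
library(purrr)
library(tibble)

# Toy stand-in for dynmethods::methods (hypothetical, for illustration only):
# `input` is a list column; each element holds a `required` and an
# `optional` character vector.
toy_methods <- tibble(
  id = c("angle", "dpt", "paga"),
  input = list(
    list(required = "expression", optional = character(0)),
    list(required = "expression", optional = c("start_id", "features_id")),
    list(required = c("counts", "start_id"), optional = "groups_id")
  )
)

# Collapse each character vector into one comma-separated string.
# An empty vector collapses to "", which is why methods without
# optional priors show "" in the output.
toy_flat <- toy_methods %>%
  mutate(
    required = map_chr(input, ~ paste0(.x$required, collapse = ", ")),
    optional = map_chr(input, ~ paste0(.x$optional, collapse = ", "))
  ) %>%
  select(id, required, optional)

toy_flat
```
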

Add prior data to a dynwrap dataset

It is unclear at what step you add these to the model.

You can add prior information with dynwrap's add_prior_information(), before running the trajectory inference. I will be using an example dataset from the SCORPIUS package for this:

data("ginhoux", package = "SCORPIUS")

dataset <- 
  wrap_data(
    id = "ginhoux",
    cell_ids = rownames(ginhoux$expression)
  ) %>% 
  add_expression(
    counts = round(2 ^ ginhoux$expression - 1), # no counts data is available, so approximate it, assuming expression is log2(counts + 1)
    expression = ginhoux$expression
  ) %>% 
  add_prior_information(
    start_id = "SRR1558845"
  )

Infer a trajectory, making use of prior data

You can check which methods require a start cell as follows:

methods2 %>% filter(grepl("start_id", required))

# A tibble: 9 x 4
  id         name       required                                optional           
  <chr>      <chr>      <chr>                                   <chr>              
1 cellrouter CellRouter counts, start_id                        ""                 
2 fateid     FateID     expression, end_id, start_id, groups_id ""                 
3 paga       PAGA       counts, start_id                        groups_id          
4 scoup      SCOUP      expression, groups_id, start_id, end_n  ""                 
5 slicer     SLICER     expression, start_id                    features_id, end_id
6 topslam    topslam    expression, start_id                    ""                 
7 urd        URD        counts, start_id                        ""                 
8 wanderlust Wanderlust counts, start_id                        features_id        
9 wishbone   Wishbone   counts, start_id                        features_id, end_n
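Note that `grepl("start_id", required)` matches substrings. That works for the prior names shown here, but it would also match any longer name that happens to contain the pattern. A minimal sketch of an exact-match alternative follows. `toy_methods2` and `has_prior()` are hypothetical names introduced for illustration and are not part of dyno.

```r
library(dplyr)
library(purrr)
library(tibble)

# Toy stand-in for the methods2 table (hypothetical, three rows only)
toy_methods2 <- tibble(
  id = c("paga", "slingshot", "angle"),
  required = c("counts, start_id", "counts", "expression"),
  optional = c("groups_id", "start_id, end_id", "")
)

# Hypothetical helper: TRUE where `prior` is exactly one of the
# comma-separated entries of each string in `x`.
has_prior <- function(x, prior) {
  map_lgl(strsplit(x, ", ", fixed = TRUE), ~ prior %in% .x)
}

# Methods that require a start cell
toy_methods2 %>% filter(has_prior(required, "start_id"))

# Methods that can optionally use a start cell
toy_methods2 %>% filter(has_prior(optional, "start_id"))
```
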

For example, we can run PAGA to infer a trajectory:

traj1 <- infer_trajectory(dataset = dataset, method = ti_paga(), verbose = TRUE)
plot_dimred(traj1)

Executing 'paga' on 'ginhoux'
With parameters: list()
And inputs: counts, start_id
...

Alternatively, you could use one of the methods that can optionally use a given piece of prior information:

methods2 %>% filter(grepl("start_id", optional))

# A tibble: 8 x 4
  id                  name                 required          optional             
  <chr>               <chr>                <chr>             <chr>                
1 celltree_gibbs      cellTree with gibbs  expression        start_id, groups_id  
2 celltree_maptpx     cellTree with maptpx expression        start_id, groups_id  
3 celltree_vem        cellTree with vem    expression        start_id, groups_id  
4 dpt                 DPT                  expression        start_id, features_id
5 merlot              MERLoT               expression, end_n start_id             
6 projected_dpt       Projected DPT        expression        start_id, features_id
7 projected_slingshot Projected Slingshot  counts            start_id, end_id     
8 slingshot           Slingshot            counts            start_id, end_id

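In this scenario, you have to request the optional prior information explicitly through the give_priors argument. Without it, the prior is not passed to the method, as the "And inputs" lines in the two runs below show: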

traj2 <- infer_trajectory(dataset = dataset, method = ti_slingshot(), verbose = TRUE)
plot_dimred(traj2)

Executing 'slingshot' on 'ginhoux'
With parameters: list()
And inputs: counts
...

traj3 <- infer_trajectory(dataset = dataset, method = ti_slingshot(), give_priors = "start_id", verbose = TRUE)
plot_dimred(traj3)

Executing 'slingshot' on 'ginhoux'
With parameters: list()
And inputs: counts, start_id
...
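To decide what to request for a given method, you could intersect its optional priors with the priors you actually added to the dataset. The sketch below does this on a toy table. `toy_methods2` and `optional_priors_to_give()` are hypothetical names used only for illustration; they are not dyno functions.

```r
library(tibble)

# Toy stand-in for the methods2 table (hypothetical, two rows only)
toy_methods2 <- tibble(
  id = c("paga", "slingshot"),
  required = c("counts, start_id", "counts"),
  optional = c("groups_id", "start_id, end_id")
)

# Hypothetical helper: the optional priors of `method_id` that are
# also present in `available` (the priors added to the dataset).
optional_priors_to_give <- function(tbl, method_id, available) {
  optional <- strsplit(tbl$optional[tbl$id == method_id], ", ", fixed = TRUE)[[1]]
  intersect(optional, available)
}

# Only start_id was added to the dataset in this thread
optional_priors_to_give(toy_methods2, "slingshot", available = "start_id")
optional_priors_to_give(toy_methods2, "paga", available = "start_id")
```

For slingshot, the helper returns the single name that was passed as give_priors in the traj3 call above. For PAGA it returns nothing, because start_id is a required input there and is used without being requested.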

majorkazer commented 6 years ago

Dear rcannood,

Thank you for the detailed response! These notes are quite helpful.

I am having another issue with hdf5, would you like me to start a new issue?

Sam

rcannood commented 6 years ago

Glad I could be of help :)

Perhaps it would indeed be best to start a new issue, in case someone else has a similar issue at some point.

givison commented 6 years ago

I'd like to select some marker genes from my dataset to run trajectory inference on. Is there any way to do this after wrapping? My workaround so far has been to just select the marker columns from my counts and expression matrices prior to wrapping.
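The workaround described above could look roughly like this in base R. This is a sketch under assumptions: the matrices are toy data with cells in rows and genes in columns, and `markers` is a hypothetical marker list. It does not show whether genes can be selected after wrapping.

```r
# Toy cells-by-genes matrices standing in for real data (hypothetical)
set.seed(1)
counts <- matrix(
  rpois(12, lambda = 5),
  nrow = 3,
  dimnames = list(paste0("cell", 1:3), paste0("gene", 1:4))
)
expression <- log2(counts + 1)

# Hypothetical marker genes; replace with your own selection
markers <- c("gene2", "gene4")

# Keep only the marker columns; drop = FALSE preserves the matrix
# structure even when a single marker is selected.
counts_markers <- counts[, markers, drop = FALSE]
expression_markers <- expression[, markers, drop = FALSE]

dim(counts_markers)
```

The subsetted matrices would then be passed to wrap_data() and add_expression() in place of the full ones.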
