IndexError running paga or paga_tree

naddsch commented 5 years ago

Hi there,

I am trying to use your package on flow cytometry data, using an example dataset of ~30,000 cells and 5 dimensions with a pre-set start_id. The fluorescence data is wrapped into the counts and expression of a dataset.

Unfortunately when running

model_paga_tree <- infer_trajectory(dataset, ti_paga(), verbose = TRUE)

I get the following error

Executing 'paga' on '20190627_145358__data_wrapper__8U7MT2djO2'
With parameters: list(n_neighbors = 15L, n_comps = 50L, n_dcs = 15L, resolution = 1L,     embedding_type = "fa", connectivity_cutoff = 0.05),
inputs: counts, and
priors : start_id
Input saved to C:\Users\...\AppData\Local\Temp\RtmpagjEd1\file28381f963609/ti
Running method using babelwhale
Running "C:\PROGRA~1\DOCKER~1\docker.exe" run -e "TMPDIR=/tmp2" --workdir /ti/workspace -v "/c/Users/.../AppData/Local/Temp/RtmpagjEd1/file28381f963609/ti:/ti" -v \
  "/c/Users/.../AppData/Local/Temp/RtmpagjEd1/file2838ad45788/tmp:/tmp2" "dynverse/ti_paga:v0.9.9.04" --dataset /ti/input.h5 --output /ti/output.h5 --use_priors all
/usr/local/lib/python3.7/site-packages/scanpy/preprocessing/_deprecated/highly_variable_genes.py:168: RuntimeWarning: invalid value encountered in true_divide
  / disp_mad_bin[df['mean_bin'].values].values
Traceback (most recent call last):
  File "/code/run.py", line 55, in <module>
    sc.pp.recipe_zheng17(adata, n_top_genes=n_top_genes)
  File "/usr/local/lib/python3.7/site-packages/scanpy/preprocessing/_recipes.py", line 108, in recipe_zheng17
    adata.X, flavor='cell_ranger', n_top_genes=n_top_genes, log=False)
  File "/usr/local/lib/python3.7/site-packages/scanpy/preprocessing/_deprecated/highly_variable_genes.py", line 175, in filter_genes_dispersion
    disp_cut_off = dispersion_norm[n_top_genes-1]
IndexError: index 4 is out of bounds for axis 0 with size 0
Error: Error during trajectory inference 

Exactly the same is happening with paga_tree. Can you tell me what's going wrong?

Thanks and best, Laura

zouter commented 5 years ago

This is a bug somewhere in scanpy's preprocessing. I guess genes are removed somewhere before the n_top_genes are selected, so fewer than n_top_genes remain, which causes this out-of-bounds error.
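To make the suspected failure mode concrete, here is a minimal pure-Python sketch. It assumes (this is a guess, not a confirmed reading of the scanpy source) that NaN dispersions are dropped before the cutoff is looked up. The function names `dispersion_cutoff` and `safe_dispersion_cutoff` are hypothetical and are not part of scanpy.

```python
import math


def dispersion_cutoff(dispersion_norm, n_top_genes):
    """Return the n_top_genes-th largest non-NaN dispersion (hypothetical helper)."""
    # Assumption: NaN values are discarded before ranking, which can leave
    # fewer than n_top_genes entries, or none at all.
    finite = [d for d in dispersion_norm if not math.isnan(d)]
    finite.sort(reverse=True)
    # Raises IndexError when len(finite) < n_top_genes.
    return finite[n_top_genes - 1]


def safe_dispersion_cutoff(dispersion_norm, n_top_genes):
    """Same lookup, but clamped to the number of usable values."""
    finite = sorted((d for d in dispersion_norm if not math.isnan(d)), reverse=True)
    if not finite:
        # Nothing left to rank: signal the caller to skip filtering.
        return None
    return finite[min(n_top_genes, len(finite)) - 1]
```

With 5 features that all yield NaN, `dispersion_cutoff([float("nan")] * 5, 5)` indexes position 4 of an empty list, which matches the shape of the reported error. The clamped variant returns `None` instead.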

Quite easy to reproduce: dataset <- dyntoy::generate_dataset(num_features = 10)

Working on a solution now :slightly_smiling_face:

zouter commented 5 years ago

dynverse/ti_paga:v0.9.9.05 now has a parameter filter_features. Setting it to FALSE skips the feature filtering. As far as I know, that is only necessary when there are fewer than 100 features, such as for cytometry data.
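As a rough illustration of what such a switch does, the sketch below gates a feature-selection step on a `filter_features` flag. It is not the code in the ti_paga container: `select_features` is a hypothetical helper, the marker names are made up, and plain variance ranking stands in for scanpy's dispersion-based selection.

```python
from statistics import pvariance


def select_features(columns, n_top, filter_features=True):
    """Return the feature names to keep (hypothetical helper).

    columns maps a feature name to its list of per-cell values.
    """
    names = list(columns)
    if not filter_features:
        # Skip filtering entirely and keep every feature,
        # e.g. for a small cytometry panel.
        return names
    # Rank by population variance, highest first, and clamp n_top so that
    # asking for more features than exist cannot fail.
    ranked = sorted(names, key=lambda name: pvariance(columns[name]), reverse=True)
    return ranked[:min(n_top, len(ranked))]


# Made-up marker values for three cells.
panel = {
    "CD3": [1.0, 1.0, 1.0],
    "CD4": [0.0, 5.0, 10.0],
    "CD8": [1.0, 2.0, 3.0],
}
kept_all = select_features(panel, n_top=1000, filter_features=False)
kept_top = select_features(panel, n_top=2)
```

With the flag off, all three markers are kept regardless of `n_top`. With it on, only the two most variable ones remain.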

PAGA Tree and Projected PAGA are now also building on Travis; once that's done, these changes will be merged into dynmethods master.

Thanks Laura, have fun inferring trajectories :wink:

naddsch commented 5 years ago

Thanks a lot! :blush:

I have now run paga with filter_features = FALSE, but when I try to plot the trajectory afterwards, I get the following error:

> model_paga <- infer_trajectory(dataset, "dynverse/ti_paga:v0.9.9.05", filter_features = FALSE,  verbose = TRUE)
> plot_dimred(model_paga)
Coloring by milestone
Using milestone_percentages from trajectory

Error: Column `arrow` must be length 1 (the group size), not 0 

Traceback:
14. stop(structure(list(message = "Column `arrow` must be length 1 (the group size), not 0", 
    call = NULL, cppstack = NULL), class = c("Rcpp::exception", 
"C++Error", "error", "condition"))) 
13. mutate_impl(.data, dots, caller_env()) 
12. mutate.tbl_df(., distance_to_center = (comp_1_from - mean(c(max(comp_1_from), 
    min(comp_1_from))))^2 + (comp_2_from - mean(c(max(comp_2_from), 
    min(comp_2_from))))^2, arrow = row_number() == which.min(distance_to_center)) 
11. mutate(., distance_to_center = (comp_1_from - mean(c(max(comp_1_from), 
    min(comp_1_from))))^2 + (comp_2_from - mean(c(max(comp_2_from), 
    min(comp_2_from))))^2, arrow = row_number() == which.min(distance_to_center)) 
10. function_list[[k]](value) 
9. withVisible(function_list[[k]](value)) 
8. freduce(value, `_function_list`) 
7. `_fseq`(`_lhs`) 
6. eval(quote(`_fseq`(`_lhs`)), env, env) 
5. eval(quote(`_fseq`(`_lhs`)), env, env) 
4. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) 
3. waypoint_edges %>% group_by(from_milestone_id, to_milestone_id) %>% 
    mutate(distance_to_center = (comp_1_from - mean(c(max(comp_1_from), 
        min(comp_1_from))))^2 + (comp_2_from - mean(c(max(comp_2_from), 
        min(comp_2_from))))^2, arrow = row_number() == which.min(distance_to_center)) 
2. project_waypoints(trajectory = trajectory, cell_positions = cell_positions, 
    waypoints = waypoints, trajectory_projection_sd = trajectory_projection_sd, 
    color_trajectory = color_trajectory) 
1. plot_dimred(model_paga)

This is also happening for a 1000x1000 toy example as long as filter_features = FALSE...

Best, Laura

rcannood commented 4 years ago

Oh dear.

Could you perhaps send us the model_paga object as an rds, so I can try plotting it myself to see what goes wrong?

Robrecht

dynverse / dyno

IndexError running paga or paga_tree #56