cole-trapnell-lab / cicero-release

https://cole-trapnell-lab.github.io/cicero-release/
MIT License

Updates to make_cicero_cds() to increase speed. #6

Closed hypercompetent closed 5 years ago

hypercompetent commented 6 years ago

Hi Hannah and the CICERO team,

Thanks for this great tool! We're using it to dig into our scATAC-seq data from mouse VISp.

This pull request updates make_cicero_cds() in a way that I think will speed up this step of the process, mostly by keeping data in numeric/matrix form as much as possible to reduce type conversions.
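To illustrate the general idea (this is not the code from the pull request), here is a minimal sketch of why staying in matrix form can help. It aggregates toy per-cell counts into groups two ways: a single matrix multiplication, and a loop over a data.frame. The data, the group sizes, and names such as `counts`, `mask`, and `group_id` are all made up for the example.

```r
# Toy data (made up): 200 peaks x 50 cells, binary accessibility
set.seed(1)
counts <- matrix(rbinom(200 * 50, 1, 0.1), nrow = 200, ncol = 50)

# Hypothetical grouping: 10 groups of 5 cells each,
# encoded as a cells x groups 0/1 membership mask
group_id <- rep(1:10, each = 5)
mask <- matrix(0, nrow = 50, ncol = 10)
mask[cbind(seq_len(50), group_id)] <- 1

# Matrix route: one multiplication sums each group's cells per peak
agg_matrix <- counts %*% mask

# Data-frame route: convert, then subset and sum group by group
df <- as.data.frame(counts)
agg_df <- sapply(1:10, function(g) {
  rowSums(df[, group_id == g, drop = FALSE])
})

# Both routes give the same peaks x groups totals
stopifnot(isTRUE(all.equal(unname(agg_matrix), unname(agg_df))))
```

The two results agree, but the matrix route avoids the repeated conversions between data.frame and numeric types, which is the kind of overhead described above.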

Below is the timing for the original version (make_cicero_cds()) compared to this version (renamed faster_cicero_cds()) on my data, using chr18.

Also below are the results from the runCicero tests. I don't know the cds system very well, so I'm not 100% sure this doesn't break anything downstream. If it does break something, please let me know, and I'll modify and recommit to this branch.

Cheers, -Lucas Graybuck

Simple benchmark

> # make cicero cds
>   set.seed(2018)
>   start_time <- Sys.time()
>   cicero_cds <- make_cicero_cds(input_cds, 
+                                 reduced_coordinates = tsne_coords,
+                                 k = 20)
Overlap QC metrics:
Cells per bin: 20
Maximum shared cells bin-bin: 17
Mean shared cells bin-bin: 0.181565206801541
Median shared cells bin-bin: 0
Removing 306 outliers
>   end_time <- Sys.time()
>   difftime(end_time, start_time)
Time difference of 16.1512 mins
>   
>   source("faster_cicero_cds.R")
>   set.seed(2018)
>   start_time <- Sys.time()
>   cicero_cds <- faster_cicero_cds(input_cds, 
+                                 reduced_coordinates = tsne_coords,
+                                 k = 20)
Overlap QC metrics:
Cells per bin: 20
Maximum shared cells bin-bin: 17
Mean shared cells bin-bin: 0.181565206801541
Median shared cells bin-bin: 0
Removing 306 outliers
>   end_time <- Sys.time()
>   difftime(end_time, start_time)
Time difference of 2.781479 mins
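
For reference, the ratio of the two reported times can be computed directly. The sketch below uses only the two `difftime` values printed above. The helper `time_mins()` is a hypothetical convenience wrapper around base R's `system.time()`, not part of cicero.

```r
# Wall-clock times reported in the transcript above, in minutes
original_mins <- 16.1512
updated_mins  <- 2.781479

# Ratio of the two timings: roughly 5.8
speedup <- original_mins / updated_mins
round(speedup, 1)

# Hypothetical helper: elapsed wall-clock minutes for any expression,
# as an alternative to the Sys.time() bracketing used above
time_mins <- function(expr) {
  system.time(expr)[["elapsed"]] / 60
}
```

This is a single run on one chromosome, so the ratio is an indication of the improvement, not a general benchmark.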

Testing

> devtools::test(filter = "runCicero")

...Package loading messages omitted...

Testing cicero
√ | OK F W S | Context
/ | 28       | runCicero[1] "Successful cicero models:  283"
[1] "Other models: "

Zero or one element in range 
                          30 
[1] "Models with errors:  0"
\ | 46       | runCicero[1] "Coaccessibility cutoff used: 0.25"
√ | 78       | runCicero [92.7 s]

== Results =====================================================================
Duration: 92.8 s

OK:       78
Failed:   0
Warnings: 0
Skipped:  0
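
Beyond the package tests, one way to check that the two versions agree is to run both with the same seed and compare the aggregated matrices directly. The sketch below shows a comparison helper on toy matrices. `same_matrix()` is a hypothetical name, and extracting the matrices from the two cds objects (for example with an accessor such as `exprs()`, assuming they are monocle CellDataSet objects) is left out because it is an assumption about the object type.

```r
# Hypothetical helper: TRUE if two matrices have the same shape,
# the same row/column names, and numerically equal values
same_matrix <- function(a, b, tol = 1e-8) {
  identical(dim(a), dim(b)) &&
    identical(dimnames(a), dimnames(b)) &&
    isTRUE(all.equal(as.matrix(a), as.matrix(b), tolerance = tol))
}

# Toy stand-ins for the outputs of the original and updated functions
ref <- matrix(c(1, 0, 2, 3), nrow = 2,
              dimnames = list(c("peak1", "peak2"), c("agg1", "agg2")))
new <- ref + 0            # same values, separately computed
changed <- ref
changed[1, 1] <- 5        # one value differs

same_matrix(ref, new)      # TRUE
same_matrix(ref, changed)  # FALSE
```

A check like this only covers the aggregated counts. Any metadata carried on the cds object would need its own comparison.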