MurrellGroup / MolecularEvolution.jl

A Julia framework for developing phylogenetic models
MIT License

make sim_tree faster for large N #5

Closed: bicycle1885 closed this pull request 1 year ago.

bicycle1885 commented 1 year ago

This optimizes the sim_tree function for large N. The original implementation stores nodes in an array, and deleting a randomly chosen element from an array is slow: every element after it has to be shifted, so each deletion costs time linear in the number of elements. The proposed implementation stores the nodes in a Set instead, where deleting an element does not shift anything. A few minor cleanups, such as removing unused variables, are also included in this change.
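To illustrate why the container matters, here is a minimal, hypothetical sketch (not the code from this PR): a toy loop that starts with n node IDs and repeatedly removes two random IDs and inserts one new "parent" ID until one remains. The function names merge_pool_vector and merge_pool_set are made up for illustration, and the assumption that sim_tree's hot loop looks like this is ours. With a Vector, deleteat! shifts later elements, so the loop is quadratic in n; with a Set, delete! and rand (which Julia documents as having an optimized constant-complexity method for Set) avoid that.

```julia
using Random

# Hypothetical toy loop, NOT the PR's implementation: start with `n` leaf IDs,
# repeatedly remove two random IDs and add one new parent ID until one is left.
# Returns the number of merges performed (always n - 1 for n >= 1).

# Vector version: `deleteat!` shifts every later element left by one,
# so each removal is linear in the pool size.
function merge_pool_vector(n::Int, rng::AbstractRNG)
    pool = collect(1:n)
    next_id = n + 1
    merges = 0
    while length(pool) > 1
        for _ in 1:2
            # pick a random index and delete it (O(length(pool)) shift)
            deleteat!(pool, rand(rng, eachindex(pool)))
        end
        push!(pool, next_id)   # the new parent joins the pool
        next_id += 1
        merges += 1
    end
    return merges
end

# Set version: `rand` draws a uniformly random element and `delete!`
# removes it from the hash table without shifting anything.
function merge_pool_set(n::Int, rng::AbstractRNG)
    pool = Set(1:n)
    next_id = n + 1
    merges = 0
    while length(pool) > 1
        for _ in 1:2
            delete!(pool, rand(rng, pool))
        end
        push!(pool, next_id)   # the new parent joins the pool
        next_id += 1
        merges += 1
    end
    return merges
end

rng = MersenneTwister(1)
merge_pool_vector(1_000, rng)
merge_pool_set(1_000, rng)
```

Timing both with @time at n = 100_000 should show the Set version pulling ahead as n grows, though the exact ratio depends on the machine and is not a measurement of sim_tree itself.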

main:

julia> @benchmark sim_tree(n = 100_000)
BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  2.936 s …  2.950 s  ┊ GC (min … max): 1.83% … 2.44%
 Time  (median):     2.943 s             ┊ GC (median):    2.13%
 Time  (mean ± σ):   2.943 s ± 9.674 ms  ┊ GC (mean ± σ):  2.13% ± 0.43%

  █                                                      █  
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.94 s        Histogram: frequency by time        2.95 s <

 Memory estimate: 136.45 MiB, allocs estimate: 3792429.

this pull request:

julia> @benchmark sim_tree(n = 100_000)
BenchmarkTools.Trial: 18 samples with 1 evaluation.
 Range (min … max):  235.876 ms … 380.271 ms  ┊ GC (min … max): 19.88% … 49.79%
 Time  (median):     283.760 ms               ┊ GC (median):    33.86%
 Time  (mean ± σ):   288.197 ms ±  45.947 ms  ┊ GC (mean ± σ):  34.51% ± 10.18%

  █        ▃                   ▃  ▃                              
  █▇▇▁▁▁▁▁▇█▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▇▁▁█▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▇▇▇▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
  236 ms           Histogram: frequency by time          380 ms <

 Memory estimate: 149.99 MiB, allocs estimate: 4299296.
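The two reports can be condensed into a couple of ratios. The snippet below only re-derives numbers already printed in the BenchmarkTools output (median times and memory estimates); the variable names are illustrative. It shows roughly a 10x faster median run in exchange for about 10% more memory, while the GC share of the new version is noticeably higher (33.86% vs 2.13% median).

```julia
# Median run times and memory estimates copied from the two
# BenchmarkTools reports (main vs. this pull request).
median_main_s = 2.943      # main, median time in seconds
median_pr_s   = 0.283760   # this PR, median time in seconds (283.760 ms)
mem_main_mib  = 136.45     # main, memory estimate in MiB
mem_pr_mib    = 149.99     # this PR, memory estimate in MiB

speedup   = median_main_s / median_pr_s   # ≈ 10.37
mem_ratio = mem_pr_mib / mem_main_mib     # ≈ 1.10

println("median speedup: ", round(speedup; digits = 1), "x")
println("memory ratio:   ", round(mem_ratio; digits = 2))
```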