PoisotLab / EcologicalNetworks.jl

Everything you've never dreamed about measuring on ecological networks.
Other
68 stars 20 forks source link

Improved performance of function lp #188

Closed Inazuma110 closed 3 years ago

Inazuma110 commented 3 years ago

Improved performance of function lp

Improved the performance of function lp. In this function, the following two factors had a significant impact on the performance.

The first is to retrieve the adjacent vertex each time for each vertex.

linked = filter(s2 -> has_interaction(N,s1,s2), species(N; dims=2))

When the number of vertices is N, the computational complexity of this operation is O(N). In the original code, this operation is performed on all vertices, so the computational complexity is O(N^2), which is very slow. In this fix, adjacent vertices are now precomputed and retained.

The other is the format of the counts variable.

counts = StatsBase.counts(labels)

Before the fix, the function Stats.counts was used to counting the occurrences of labels. However, if the neighboring labels are far apart in value, there will be many zeroes in the array. (For example, applying the Stats.counts function to [1, 100] would result in [1, 0, ... , 0, 1], resulting in 98 useless elements.) This information is not only useless but also computationally expensive when knowing the most occurring labels. In this fix, the information of non-adjacent labels is not retained.

(Also, in the Stats.counts function, if the label value does not start from 1, the return array will start from the smallest label value. (For example. If the label is [2], the Stats.count function will return [1] instead of [0, 1].) This was causing a bug, which has also been fixed.)

I tried to benchmark before and after the change.

Before

julia> N = convert(BipartiteNetwork, web_of_life("M_PL_015"))
666×131 (String) bipartite ecological network (L: 2933 - Bool)

julia> @benchmark lp(N)
BenchmarkTools.Trial: 
  memory estimate:  13.05 MiB
  allocs estimate:  55064
  --------------
  minimum time:     940.893 ms (0.00% GC)
  median time:      2.017 s (0.00% GC)
  mean time:        2.298 s (0.11% GC)
  maximum time:     3.937 s (0.19% GC)
  --------------
  samples:          3
  evals/sample:     1

After

julia> @benchmark lp(N)
BenchmarkTools.Trial: 
  memory estimate:  12.42 MiB
  allocs estimate:  165736
  --------------
  minimum time:     27.190 ms (0.00% GC)
  median time:      39.667 ms (0.00% GC)
  mean time:        43.406 ms (2.12% GC)
  maximum time:     98.638 ms (0.00% GC)
  --------------
  samples:          116
  evals/sample:     1

Related issues

Checklist

I did not make any changes to the tests and document, because I did not change the function behavior.

Pinging

Pinging @tpoisot

tpoisot commented 3 years ago

Really nice -- I made a few changes internally to use the functions already in the package, it barely changes the performance (a little less memory is needed). I'll merge - thanks for the great contributions :tada: !