Closed mdkeehan closed 3 years ago
I have done my homework and would like to contribute a rewritten loop.
using BenchmarkTools
#
# this is line 16 of groups.jl
#
function extract_group_attributes(v)
    group_labels = collect(unique(sort(v)))
    n = length(group_labels)
    if n > 100
        @warn("You created n=$n groups... Is that intended?")
    end
    group_indices = Vector{Int}[filter(i -> v[i] == glab, eachindex(v)) for glab in group_labels]
    # parts omitted...
end
#
# here is a redesigned loop.
#
function extract_group_attributes2(v)
    res = Dict{eltype(v),Vector{Int}}()
    for (i, label) in enumerate(v)
        if haskey(res, label)
            push!(res[label], i)
        else
            res[label] = [i]
        end
    end
    group_indices = [res[i] for i in sort(collect(keys(res)))]
end
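# small input: 9 elements, 4 groups
d1 = ["C", "C", "C", "A", "A", "A", "B", "B", "D"]
# interpolate the global with $ so the benchmark does not measure global-variable access
res1 = @benchmark extract_group_attributes($d1);
res2 = @benchmark extract_group_attributes2($d1);
# larger input: 10000 elements, 599 groups
d1 = ["xx" * "$(i % 599)" for i in 1:10000]
res1 = @benchmark extract_group_attributes($d1);
res2 = @benchmark extract_group_attributes2($d1);
m1 = median(res1)
m2 = median(res2)
# compare the rewritten loop (m2) against the original (m1)
judge(m2, m1)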
I received this warning while plotting. While waiting for the plots, I decided to investigate the code.
I have 28657 groups for 99845 series. Think of little particles moving in a time stream... I know my requirement is a bit extreme...
Is line 16 in group.jl an O(G x S) operation, i.e. something that will require 28657 x 99845 operations to complete? I wonder if it could be rewritten.
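To make the question concrete, here is a minimal, self-contained sketch of the two approaches. The helper names `group_indices_filter` and `group_indices_dict` are made up for illustration and are not part of the plotting package; the first mirrors the `filter` comprehension above (one full pass over `v` per distinct label), the second makes a single pass and then sorts only the distinct labels. Assuming the omitted parts of the original function do not change the picture, the filter version performs roughly G x S equality checks, while the Dict version needs about S hash lookups plus a sort of G labels.

```julia
# Hypothetical helper names, for illustration only.

# One full pass over v per distinct label: about G * S comparisons in total.
function group_indices_filter(v)
    labels = unique(sort(v))
    return [filter(i -> v[i] == lab, eachindex(v)) for lab in labels]
end

# A single pass over v, followed by a sort of the G distinct labels.
function group_indices_dict(v)
    res = Dict{eltype(v),Vector{Int}}()
    for (i, label) in pairs(v)
        # get! inserts an empty index vector the first time a label is seen
        push!(get!(() -> Int[], res, label), i)
    end
    return [res[k] for k in sort!(collect(keys(res)))]
end

v = ["xx$(i % 599)" for i in 1:10_000]
# Both versions should return the same groups in the same order.
@assert group_indices_filter(v) == group_indices_dict(v)

# Equality checks the filter version would need for the sizes in question.
G, S = 28_657, 99_845
println(G * S)   # 2861258165
```

So for my data the current loop would need on the order of 2.9 billion string comparisons, against roughly 100 thousand dictionary operations for the single-pass version.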