aicenter / GroupAD.jl

Generative Anomaly Detection for Multiple Instance Learning problems.

Fix the awfully slow BagNode reindexing #7

Closed: vitskvara closed this issue 3 years ago

vitskvara commented 3 years ago
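```julia
using Mill, GroupAD  # BagNode/ArrayNode assumed from Mill.jl; seqids2bags from GroupAD
# Select the bags `inds` from `bagnode` without going through the generic getindex.
function reindex(bagnode, inds)
    # instance (column) ranges of the selected bags
    obs_inds = bagnode.bags[inds]
    # bag id for every selected instance: the k-th selected bag gets id k, length(range) times
    new_bagids = vcat(map(x->repeat([x[1]], length(x[2])), enumerate(obs_inds))...)
    # gather the columns of all selected instances in a single indexing operation
    data = bagnode.data.data[:,vcat(obs_inds...)]
    # convert the sequential bag ids back into bag ranges
    new_bags = GroupAD.seqids2bags(new_bagids)
    BagNode(ArrayNode(data), new_bags)
end
```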
```julia
julia> @time bg1=bagnode[inds]
 19.818230 seconds (3.19 M allocations: 59.049 GiB, 4.15% gc time)
BagNode with 10000 obs
  └── ArrayNode(3×1582492 Array, Float64) with 1582492 obs

julia> @time bg2=reindex(bagnode, inds)
  0.646388 seconds (461.48 k allocations: 123.225 MiB, 2.69% gc time)
BagNode with 10000 obs
  └── ArrayNode(3×1582492 Array, Float64) with 1582492 obs
```
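The same bookkeeping can be sketched in plain Julia, without Mill's types, to make each step explicit. This is a hypothetical stand-in, not GroupAD's code: it assumes a bag node is just a feature matrix whose columns are instances, plus a `Vector{UnitRange{Int}}` holding one contiguous column range per bag, and `reindex_plain` is an illustrative name. Instead of building a per-instance bag-id vector and converting it back into ranges, it computes the new ranges directly from the lengths of the selected bags, so an empty bag stays an empty range.

```julia
# Hypothetical plain-Julia version of bag reindexing (not the GroupAD/Mill API).
# data: features × instances; bags: one contiguous column range per bag.
function reindex_plain(data::AbstractMatrix, bags::Vector{UnitRange{Int}},
                       inds::AbstractVector{<:Integer})
    selected = bags[inds]                  # column ranges of the chosen bags
    lens = length.(selected)               # instances per selected bag (0 if empty)
    cols = Vector{Int}(undef, sum(lens; init = 0))
    new_bags = Vector{UnitRange{Int}}(undef, length(selected))
    start = 1
    for (k, r) in enumerate(selected)
        stop = start + length(r) - 1
        cols[start:stop] .= r              # copy this bag's column indices
        new_bags[k] = start:stop           # empty range when the bag is empty
        start = stop + 1
    end
    # one gather of all selected columns, plus the recomputed bag ranges
    return data[:, cols], new_bags
end
```

For example, selecting bags `[4, 1, 3]` from `[1:2, 3:3, 4:3, 4:6]` gathers columns `[4, 5, 6, 1, 2]` and returns the ranges `[1:3, 4:5, 6:5]`, the last of which is empty. Wrapping the result back into `BagNode`/`ArrayNode` is left out of the sketch.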
vitskvara commented 3 years ago

In the example, I am indexing with an array `inds` of 10,000 indices.

vitskvara commented 3 years ago

But will it work with even larger data? I suspect that splatting into `vcat` fails when there are too many arguments.

vitskvara commented 3 years ago

I tested it with up to 5 million indices, and it seems fine.
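On the `vcat` question: `vcat(obs_inds...)` passes every range as a separate argument to one call. Base Julia also provides a specialised `reduce(vcat, v)` method for a vector of arrays, which concatenates them without splatting. The check below, with illustrative sizes rather than the ones tested in this thread, confirms both forms give the same result:

```julia
# Illustrative data: 100_000 contiguous 3-element ranges covering 1:300_000.
ranges = [i:i+2 for i in 1:3:300_000]

# Splatting: every range becomes a separate argument of a single vcat call.
splatted = vcat(ranges...)

# reduce(vcat, ...) dispatches to Base's specialised method for a vector of
# arrays and builds the result without splatting.
reduced = reduce(vcat, ranges)

println(splatted == reduced)  # prints true
```

Either form works at the sizes reported above; `reduce(vcat, ...)` simply sidesteps the question of how many arguments a single call can take.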

vitskvara commented 3 years ago

Solved with https://github.com/aicenter/GroupAD.jl/pull/6/commits/9c26330c20c28ccb03be73fad9d5acd01d1e8e71