JuliaDynamics / Agents.jl

Agent-based modeling framework in Julia
https://juliadynamics.github.io/Agents.jl/stable/
MIT License

EventQueue ABM v3 #940

Closed Datseris closed 7 months ago

Datseris commented 8 months ago

Closes #760 Closes #902

Alright, this is pretty cool. To review this PR read the following three things:

  1. The docstring of StandardABM in model_standard.jl
  2. The docstring of EventQueueABM in model_event_queue.jl
  3. The example event_rock_paper_scissors.jl file for an example application.

So this PR supersedes #917 by slightly changing the model declaration.

We now have events, which include an action (like agent_step!) and can be applied to arbitrary agents. Event generation can happen in two ways:

  1. Automatically, by sampling randomly according to propensities. This automatic generation can happen (a) when an agent is added and (b) when an event finishes its action.
  2. Manually: a user may add events to the queue via add_event! or generate_event_to_queue!.
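
The queue mechanics described above can be sketched with a self-contained toy (this is an illustration only, not the actual Agents.jl implementation; all names here are made up):

```julia
# Toy event queue: events are (time, action) pairs, and stepping always
# processes the event with the smallest scheduled time.
mutable struct ToyQueue
    events::Vector{Tuple{Float64,Function}}
end

add_toy_event!(q::ToyQueue, t, action) = push!(q.events, (t, action))

function step_toy!(q::ToyQueue)
    isempty(q.events) && return nothing
    i = argmin(first.(q.events))   # index of the earliest event
    t, action = q.events[i]
    deleteat!(q.events, i)
    action(t)                      # a real action could queue new events here
    return t
end

q = ToyQueue(Tuple{Float64,Function}[])
add_toy_event!(q, 1.5, t -> println("event at t = $t"))
add_toy_event!(q, 0.3, t -> println("event at t = $t"))
step_toy!(q)  # processes the t = 0.3 event first
```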

This is the best of all worlds. It solves all problems: it allows the model to work without the user having to provide a list of initial events, but also allows them to take full control of event scheduling if they want.

Also, it goes beyond standard Gillespie simulations by allowing the event times to be arbitrary functions, no longer limited to exponentially distributed times.

By default however we do have an exponentially distributed time.
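
As a sketch of what the exponential default means (the names here are illustrative, not the package's internals):

```julia
using Random: randexp

# Gillespie-style default: the waiting time of an event with propensity p
# is exponentially distributed with rate p, i.e. with mean 1/p.
exp_waiting_time(p::Real) = randexp() / p

# An arbitrary timing function can replace this, e.g. a fixed delay:
fixed_delay(p::Real) = 0.25

# Sanity check: the sample mean approaches 1/p.
p = 4.0
mean_t = sum(exp_waiting_time(p) for _ in 1:100_000) / 100_000  # ≈ 1/p = 0.25
```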

Please review by reading the three files I mention at the start! The PR is not finished of course; there is more documentation to be added, and tests. However, I would argue that if we agree on the interface we should merge, and ask for tests in another PR by opening an issue.

cc @MA-Ramirez

jacobusmmsmit commented 8 months ago

Not a review (yet), but I was reading your branch this morning and I like the proposed changes. Good solution to the "initial events" problem.

Tortar commented 8 months ago

Small review, will take a closer look tomorrow!

Datseris commented 8 months ago

I won't have the time to work on this until next weekend again. If anyone wants to contribute (e.g., by making run! work), go ahead.

This PR has a major problem: its performance is abysmal. Go to the rock example and then do

@btime step!(model)

which will perform a single event and schedule one more event. Well, this takes 1 μs and allocates a million times. That's crazy. We need to do some sort of performance optimization.

It could very well be that the API I have here is simply bad for performance and one needs to think of different ways for users to specify the events... I am not sure.

On the other hand, we could say that "well, this way of modelling is just much slower than the discrete time version."

I am sure the majority of the performance drop comes from type instabilities...
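
For readers following along, instabilities like these can be hunted with @code_warntype on a toy example (this is a generic illustration, not the PR's code):

```julia
using InteractiveUtils  # provides @code_warntype outside the REPL

# A deliberately type-unstable function: its return type depends on a
# runtime value, so it infers as Union{Int64, Float64}.
unstable(x) = x > 0 ? 1 : 1.0

# A stable variant always returns Float64.
stable(x) = x > 0 ? 1.0 : -1.0

@code_warntype unstable(3)  # highlights the Union return type
@code_warntype stable(3)    # infers a concrete Float64
```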

Tortar commented 8 months ago

I think you must have added some more type instabilities compared to v1, because the code in #917 is something like 1000x faster or even more

Datseris commented 8 months ago

Hm. Okay. If you can have a look at my code and see if you can bring in your code to increase performance, that would be great! If we can't fix the performance problems, then we go back to your code and add my ideas from here for creating new events on agent addition.

It could be that my decision to allow for arbitrary timing functions rather than exponential sampling is also part of the problem.

Tortar commented 8 months ago

will try to fix it! Hopefully it shouldn't be too hard

Tortar commented 8 months ago

I think it is mostly due to how AgentEvent is conceived. It is fine to expose it to users, but it shouldn't be used internally, since it has abstract containers and increases the number of branches in the code. Internally, the previous tuple-of-tuples approach should be used instead, which is fast.

Datseris commented 8 months ago

I hope that's the solution. I doubt it though, because AgentEvent is concretely typed and immutable, so it should be equivalent to a tuple. But I hope I am wrong and there is an easy solution.

Tortar commented 8 months ago

AbstractAgent is not concrete, I'm trying to rewrite it, tomorrow/this night we will see :D

Datseris commented 8 months ago

AbstractAgent is not concrete

This doesn't matter. The types field is concretely typed: it contains a Type instance.

Tortar commented 8 months ago

Yes, you are right @Datseris

Profiling says that

function obtain_propensity(event::AgentEvent, agent, model)
    if event.propensity isa Real
        return event.propensity
    else
        p = event.propensity(agent, model)
        return p
    end
end

is the cause. What's the error? :D (I'm not spotting anything at the moment)

edit: is it type stable? It seems it always returns a float, or maybe I'm wrong
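
One standard way to remove such a runtime branch, sketched here on a stand-in for AgentEvent (this is not necessarily what the PR ended up doing), is to dispatch on the propensity's type instead of checking it with isa:

```julia
# Dispatch on the propensity field instead of branching at runtime; each
# event's concrete type carries its propensity type, so each call specializes.
obtain_propensity(event, agent, model) =
    _propensity(event.propensity, agent, model)

_propensity(p::Real, agent, model) = p          # constant propensity
_propensity(p, agent, model) = p(agent, model)  # callable propensity

# Toy check with NamedTuples standing in for AgentEvent:
ev1 = (propensity = 0.5,)
ev2 = (propensity = (agent, model) -> 2.0,)
obtain_propensity(ev1, nothing, nothing)  # 0.5
obtain_propensity(ev2, nothing, nothing)  # 2.0
```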

jacobusmmsmit commented 8 months ago

Nooo, all of the type instabilities will be fixed by the time I wake up tomorrow at this rate :/

jacobusmmsmit commented 8 months ago

Is the return type of the propensity function known at call time? Is it the return type of Union{Real,FunctionReturn} causing the instability?

(Note this is pseudocode)

Tortar commented 8 months ago

ahh nothing, this is the problem:

function reproduction_propensity(agent, model)
    same = count(a -> a isa typeof(agent), allagents(model))
    return 2*same/nagents(model)
end

this is very expensive; there is no big problem in the actual code.
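
The cost comes from scanning all agents on every propensity evaluation. A toy comparison of that scanning approach against incrementally maintained counts (all names here are illustrative, not part of the Agents.jl API):

```julia
# Stand-ins for allagents(model) and for counts that a model could maintain
# incrementally whenever agents are added or removed.
agents = [:rock, :paper, :rock, :scissors]
counts = Dict(:rock => 2, :paper => 1, :scissors => 1)

# O(n) on every call, like the snippet above:
prop_scan(kind) = 2 * count(==(kind), agents) / length(agents)

# O(1), assuming `counts` is kept in sync with the agent container:
prop_cached(kind) = 2 * counts[kind] / length(agents)

prop_scan(:rock) == prop_cached(:rock)  # both give 1.0
```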

Tortar commented 8 months ago

Before

julia> @benchmark step!($model)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  421.000 ns …   3.147 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     116.504 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   126.073 μs ± 123.855 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅                     ▄█▇▂                                     
  █▂▂▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▂▃▇████▆▄▄▄▃▃▃▂▂▂▂▂▂▂▂▂▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂ ▃
  421 ns           Histogram: frequency by time          293 μs <

 Memory estimate: 160 bytes, allocs estimate: 1.

After (with reproduction_propensity(agent, model) = 1/2 * nagents(model)):

julia> @benchmark step!($model)
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  674.300 ns … 211.674 μs  ┊ GC (min … max): 0.00% … 98.00%
 Time  (median):     905.700 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   996.727 ns ±   2.948 μs  ┊ GC (mean ± σ):  4.09% ±  1.39%

           ▁▃▅▆██▇█▇▅▃▃▁                                         
  ▁▁▁▂▂▃▄▅▇██████████████▆▅▅▄▄▄▄▄▅▅▆▄▅▄▄▃▃▃▃▃▂▂▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁ ▄
  674 ns           Histogram: frequency by time         1.47 μs <

 Memory estimate: 808 bytes, allocs estimate: 12.

We can probably make it better, but in my opinion it isn't actually that allocation-heavy or slow

edit: profiling shows that, after this modification, only something like 30% of the total time is spent outside queue operations, so some further speed-up is probably possible, but it doesn't seem like much

jacobusmmsmit commented 8 months ago

It's lame of me to just come in here and stop the merge over such a seemingly small thing, but I believe it would be good to change the name of the timing argument, as it's part of the public API and is harder to change once fixed.

I propose we use a longer, less ambiguous name; I think interevent_time or interevent_distribution may be better. Currently the name timing has to be explained in the docstring every time it's mentioned, and someone reading the code may not know exactly what it means when seeing it written down. I think keeping Agents.jl code easy to read is a top priority.

Tortar commented 8 months ago

It seems we still need to refine some aspects, so it's okay to challenge the API. I could agree that it can be improved, but I like the current way more than yours :D

jacobusmmsmit commented 8 months ago

In that case let's come back to it when more important things are resolved :)

Tortar commented 8 months ago

Optimized a bit.

This also changes a small assumption: the default for AgentEvent.timing is now exp_propensity, which means we add the propensity as an argument. I also changed the API to require users to pass a propensity, which I think should be okay.

The code still needs to be cleaned up and the docstrings updated.

edit: the fixed version still allocates less and is faster than the previous one, but only by 25% or so

Tortar commented 8 months ago

I think there are still some bugs in the behaviour of this implementation, but hopefully fixing them shouldn't change the performance improvement.

codecov-commenter commented 8 months ago

Codecov Report

Attention: 191 lines in your changes are missing coverage. Please review.

Comparison is base (433faa4) 92.18% compared to head (b865a6e) 85.72%.

Files Patch % Lines
src/simulations/collect.jl 42.50% 92 Missing :warning:
src/simulations/step.jl 7.14% 52 Missing :warning:
src/core/model_event_queue.jl 37.70% 38 Missing :warning:
src/core/model_validation.jl 89.36% 5 Missing :warning:
src/submodules/schedulers.jl 50.00% 2 Missing :warning:
src/core/model_abstract.jl 0.00% 1 Missing :warning:
src/core/model_accessing_API.jl 95.83% 1 Missing :warning:
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #940      +/-   ##
==========================================
- Coverage   92.18%   85.72%    -6.47%
==========================================
  Files          33       36        +3
  Lines        2330     2543      +213
==========================================
+ Hits         2148     2180       +32
- Misses        182      363      +181

:umbrella: View full report in Codecov by Sentry.

Datseris commented 8 months ago

I've been thinking a lot about the performance of this, and of multi-agent models in general. What if, for all multi-agent models, we used metaprogramming to create a single special agent type that has all the fields that all agents have, and this type was used everywhere? For this specially generated agent, the function typeof(...) would be overloaded to return a special field .type that contains the type. This is a bit off-topic for this PR; it is really about addressing multi-agent model performance in general.

(I am thinking about this because increasing the performance here by 30% is nothing. A single agent step would still take microseconds when it should be taking nanoseconds.)

Tortar commented 8 months ago

I don't think it is possible to take nanoseconds for a single agent step; you can see why by using using Profile; @profile step!(model, 10000.0); Profile.print() for the first point, and using Cthulhu; @descend step!(model, 1.0) for the second.

So in my opinion this is pretty much good enough (roughly 500,000 events per second on my PC), even if using a single agent type will surely improve the performance a bit

Datseris commented 8 months ago

Yeah okay. In any case, I think my suggestion can also be done by creating a third type of agent container: a "multi-agent" container that generates this agent type.

Datseris commented 8 months ago

And we can also leave the multi-agent stuff for v7, if it's breaking and it's not possible to do it in a non-breaking way

Tortar commented 8 months ago

using a single agent type will surely improve a bit the performance

anyway I'm interested in seeing how much, I will rapidly test this :-)

Datseris commented 8 months ago

anyway I'm interested in seeing how much, I will rapidly test this :-)

Well, in this example it is quite trivial to have only 1 agent type: you just add a named field for the type. No need for extra code!
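
A sketch of that "one agent type with a kind field" idea; the struct and field names below are illustrative, not the example's actual definitions:

```julia
# One concrete struct covers all three kinds, so containers stay
# concretely typed; the kind lives in a field instead of the Julia type.
mutable struct RPSAgent
    id::Int
    pos::NTuple{2,Int}
    kind::Symbol        # :rock, :paper, or :scissors
end

is_same_kind(a::RPSAgent, b::RPSAgent) = a.kind == b.kind

a = RPSAgent(1, (2, 3), :rock)
b = RPSAgent(2, (2, 4), :rock)
is_same_kind(a, b)  # true
```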

Tortar commented 8 months ago

exactly :+1:

Tortar commented 8 months ago

1 type RockPaperScissors

julia> @benchmark step!($model) seconds=1 evals=1 samples=10^6
BenchmarkTools.Trial: 471880 samples with 1 evaluation.
 Range (min … max):  271.000 ns … 2.536 ms  ┊ GC (min … max): 0.00% … 98.72%
 Time  (median):     551.000 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   597.350 ns ± 3.697 μs  ┊ GC (mean ± σ):  0.89% ±  0.14%

     ▁█▁▂     ▄█▁▆▃    ▁▁                                      
  ▁▂▃████▆▃▅▆▅█████▅██▅██▅█▇▄▆▆▃▅▅▃▄▄▂▃▃▂▃▃▂▂▂▂▂▂▁▂▂▁▂▁▁▁▁▁▁▁ ▃
  271 ns          Histogram: frequency by time        1.25 μs <

 Memory estimate: 224 bytes, allocs estimate: 6.

3 types Rock, Paper, Scissors

julia> @benchmark step!($model) seconds=1 evals=1 samples=10^6
BenchmarkTools.Trial: 397554 samples with 1 evaluation.
 Range (min … max):  160.000 ns … 2.875 ms  ┊ GC (min … max): 0.00% … 98.36%
 Time  (median):     852.000 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   925.216 ns ± 9.526 μs  ┊ GC (mean ± σ):  3.92% ±  0.38%

                        ▁▆▃▃▄█▃▃▃█▃▃▂▅   ▁                     
  ▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▃▃▃▄▇██████████████████▇█▆▆▅▆▄▄▄▄▃▃▃▃▃▂▂▂▂▂▂ ▄
  160 ns          Histogram: frequency by time        1.49 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

it helps, but not as much as expected

Datseris commented 8 months ago

It's almost a 2x performance increase, isn't this huge? I would say so.

Tortar commented 8 months ago

I've been thinking a lot about the performance of this, and in general multi-agent models. What if, for all multi agent models we use metaprogramming to create a single special agent that has all fields that all agents have? And this is used everywhere

In general this is bad for memory usage and doesn't allow multiple dispatch on the agent type, so I think this should be an optional feature. I discovered this package some time ago: https://github.com/MasonProtter/SumTypes.jl, which could be the solution. It doesn't require specifying default values, which is really cool. But I still don't totally understand how the package can be adapted for Agents.jl :D (actually I almost do, but it needs some work; looking at the code in https://github.com/MasonProtter/SumTypes.jl#performance, it seems pretty much it, actually)
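
A rough, untested sketch of what that could look like, based on a reading of the SumTypes.jl README (the exact macro syntax may differ between package versions):

```julia
using SumTypes

# One concrete sum type replaces several separate agent structs; the
# variants are stored inline, and @cases branches on the variant.
@sum_type RPSKind begin
    Rock
    Paper
    Scissors
end

# Per-variant behaviour without abstractly-typed containers:
beats(k::RPSKind) = @cases k begin
    Rock => "scissors"
    Paper => "rock"
    Scissors => "paper"
end
```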

Tortar commented 8 months ago

It's almost a 2x performance increase, isn't this huge? I would say so.

in my jargon it is a 40% cut of the time :D

Tortar commented 8 months ago

Actually, I also now understand that it's harder to replicate the current behaviour of this model type with only a one-for-all type:

how would you implement the following? We would need to change the source code for it

movement_event = AgentEvent(
    action! = move!, propensity = movement_propensity,
    types = Union{Scissors, Paper}, timing = movement_time
)

Removing the types restriction here, to make the benchmark fair, gives:

# 3 types
julia> @time step!(model, 10000.0)
  0.258423 seconds (2.78 M allocations: 166.209 MiB, 12.64% gc time)
# 1 type
julia> @time step!(model, 10000.0)
  0.182622 seconds (2.42 M allocations: 100.065 MiB, 12.93% gc time)

I think implementing a different behaviour for a one-for-all type is something that should be done in a different PR though, because we should first find a good implementation; SumTypes.jl should help

Datseris commented 8 months ago

Sure. Let's finish this and merge it in. After all, my plan is to add clear information in the docs that the event queue is experimental and may change in the future without triggering a new major version. I feel we need users to test it.

Tortar commented 8 months ago

Ok, run! works, but the way it is done is far from perfect. For now it works only with a float when, but it should actually work when both when and when_model are given, or with when and when_model as vectors; that needs a bigger refactor though

Tortar commented 8 months ago

I don't have the time these days to finish it, if someone has please go on!

Tortar commented 7 months ago

ok, it seems to me that it is hard to make everything work all at once, so I marked this new model type as experimental, both in the docstring and when creating an instance of it. I also took care to list some functionality limitations of the current implementation. I think we should open an issue listing which functions are still not usable with an EventQueueABM, and just merge this PR in its current form, because it is already quite usable, even if it lacks some of the good features. We can then implement them one by one. Besides, we already have some important visualization changes happening in #934, so we should wait for those before making the visualization functions compatible with EventQueueABM. What do you think?

Tortar commented 7 months ago

I will open an issue listing functions we still need to implement for this model type soon