AayushSabharwal opened 2 years ago
cc @affans owner of the code of paper "5"
https://github.com/affans/covid19abm.jl/blob/master/src/covid19abm.jl#L94-L100
It seems to me that a benefit could be had by leveraging the existing data collection infrastructure of Agents.jl.
IIRC Bogumił Kamiński has written ABMs without Agents.jl, using only pure Julia, DataFrames, and such. I'd need to dig up the paper that I have in mind.
As someone who is doing roughly this, may I give my reasoning for doing so:
1. I can click on "source" on the documentation and (more often than not) find the exact source code for the function I'm looking for, but many functions are very simple in nature and I care enough about their implementation that I would rather just see the code itself rather than the function name, or at least I would prefer having the code right there in the same file where I can see and edit it.
2. A `Dict` is not really necessary to store agents; moreover, selecting random agents or iterating over agents is as simple as working with contiguous integer indices.
3. In summary, custom ABM code is straightforward, often easier to reason about, and more performant than writing idiomatic Agents.jl code (by which I mean utilising Agents.jl's builtin functions).
While 1. is a moot point in the discussion of good, reusable code, and 2. is somewhat unavoidable due to the very nature of modularity and code reuse, I believe that 3. is the main selling point behind writing custom code. As a scientist/researcher, my results (should) come first, and if getting results means waiting a long time and the alternative is designing and writing custom code then I will choose the latter.
It would be interesting if users could supply constraints that could be used to apply optimisations which are commonly done manually by rewriting. In my previous example's case, supplying `ABM(...; fixed_agents=true)` would cause the ABM to store agents in an array with their `id`s as its indices. Then, through dispatch or other means, further optimisations could be applied.
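For illustration, here is a minimal sketch of what I mean; the `fixed_agents` keyword, the `container` helper, and the `MyAgent` type are hypothetical names, not existing Agents.jl API:

```julia
# Hypothetical sketch: a user-supplied constraint picks the agent container.
# `fixed_agents`, `container` and `MyAgent` are made-up names for illustration.
struct MyAgent
    id::Int
    wealth::Float64
end

container(A::Type; fixed_agents = false) =
    fixed_agents ? Vector{A}() : Dict{Int,A}()

container(MyAgent)                       # Dict{Int, MyAgent}() — the general case
container(MyAgent; fixed_agents = true)  # MyAgent[] — ids double as indices
```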
Yes, see https://github.com/JuliaDynamics/Agents.jl/issues/668, a version of `ABM` with unkillable agents is in the wishlist. Yes, this will make a performance improvement. It is not clear to me whether it will be as major as you allude to, but it will definitely be a measurable improvement.
I would not believe that you have many other cases where the performance of custom code exceeds that of Agents.jl (I'm talking explicitly about cases where the usage of `Vector` instead of `Dict` has been accounted for). I need proof for that, and definitely please open issues if you have such proof. If such a case exists, surely it can be incorporated into Agents.jl. Exactly like an ABM version of unkillable agents can be incorporated into Agents.jl simply by using a vector instead of a dictionary to store agents, and propagating any meaningful optimizations in only a couple of downstream functions. We just didn't have the time to do it yet. Perhaps you can though, since you have already coded it for your project? Since Agents.jl is modular, everything else would "just work" if you just changed the ABM internals, as all functions use the API to get agents, such as `model[id]`.
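To illustrate the modularity argument in isolation (the type and function names below are hypothetical, not the actual Agents.jl internals): when every downstream function goes through `model[id]` and an id-iteration helper, only a couple of methods need to know how agents are actually stored.

```julia
# Hypothetical sketch of the modularity argument, not the real Agents.jl internals.
# Assumes agents have an `age` field, purely for the example.
abstract type ExampleABM end

struct DictStorageABM{A} <: ExampleABM
    agents::Dict{Int,A}
end

struct VectorStorageABM{A} <: ExampleABM
    agents::Vector{A}   # only valid while agents are never removed
end

# The only methods that care about the internal container:
Base.getindex(m::DictStorageABM, id::Integer) = m.agents[id]
Base.getindex(m::VectorStorageABM, id::Integer) = m.agents[id]
agent_ids(m::DictStorageABM) = keys(m.agents)
agent_ids(m::VectorStorageABM) = eachindex(m.agents)

# Downstream functions written against the API work for both models unchanged:
mean_age(m::ExampleABM) = sum(m[id].age for id in agent_ids(m)) / length(agent_ids(m))
```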
The reason I am sceptical is because we've spent a ridiculous amount of time optimizing Agents.jl, and as you put it yourself, a researcher cares more about getting results ASAP and less about optimizing code, so I really wouldn't believe that an average scientist's code would be faster. The code for finding nearest neighbors in discrete space, for example, has gone through 4 complete-rewrite-from-scratch iterations, using versions from graphs and from predicting spatiotemporal chaos, all the way to customized iteration data structures, which is the current best.
> As a scientist/researcher, my results (should) come first, and if getting results means waiting a long time and the alternative is designing and writing custom code then I will choose the latter.
Well, this is off-topic for Agents.jl, but given my involvement with DrWatson and with scientific reproducibility, I would disagree on how clear-cut this is. Going "as fast as possible" probably means cutting corners on documenting, testing, and the code re-design and re-structuring that give clarity and extendability. Maybe you have done all these things anyway, but for the average scientist this is unlikely to be true. If you did all of this developer work nevertheless, you should definitely consider adding it to Agents.jl.
I would argue that using Agents.jl would make your research more reliable and trustworthy, easier to share, and easier to understand for other people. It's a project with wide adoption, a massive test suite, and unparalleled documentation, and this definitely matters for reliability. This is something that could be valued more if you plan to, e.g., continue working on future extensions of your project with students or other collaborators, or you would expect someone else to want to replicate your results. I would go as far as arguing that once you collaborate with your future self, you will come to the same conclusion: it will be easier to remember a codebase of 1,000 lines that uses "names of existing functions" than one of 10,000 where you can see "the source code of the entire function". Of course this is partly my personal opinion, but certainly one built from experience.
Thank you for the thorough and thoughtful response. I too am interested in how much of a speedup writing custom code can offer. I don't think it would take me too much effort to convert my hybrid Agents-custom code to pure Agents and pure custom so we could see the difference in performance. Once that's done I'll share the repo with you or something to that effect.
> Yes, see #668, a version of `ABM` with unkillable agents is in the wishlist. Yes, this will make a performance improvement. It is not clear to me whether it will be as major as you allude to, but it will definitely be a measurable improvement.
If you allow me to throw a relatively arbitrary piece of example code at you, I might be able to illustrate that the potential improvements will likely not be small:
```julia
julia> using BenchmarkTools

julia> v = [1:100_000...];

julia> d = Dict(v .=> v);

julia> function add(a, c::Dict)
           for i in keys(c)
               a += c[i]
           end
       end
add (generic function with 1 method)

julia> function add(a, c::Vector)
           for i in eachindex(c)
               a += c[i]
           end
       end
add (generic function with 2 methods)
```
This is fundamentally the same approach, as it either uses the existing keys of a `Dict` or the indices of a `Vector` to access the stored data.
```julia
julia> @benchmark add(0, $d)
BenchmarkTools.Trial: 3026 samples with 1 evaluation.
 Range (min … max):  1.599 ms …  3.453 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.630 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.650 ms ± 82.919 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▆█▃
  ▃▇███▇▅▄▆▇▆▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂▂▂▂▁▁▁▂▂▂▁▁▂▁▂▂ ▃
  1.6 ms         Histogram: frequency by time         1.99 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark add(0, $v)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  2.336 ns … 32.020 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.947 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.173 ns ±  0.966 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▆ █ █ ▄ ▄                                               ▅
  ▄▆▁▆▂█▄█▄█▂█▂█▂▇▂▂█▂▃█▃▃▃▃▃█▃▂█▂▂▂▅▂▁▂▅▁▂▁▅▁▁▁▄▇▁▁▁▄▇▁▁▁▁█ ▃
  2.34 ns        Histogram: frequency by time         4.57 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
The improvements are real and as @Datseris already noted, we have that on the wishlist with #668. PRs for this are more than welcome! :)
Roughly one and a half years ago we also thought about switching away from the `Dict` for agent storage (basically an array of structs) and using `StructArray`s instead (see #396 for the issue and #492 for the related PR). We scratched that idea back then because the trade-offs didn't seem worth it to us. So as you can see, we are very much open to further improving the performance of Agents.jl.
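For context, StructArrays.jl exposes an array-of-structs interface while storing each field in its own contiguous array; a minimal sketch of what that style of agent storage looks like (the `Person` type is made up purely for illustration):

```julia
using StructArrays

struct Person
    id::Int
    age::Int
    infected::Bool
end

# Behaves like a Vector{Person}, but internally stores id, age and infected
# as three separate contiguous arrays.
agents = StructArray([Person(i, rand(0:90), false) for i in 1:100_000])

agents[1]               # indexing still returns a Person
agents.age              # whole-field access is a plain array, cache-friendly
count(agents.infected)  # e.g. how many agents are currently infected
```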
Fundamentally I agree with both of you. For some applications it's certainly easier and even faster to write a custom ABM. But as I've also argued over on Discourse and in line with what @Datseris said above, using an existing framework is not only about performance but also about reproducibility, maintainability, ease of use, readability, shareability, and so on and so forth.
More generally speaking, I believe that science should not focus too much on speed but rather on accuracy and credibility. Eliminating some potential sources of errors in advance by leveraging the power of an open-source modular framework just makes sense in this regard.
> If you allow me to throw a relatively arbitrary piece of example code at you, I might be able to illustrate that the potential improvements will likely not be small:
Heh, be careful here. This example doesn't really help get a realistic intuition on a realistic speedup in a realistic ABM:
```julia
julia> function addret(a, c::Vector)
           for i in eachindex(c)
               a += c[i]
           end
           return a
       end
addret (generic function with 1 method)

julia> @btime addret(0, $v)
  6.120 μs (0 allocations: 0 bytes)
5000050000

julia> @btime add(0, $v)
  1.600 ns (0 allocations: 0 bytes)
```
Because the original `add` never returns the accumulated value, the compiler is free to eliminate the loop entirely, which is why the `Vector` version appears to take nanoseconds; once the result is actually returned, summing 100,000 elements takes microseconds. That's why we really need a realistic benchmark between full implementations of either system.
As promised, here's an (admittedly rushed) example of an ABM in this repo. Each of the ABM codes builds upon some core code which I've simplified and tried to explain the motivation and context for. All in all, the core + ABM implementations come to ~200 LoC.
The `hybrid` implementation is what I started with when I commented on this issue, the `agents` implementation tries to use as many library functions as possible (`map_agent_groups`, `random_agent`), the `hybrid` one replaces some of them with what I found to be more efficient (although that's actually debatable looking at the benchmarks), and the `custom` implementation is my rushed attempt at stripping away all of the library code. In particular, it makes the following adjustments:
So yeah... I guess the conclusion here is that @Datseris is right about the speedup not being so substantial. That said, I must reiterate that the changes between the hybrid and pure Agents.jl code, and between the hybrid and custom code, were written quickly without much thought for optimization.
Notice that a 33% cut in compute time is a substantial change and definitely warrants an implementation of an unkillable ABM! It isn't as massive as one would dream of, of course, like a 10x speedup or something :P
I'd be more than happy to contribute to making this a reality (especially given how often I'll end up using it!). Could you come up with a checklist of desired functionality, (new) tests that should be passed, and aspirational syntax so I can focus my efforts and implement only minimal changes?
Questions that stick out to me are: should this be a separate struct, or `ABM(MyAgent; kwargs)` where one of the `kwargs` is `indestructible_agents = true`?

Hi! I'm glad to see this coming to fruition, haven't had a lot of time to do it myself 😅
- `UnkillableABM` or the like. It and `AgentBasedModel` would need to be subtypes of `AbstractABM`, and implement the required functions as indicated in #668.
- `kill_agent!` and `genocide!`. So I think it makes sense for it to be called unkillable.
- `kill_agent!` and related functions. Possibly scheduling.
- `api.md`, likely in their own section.

EDIT: Ah, I see there's a PR open that I missed.
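A rough sketch of the shape suggested in the list above (heavily simplified and hypothetical; the actual design is what's discussed in #668 and the open PR):

```julia
# Hypothetical, heavily simplified sketch of the suggestion above,
# not the actual Agents.jl implementation.
abstract type AbstractABMSketch end

struct UnkillableABMSketch{A} <: AbstractABMSketch
    agents::Vector{A}   # ids are simply the indices; the population never shrinks
end

add_agent!(model::UnkillableABMSketch, agent) = push!(model.agents, agent)
Base.getindex(model::UnkillableABMSketch, id::Integer) = model.agents[id]

# Operations that remove agents are not supported and should error loudly:
kill_agent!(model::UnkillableABMSketch, id) =
    error("agents cannot be removed from an unkillable model")
genocide!(model::UnkillableABMSketch) =
    error("agents cannot be removed from an unkillable model")
```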
It doesn't need to be a separate struct in the end. We can cheat the system as in my comment here: https://github.com/JuliaDynamics/Agents.jl/pull/721#issuecomment-1364486767.
https://medium.com/@raj.dandekar8/julia-for-epidemiology-247cc04fc297
This is a list of papers using Julia for epidemiological studies. Of the studies listed here, numbers 2, 4 and 5 use agent-based models, but choose to implement them manually instead of using Agents.jl. It might be worth looking through the code (they're all forked off of the same code from paper 5) and seeing why this is the case. A quick glance says it's highly unlikely to be due to missing features, and considering they're using `nothing` space I doubt it's a performance issue, but it'd be nice to get some perspective.