cesaraustralia / DynamicGrids.jl

Grid-based simulations in Julia

Performance Impact: Long SArrays vs. High Number of Grids #266

Open nicolasmerino41 opened 5 days ago

nicolasmerino41 commented 5 days ago

Hi Raf, Quick question: what do you think is more computationally intensive: using long SArrays (e.g. 1000 elements), or increasing the number of grids in the simulation and keeping the SArrays at 100-200 elements? I know it will depend on SArray length, grid size, and grid number, but perhaps there's a rule of thumb. I know regular Vectors might be an option at these lengths, but I can't run them in DG. Of course I could build a test and check all the possible combinations, but perhaps you have a clear take on this.

Thank you, Nico

rafaqz commented 4 days ago

Yeah, 1000s just gets hard on the compiler (the unrolled function gets so long that it runs into local memory problems or something). You may be able to just use nested SVectors. Somehow I think that's better! The unrolled function is smaller that way. You can also manually loop over the outer level to force it to be smaller.

Like length 50 holding length 50.
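Something like this (a rough sketch; `Inner`, `Nested` and `grow` are made-up names here, not DG API):

```julia
using StaticArrays

# 50 blocks of 50 instead of one flat SVector{2500}.
const Inner = SVector{50,Float32}
const Nested = SVector{50,Inner}

function grow(pops::Nested, r::Float32)
    # Plain loop over the outer level, so only the inner length-50
    # broadcast gets unrolled and the generated function stays small.
    out = MVector{50,Inner}(undef)
    for i in 1:50
        out[i] = pops[i] .* (1f0 + r)
    end
    return Nested(out)
end
```

A whole nested value like that is still isbits, so DG treats it like any other cell state.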

Multiple grids should be fine too, it just means more complexity
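For reference, a two-grid rule looks roughly like this (grid names and coefficients are made up, and the exact function signature can differ between versions, so check the `Cell` docstring):

```julia
using DynamicGrids

# Reads both grids, writes both grids; returns a tuple, one value per
# write grid. Coefficients are arbitrary placeholders.
interaction = Cell{Tuple{:prey,:pred},Tuple{:prey,:pred}}() do data, (prey, pred), I
    prey - 0.01f0 * prey * pred, pred + 0.005f0 * prey * pred
end
```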

(Also congrats, this size problem is insane you are pushing some boundaries)

Oh also at this scale you want to use the smallest float or int types you can. UInt8, for example, is 8x smaller than Int. Performance of the smaller floats (Float16) is unfortunately not great, so Float32 will be best.
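For a sense of scale (plain StaticArrays, nothing DG-specific):

```julia
using StaticArrays

# The per-cell payload scales directly with the element type:
sizeof(SVector{100,Float64})  # 800 bytes
sizeof(SVector{100,Float32})  # 400 bytes
sizeof(SVector{100,UInt8})    # 100 bytes
```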

Also... for something this intense you want to look at the native assembly that's generated to know what's actually happening. You can use the DynamicGrids.descendable function to look around with Cthulhu.@descend at just one rule being applied once, without having to dive through the whole DG framework.
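Something like this; the call pattern here is from memory, so treat it as an assumption and check `?DynamicGrids.descendable` for the real signature:

```julia
using DynamicGrids, Cthulhu

# `simdata` stands in for the AbstractSimData from your running sim.
simdata = nothing  # placeholder: your AbstractSimData goes here
Cthulhu.@descend DynamicGrids.descendable(simdata)
```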

nicolasmerino41 commented 4 days ago

Nested SVectors sound like a great idea! I'll try it out! Thank you :)

Multiple grids should be fine too, it just means more complexity

Sure, but keeping the SArrays at reasonable sizes would mean 10x the number of grids (and they'd need to be communicating constantly), which I'm sure would cause significant slowdowns.

Haha, I don't know if I'm pushing boundaries or just brute-forcing my way to what I want.

In case you're interested :) My model is basically a simulation model of biodiversity (tetrapods) based on semi-empirical data, where hundreds of species interact at the population level in 10x10 km cells through Generalised Lotka-Volterra equations. And now that I'm scaling up from the Iberian Peninsula to continental Europe, I have 5x the number of species.
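(For concreteness: per cell, each species' abundance follows the standard GLV form $\frac{dN_i}{dt} = N_i \big( r_i + \sum_j a_{ij} N_j \big)$.)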

rafaqz commented 4 days ago

Haha, I don't know if I'm pushing boundaries or just brute-forcing my way to what I want.

Well you're in the right place for that at least 😂

In case you're interested :)

Very much so! Going for macroecological scales with local population models of thousands of species is very cool.

I'm wondering can you break up Europe into chunks and control species migrations across the boundaries? Like Iberia can be its own thing. Then you don't need all the species to run for all of Europe.

You can do manual sim! with step! and run steps for each area at the same time

(Said having no idea of species distribution patterns here 😂)

nicolasmerino41 commented 4 days ago

I'm wondering can you break up Europe into chunks and control species migrations across the boundaries?

Very legit question! Perhaps not for birds and mammals, but definitely for herps. I used to use four different grids (one for each big taxon group: birds, mammals, reptiles, amphibians) so I could personalise some of their dynamics more easily, but they interact so much across groups that the grids had to be constantly merged and split again mid-simulation, which became quite inefficient.

And regarding separating by chunks, I would still have to allow new species to migrate between chunks (even if only through sparser events), which would require changing the size of the abundance vectors dynamically to accommodate those new species, and that sounds quite challenging. I'm just thinking out loud; it's a nice thought process.

I've actually been re-reading your "Working guide to Spatial Mechanistic..." and rethinking the basic way I code my dynamics. I feel the GrowthMaps approach to modelling population dynamics is very elegant, though probably hard to scale up to many species (compared to using semi-static interaction matrices, carrying capacities, things like that) because it'd require writing too many rules. Idk :)

You can do manual sim! with step! and run steps for each area at the same time

I have never really understood how step! works. How can it be applied "long term"? What's its applicability? I mean, ok, you run one step manually from a given AbstractSimData to the next time step, say using a particular rule that perhaps you don't want to always use? How does that relate to your suggestion to "do manual sim! with step! and run steps for each area at the same time"?

rafaqz commented 4 days ago

Ah right, lots of migration complicates things.

Probably brute force is the most general long-term approach. It just means leaning into the compsci aspects of performance optimisation. If you really hit scale problems you can't solve, my PhD ends in 2 weeks and I'll be available for coauthor-level contributions to make it work.

step! just does one timestep of a simulation manually. So less is taken care of for you, but you can do weird custom things more easily, like running simulations of different areas together with weak dispersal links between them, by just manually dispersing between the regions between each call to step!. (I have never personally done this for anything real, but it's the idea.)
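Roughly the shape of it (the sim-data setup here is from memory, and `exchange_migrants!` is glue you'd write yourself, not DG API):

```julia
using DynamicGrids

# Assumes `output_iberia`, `output_france`, `ruleset` and `tspan` are
# already defined. The SimData construction is an assumption -- check
# framework.jl / the `step!` docstring for the real setup.
sd_iberia = DynamicGrids.SimData(output_iberia, ruleset)
sd_france = DynamicGrids.SimData(output_france, ruleset)

# Hypothetical glue: move some fraction of boundary populations
# between the two regional sims.
function exchange_migrants!(sd_a, sd_b)
    # ... copy/adjust boundary cells between the two sims ...
end

for _ in tspan
    sd_iberia = step!(sd_iberia)
    sd_france = step!(sd_france)
    exchange_migrants!(sd_iberia, sd_france)  # weak dispersal link
end
```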

framework.jl has all of that high-level code that loops over the timeline, and it's quite readable too.

nicolasmerino41 commented 4 days ago

Awesome! I'll definitely consider the offer if needed, thanks :) You and Tiem will for sure be in my acknowledgements anyway ;)

Let's see how all this goes. Cheers