EcoJulia / EcoSISTEM.jl

Julia package for ecosystem simulation

[feature/enhancement] abstract measurement types #62

Open gottacatchenall opened 3 years ago

gottacatchenall commented 3 years ago

Hi all,

As I start to migrate the list of items in #13 to individual issues, I've found it useful to define the functionality of VEL.jl more precisely, so that I can better plan how the interface between the two will work. This issue mostly relates to my plan to implement different types of Measurements.

As far as I can tell, all that integrating this would require from EcoSISTEM is changing how storage works in simulate_record!(storage::AbstractArray...).

Right now this is filled via:

storage[:, :, i] = eco.abundances.matrix

Would it be possible to shift the abundance matrix type in Ecosystem to something that stores an arbitrarily long list of matrices (parameterized by data type) corresponding to different measurements (e.g. abundance and a trait unrelated to the traits being selected on in the EcoSISTEM trait-relationship)?
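To make the request concrete, here is a minimal sketch of what such a container and its recording step might look like. Every name in it (Measurement, allocate_storage, record!) is hypothetical and exists in neither EcoSISTEM.jl nor VEL.jl; it only assumes that each measurement is a species-by-location matrix with its own element type.

```julia
# Hypothetical sketch: none of these names exist in EcoSISTEM.jl or VEL.jl.

# One species-by-location matrix of values of type T, identified by a name.
struct Measurement{T}
    name::Symbol
    matrix::Matrix{T}
end

# Allocate one species x location x timestep array per measurement,
# keeping each measurement's own element type.
function allocate_storage(measurements, ntimes::Integer)
    return Dict(m.name => zeros(eltype(m.matrix), size(m.matrix)..., ntimes)
                for m in measurements)
end

# Copy the current state of every measurement into time slice i,
# mirroring storage[:, :, i] = eco.abundances.matrix for each one.
function record!(storage::AbstractDict, measurements, i::Integer)
    for m in measurements
        storage[m.name][:, :, i] = m.matrix
    end
    return storage
end

abundance = Measurement(:abundance, [10 0; 3 7])        # integer counts
bodysize  = Measurement(:bodysize, [1.5 0.0; 0.2 2.1])  # an unrelated trait
measurements = (abundance, bodysize)

storage = allocate_storage(measurements, 5)
record!(storage, measurements, 1)
```

A Dict keyed by name is the simplest thing that works here, but its values have mixed types, so a tuple or NamedTuple of measurements would probably be a better fit if type stability matters.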

richardreeve commented 3 years ago

Generalising how we store data in an Ecosystem is definitely a good idea - at the moment (@claireh93 will correct me on the details!) I think Ecosystems have two arrays, one for the (many) species under study, and one for any pathogens (can be nothing), but this is really just a stopgap while we work out how we need to generalise it properly like you suggest.

However, it's not as simple as you hope. In particular, we still haven't fixed the parallelisation in the move to transitions. Fundamentally, we can't store things as straight matrices once we have more than one process, because we end up with too many scatter/gather commands and it gets very inefficient. From that point of view this is currently blocked by #63.

It's also not as simple as you might hope because we need to work out how to reconcile this with EcoBase.jl (and Diversity.jl), which are currently predicated on the idea of one set of Things and one set of Places. I don't think we currently have a problem with the idea of only one set of places (though I can imagine a situation where we did, for instance nested places within places), but it sounds like you may be thinking about multiple types of thing? Is that right, or are you thinking of something else?

Talking of simulate_record!(), we need to generalise how we record stuff too, but that may be another issue.

gottacatchenall commented 3 years ago

Although I'm not entirely familiar with how parallelization is implemented currently, re the way I'm implementing mechanisms I've found that transitions tend to fall into the categories of 1) independent across location, 2) independent across category (e.g. the categories for SIR-type models, or species in biodiversity) and 3) independent across both location and category.

Implementing parallelization of transitions across any given axis of the current state could possibly solve this?
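As a rough illustration of that idea, the sketch below applies a transition to each slice of a state matrix along a chosen axis, one slice per task. It assumes shared-memory threads via Threads.@threads, so it says nothing about the multi-process case; apply_along! and the example transitions are hypothetical and not part of EcoSISTEM.jl.

```julia
# Hypothetical sketch: apply_along! is not part of EcoSISTEM.jl.
# Rows are categories (e.g. species), columns are locations.

# Apply the in-place function f! to every slice of `state` along `dim`.
# Slices are views, so each task mutates a disjoint part of the matrix.
function apply_along!(f!, state::AbstractMatrix, dim::Integer)
    Threads.@threads for k in axes(state, dim)
        f!(selectdim(state, dim, k))
    end
    return state
end

state = [8.0 4.0; 2.0 6.0]

# Case 1, independent across locations: each column is updated on its own.
apply_along!(col -> (col .*= 0.5), state, 2)

# Case 2, independent across categories: each row is updated on its own.
apply_along!(row -> (row .+= 1.0), state, 1)
```

Transitions that are independent across both axes (case 3) could be split either way, or element by element.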

gottacatchenall commented 3 years ago

> It's also not as simple as you might hope because we need to work out how to reconcile this with EcoBase.jl (and Diversity.jl), which are currently predicated on the idea of one set of Things and one set of Places.

perhaps i misunderstand what you mean, but i think bundling each individual Thing with a Place makes sense, and so each set of measurements within an Ecosystem could correspond with its own set of Places

richardreeve commented 3 years ago

I agree about the categorisation of the transitions (if I understand you correctly) - we have a similar breakdown, but that imposes some constraints on how we store the data if we want to parallelise efficiently. I'm not sure what you mean about the bundling, though, if you need multiple matrices for storage... It may be that I'm not understanding what your measurements are - some seem to be species traits, and some location (e.g. abiotic) data, both of which we handle separately from species-in-location data, like abundance / biomass / occupancy. All of these have to be handled differently depending on how they are used in the code, because only some information needs to be present on some processes. Unfortunately, to manage memory efficiently and minimise inter-process communication for large simulations, we have to care about when and where different information is used, and it's not currently clear to me how that aligns with how you're thinking about things.

I suspect that there's a reasonable chance we could have a discussion at cross purposes for quite a while about any of these issues - it's not necessarily simple to resolve how to integrate two similar but unrelated frameworks. It might be useful to have a face-to-face chat about this at some point soon when we're all free? It may in particular help us to break this down into smaller pieces that we can implement more easily.

claireh93 commented 3 years ago

Hi both, I think a discussion sounds good - it's always tricky to discuss these things via GitHub issues! Shall we arrange something for a few weeks' time?

gottacatchenall commented 3 years ago

yeah sure, i'm working to get VEL.jl further along so it becomes more clear what interfaces are required

richardreeve commented 3 years ago

I'm pretty booked out next week and unavailable the week after, but w/c 24th May could be possible or the following week if someone wants to set up a doodle poll?

gottacatchenall commented 3 years ago

i can send a doodle poll out for the week of the 24th soon. i'm currently reviewing the DynamicGrids.jl and Dispersal.jl paper which has at least partial solutions to the problems posed in this thread with high parallelization efficiency.

might be worth having a discussion with them as well

richardreeve commented 3 years ago

Great. I'm pretty much booked out till the end of May now, but free after that still.