TaranRallings commented 1 month ago

Description

EDIT - I have put this up in addition to a multi-grid + refactor draft PR at #538 . Before we do a deep review of the actual multi-grid content, I would appreciate folks taking a look at the refactor format. I am trying to decide whether to stick with this current version, built around the model, community, cohort, and territory classes, or whether to shift to the refactor that abandons the AnimalCommunity class and moves its content into AnimalModel and AnimalCohort.

This is code changes to the animal model that allow a cohort to simultaneously occupy more than one grid cell. This is done through an AnimalTerritory class that is attached to a given cohort and stores the information about the central grid of the territory, the full set of grids in the territory, and what plants and pools exist within it.

Changes build on the Madingley assumption of a cohort being evenly distributed within a grid cell. So a cohort is now evenly distributed within its territory. This creates some hiccups in some equations and I have made an issue to tackle this soon.

AnimalCommunity now stores animal cohort information in two ways. 1) which cohorts have their territory centroid in the community (being both large and single grid territories) 2) which cohorts have a partial presence in the territory and what proportion of the total cohort is there. (occupancy)

Running the AnimalModel cycles through communities, as before, but now each community cycles through those cohorts having their centroid in the community, not merely occupancy. Occupancy is used for foraging and excretion.

Basically any method that involved space, foraging, or excretion had to be changed to run off territories instead of a single community. I have been staring at it for too long to see errors now so I expect some problems snuck in.

I did some inelegant hacky stuff to make the circular import issue of community -> cohort -> territory -> community work. Please let me know if the current Frankenstein's monster of protocols and delayed imports is unworkable or if you have a cleaner solution.

Apologies for the size of this one!

Fixes # (issue)

Type of change

[x] New feature (non-breaking change which adds functionality)
[ ] Optimization (back-end change that speeds up the code)
[ ] Bug fix (non-breaking change which fixes an issue)

Key checklist

[x] Make sure you've run the pre-commit checks: $ pre-commit run -a
[x] All tests pass: $ poetry run pytest

Further checks

[x] Code is commented, particularly in hard-to-understand areas
[x] Tests added that prove fix is effective or that feature works
[x] Relevant documentation reviewed and updated

codecov-commenter commented 1 month ago

Codecov Report

Attention: Patch coverage is 85.54688% with 37 lines in your changes missing coverage. Please review.

Project coverage is 94.42%. Comparing base (72ef3de) to head (b979702).

Files	Patch %	Lines
...tual_ecosystem/models/animal/animal_communities.py	75.00%	21 Missing :warning:
virtual_ecosystem/models/animal/protocols.py	74.28%	9 Missing :warning:
...rtual_ecosystem/models/animal/scaling_functions.py	66.66%	5 Missing :warning:
virtual_ecosystem/models/animal/animal_cohorts.py	95.12%	2 Missing :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## develop #530 +/- ## =========================================== - Coverage 94.96% 94.42% -0.54% =========================================== Files 73 74 +1 Lines 4071 4287 +216 =========================================== + Hits 3866 4048 +182 - Misses 205 239 +34 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

dalonsoa commented 1 month ago

I was about to review this, but found out about #538 about a similar topic. Which should a review first? Or should I wait?

TaranRallings commented 1 month ago

I was about to review this, but found out about #538 about a similar topic. Which should a review first? Or should I wait?

Just added an edit to this to clarify!

Before I get a proper review on the multi-grid functionality, I need some help deciding which format would be better going forward. If you could give me some comparative structural feedback on the #538 first, that would be great.

davidorme commented 2 weeks ago

@dalonsoa I had a long chat with @TaranRallings on Teams about this implementation before I went on leave and I think that context is missing here, so I'm going to duplicate my stream of consciousness here to capture the rationale for the parallel multi grid PRs:

The heart of the issue is that we have a stack of cohorts (which can now live in multiple cells) and we have to go through them all in some order and find out:

where they shit (into the ExcrementPool of each cell in their territory)

what primary productivity they eat (from the PlantResources of each cell in their territory)

where they die (into the CarcassPool of each cell in their territory) and

what prey animals they intersect with and (can therefore kill and strew bodies around the CarcassPools of their intersecting territories).

Does the order of resolution of the communities [ETA - ie. the specific sets of cohorts found in one cell] actually matter? We want to run each cohort, but is it actually desirable to run cohorts in groups by community [ETA - ie cell], or would it be equally good to just run all cohorts randomly regardless of community? Of course, we could define a cohort ordering that reflects their shared co-occurrence?

I think where I'm going with this is that the Community was a natural thing in the single grid occupancy model. The cohorts only occur in a single cell and can only eat things that also occur in that cell, die in that cell and crap in that cell. So - group them into a community.

But we've now got something fundamentally different with multi-grid occupancy and the Community feels a bit like a strait-jacket. The resource and decay pools are all nested inside them and we're pinning cohorts to a more or less arbitrary 'home' Community.

Is an alternative is to have lists of the pools that are 1-to-1 with cells (Excrement, Carcass, Plants) [ETA - as properties of the AnimalModel] and then just have a list of animal cohorts with territories - the animal cohorts are the only thing that can span cells. We'd end up with something a little like this very dumb model:

import random
from dataclasses import dataclass, field
from uuid import uuid4, UUID

import numpy as np

@dataclass
class Cohort:
    size: float
    herbivore: bool
    territory: set[int]
    n: int
    uuid: UUID = field(init=False)

    def __post_init__(self):
        self.uuid = uuid4()

@dataclass
class Plants:
    leaves: int = 10

@dataclass
class Corpses:
    n_corpses: int = 0

@dataclass
class Crap:
    amount: int = 0

class Dumb:
    def __init__(self, n_cells=9):
        self.cell_ids = np.arange(n_cells)
        self.plants = [Plants() for idx in self.cell_ids]
        self.crap = [Crap() for idx in self.cell_ids]
        self.corpses = [Corpses() for idx in self.cell_ids]

        self.cohorts = [
            Cohort(size=10, herbivore=False, territory={0, 1, 2, 3, 4, 5}, n=2),
            Cohort(size=5, herbivore=False, territory={3, 4, 5, 6, 7}, n=20),
            Cohort(size=1, herbivore=True, territory={1, 2, 3, 4, 5, 6, 7}, n=200),
        ]

    def step(self):
        # Loop over cohorts
        for focal_cohort in self.cohorts:
            # Do herbivory
            if focal_cohort.herbivore:
                for cell in focal_cohort.territory:
                    self.plants[cell].leaves -= 1

            # Do carnivory - each individual eats 1 of each smaller cohort
            if not focal_cohort.herbivore:
                for target_cohort in self.cohorts:
                    if (focal_cohort.uuid != target_cohort.uuid) and (
                        focal_cohort.size > target_cohort.size
                    ):
                        # Do these cohorts overlap in space?
                        overlap = focal_cohort.territory.intersection(
                            target_cohort.territory
                        )

                        if overlap:
                            target_cohort.n -= focal_cohort.n
                            for corpses in np.arange(focal_cohort.n):
                                corpse_cell = random.sample(list(overlap), 1)[0]
                                self.corpses[corpse_cell].n_corpses += 1

            # Do crapping
            total_crap_per_cell = (focal_cohort.size * 0.1 * focal_cohort.n) / len(
                focal_cohort.territory
            )
            for cell_id in focal_cohort.territory:
                self.crap[cell_id].amount += total_crap_per_cell

dumb = Dumb()
dumb.step()
dumb.step()
dumb.step()
dumb.step()

print(dumb.corpses)
print(dumb.plants)
print(dumb.crap)
print(dumb.cohorts)

Those classes seem less encumbered and circular - but there may be something critical we lose here.

I'm going to have to avoid thinking about the cohort overlap question. Too many interesting avenues. I did just want to dump another approach though. The set intersection solution above probably isn't too slow, but it is going to be used a simply huge number of times. The problem is that the number of possible intersection checks scales as the square of the number of cohorts, which isn't ideal.

I'm vaguely wondering if it might be worth retaining the communities [ETA - ie the community structure] as per cell id lists of cohort IDs? That might be faster for getting to overlapping cohorts - although there is then the cost of maintaining that index as cohorts move. That might be worth it though.

import random
from dataclasses import dataclass, field
from uuid import uuid4, UUID

import numpy as np

@dataclass
class Cohort:
    size: float
    herbivore: bool
    territory: set[int]
    n: int
    uuid: UUID = field(init=False)

    def __post_init__(self):
        self.uuid = uuid4()

@dataclass
class Plants:
    leaves: int = 10

@dataclass
class Corpses:
    n_corpses: int = 0

@dataclass
class Crap:
    amount: int = 0

class Dumb:
    def __init__(self, n_cells=9):
        self.cell_ids = np.arange(n_cells)
        self.plants = [Plants() for idx in self.cell_ids]
        self.crap = [Crap() for idx in self.cell_ids]
        self.corpses = [Corpses() for idx in self.cell_ids]

        self.cohorts: dict[UUID, Cohort] = {}

        self.communities = [set() for idx in self.cell_ids]

    def add_cohort(self, cohort: Cohort) -> None:
        # add the cohort and register the cohort in the communities for its territory
        self.cohorts[cohort.uuid] = cohort
        for cell_id in cohort.territory:
            self.communities[cell_id].add(cohort.uuid)

    def step(self):
        # Loop over cohorts
        for f_uuid, focal_cohort in self.cohorts.items():
            # Do herbivory
            if focal_cohort.herbivore:
                for cell in focal_cohort.territory:
                    self.plants[cell].leaves -= 1

            # Do carnivory - each individual eats 1 of each smaller cohort
            if not focal_cohort.herbivore:
                # Find communities overlapping focal cohort
                cooccur = [self.communities[idx] for idx in focal_cohort.territory]
                target_uuids = set([uuid for cell in cooccur for uuid in cell])

                for t_uuid in target_uuids:
                    target_cohort = self.cohorts[t_uuid]
                    if (focal_cohort.uuid != target_cohort.uuid) and (
                        focal_cohort.size > target_cohort.size
                    ):
                        # Get the overlap - it is still probably faster to do this to
                        # get the intersection now that they are known to overlap?
                        overlap = focal_cohort.territory.intersection(
                            target_cohort.territory
                        )

                        target_cohort.n -= focal_cohort.n
                        for corpses in np.arange(focal_cohort.n):
                            corpse_cell = random.sample(list(overlap), 1)[0]
                            self.corpses[corpse_cell].n_corpses += 1

            # Do crapping
            total_crap_per_cell = (focal_cohort.size * 0.1 * focal_cohort.n) / len(
                focal_cohort.territory
            )
            for cell_id in focal_cohort.territory:
                self.crap[cell_id].amount += total_crap_per_cell

dumb = Dumb()

dumb.add_cohort(Cohort(size=10, herbivore=False, territory={0, 1, 2, 3, 4, 5}, n=2))
dumb.add_cohort(Cohort(size=5, herbivore=False, territory={3, 4, 5, 6, 7}, n=20))
dumb.add_cohort(Cohort(size=1, herbivore=True, territory={1, 2, 3, 4, 5, 6, 7}, n=200))

dumb.step()
dumb.step()
dumb.step()
dumb.step()

print(dumb.corpses)
print(dumb.plants)
print(dumb.crap)
print(dumb.cohorts)

I think the technical aspect of how best to do that overlap is one for the RSE team 😄

davidorme commented 2 weeks ago

@dalonsoa If we decouple AnimalCohorts from specific cells (which we need to do to allow large animals to occupy many cells) then we end up needing to coordinate indexing from $N$ grid cell ids to $M$ animal cohorts. Although $N$ is smallish and fixed, $M$ varies (as cohorts are born and die) and could potentially get quite large.

Finding the cell specific pools for plants, carcasses and excrement is just indexing across cell ids from the cohort territory (set of cell ids where the cohort exists) but for carnivory we want to be able to quickly extract which cohorts overlap each other. The ranges may be sufficiently dynamic that it this is unavoidably a per step operation, but we'd like to optimise how best to find that. Although an $N \times M$ matrix (probably sparse because many cohorts will have small territories) looks attractive, it will vary in shape very frequently and might be rather large, but maybe if that gets created once per time step then it can form the basis of the indexing for that time step? Unless of course ranges change during the process rather than at the end! Not sure what is optimal here.

dalonsoa commented 2 weeks ago

It's an interesting problem :)

I think the second approach is neater, but I'm a bit worried on the complication of keeping the communities list updated when there are tons of cohorts moving around a large number of cells. Not being for that, I would choose that one.

The first approach, on the other hand, is simpler and possibly slower if things don't move, but you can see in the code that it is not scaling as the square of the number of cohorts: you have a first filter based on the cohort size, for example, so the intersection will need to be checked way less than the square of cohorts.

I guess that in a real scenario the filters to decide what other things get eaten by a cohort will be different, but if they are applied in the right order, you might cut the search space pretty fast, only applying the heaviest filters to a handful of cohort pairs. Combined with efficient tools like list comprehensions, itertools, functools, etc., I don't think this should be a barrier.

So, something like the following, for this dummy case, could be pretty efficient:

from functools import reduce

def is_different(one, two):
    return one.uuid != two.uuid

def is_bigger(one, two):
    return one.size > two.size

def overlap(one, two):
    return len(focal_cohort.territory.intersection(target_cohort.territory)) != 0

def apply_filter(focal, cohorts, filter):
    return [target for target in cohorts if filter(focal, target)]

filters = [is_different, is_bigger, overlap]

for focus in all_cohorts:
    food = reduce(lambda cohorts, filter: apply_filter(focal, cohorts, filter), filters, all_cohorts)
    crap, corpses = focus.eat(food)

Or something along these lines. I'm sure you get the idea.

dalonsoa commented 2 weeks ago

Some filters like if a cohort if within the "menu" of the focus cohort (eg. "insects" for a cohort of a species of birds), will be really efficient to put a the beginning of the list and cutting it down to a minimum the range of options.

Also, in the example above, probably we can skip the is_different filter because the is_bigger filter is already getting rid of the focus cohort itself from the list.

Obviously, applying filters this way can be applied equally to the second example where you keep track of the territory overlap, but I don't think it is worth the effort to keep that filter - because that is all it is, in practice - as a special case, unless that information needs to be kept for other reasons.

davidorme commented 2 weeks ago

I think that filter approach looks really promising. I could see the functional types being defined with a set of named prey filters and then having a simple registry of filter functions that users could expand to add new prey selection modes. We'd likely need some and/or logic that we could apply to filters or possibly allow sets of filters, but I can see that process working well.

The overlap filter is a special case because I think we would always want that to return the contents of focal_cohort.territory.intersection(target_cohort.territory). But equally, we are only ever going to want to select prey that overlaps, so that might be a separate step after the previous filtering. @TaranRallings ?

TaranRallings commented 2 weeks ago

There is quite a bit going on here so let me know if I have misunderstood something.

we think ditching the community class is a good idea
we are worried about indexing issues (there is a reason foraging ate up all the time in Madingley)
@dalonsoa 's filtering set up is probably a good flexible structure for handling prey selection going forward

Unordered Notes:

foraging cannot be collapsed into 1 prey item at a time without reworking the foraging math (delta_mass_predation) as it is built on cumulative prey densities.
carcasses are handled at the intersection of territories because that is where prey fall happens
a full system community matrix would need to be regenerated at multiple points within a timestep as it changes due to: local migration, immigration/emigration, birth, death, and metamorphosis all of which are handled separately.
prey selection is a toy version of body-size relationships and I won't be adding significant detail to it until later in development.
set (and other oddities) are in #538 because I was trying to get something basically comprehensible out so we could make a decision asap and I wouldn't lose too much time on the refactor. In the case of set, I wanted it for intersection() and then decided to roll with it for related structures.

If we agree on the no-community version then I will move forward in development there.

davidorme commented 2 weeks ago

@TaranRallings That's all clear and I think that "no community object" implementation in #538 is the one to pursue (unless @dalonsoa spots something!).

I hadn't spotted that cumulative density issue in delta_mass_predation. At the moment we have a list of target cohorts in delta_mass_predation and then - for each one - we dive down through calculate_consumed_mass_predation, F_i_j_individual and then theta_i_j(animal_list).

So that theta_i_j is calculated repeatedly, but is that mandatory because predation of subsequent cohorts needs to account for the reduced density of already predated targets? I was wondering if we could calculate theta_i_j once across the list of targets, but I guess that is then implicitly a model where predation is parallel rather than sequential.

TaranRallings commented 1 week ago

@davidorme ok great. I need to charge ahead on the scheduled work so I'll get to that. I'll make an issue to take a look at other indexing possibilities after the animal model is fully up and running.

ImperialCollegeLondon / virtual_ecosystem

Animal - Feature/multi grid occupancy #530

Description

Type of change

Key checklist

Further checks

Codecov Report