Open davidorme opened 5 months ago
Attention: Patch coverage is 34.28571% with 23 lines in your changes missing coverage. Please review.
Project coverage is 94.00%. Comparing base (
89f8ff1
) to head (1914668
).
Files | Patch % | Lines |
---|---|---|
pyrealm/community_sketch.py | 34.28% | 23 Missing :warning: |
- What shape are the dataframes that are used to initialise the Flora and Community classes? I'm concerned that this might make for an API that's quite hard to use as the user will have to know exactly what kind of dataframe to input to use the class.
Yup - I think that's just something we'll have to document really carefully. It's an unavoidably complex part of the setup. We could use alternative formats (TOML, maybe) but I don't think that solves much. The `__init__` method should do a ton of validation.
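As an illustration of the kind of input the `Community` class might expect, and of the validation its `__init__` could perform, here is a hedged sketch. The column names (`cell_id`, `pft`, `dbh`, `n`) and the `validate_community` helper are hypothetical, not the actual API:

```python
import pandas as pd

# Hypothetical shape of a community dataframe; columns are illustrative.
community_df = pd.DataFrame(
    {
        "cell_id": [1, 1, 2],  # location the cohort belongs to
        "pft": ["broadleaf", "conifer", "broadleaf"],  # must match a Flora entry
        "dbh": [0.12, 0.40, 0.25],  # diameter at breast height (m)
        "n": [30, 5, 12],  # number of individuals in the cohort
    }
)


def validate_community(df: pd.DataFrame, known_pfts: set[str]) -> None:
    """Sketch of the validation an __init__ might perform."""
    required = {"cell_id", "pft", "dbh", "n"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    unknown = set(df["pft"]) - known_pfts
    if unknown:
        raise ValueError(f"Unknown PFT names: {unknown}")
    if (df["dbh"] <= 0).any():
        raise ValueError("DBH values must be positive")


validate_community(community_df, known_pfts={"broadleaf", "conifer"})
```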
- Have you thought about using something like pydantic https://docs.pydantic.dev/latest/ for input validation?
I've looked at it - can't remember if it was on `pyrealm` or a different project. I think at that instant I could only cope with one of `mypy` or `pydantic`.
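For comparison, the kind of checks pydantic would generate automatically can be sketched with a plain dataclass. The class name and the specific constraints below are illustrative only; the trait names mirror the `PlantFunctionalType` sketch later in this thread:

```python
from dataclasses import dataclass


@dataclass
class ValidatedPFT:
    """Sketch of the validation pydantic would provide for free."""

    name: str
    h_max: float  # maximum height (m)
    a_hd: float  # initial slope of the height/diameter relationship

    def __post_init__(self) -> None:
        # pydantic would express these as Field(gt=0) style constraints
        if not self.name:
            raise ValueError("name must be non-empty")
        if self.h_max <= 0 or self.a_hd <= 0:
            raise ValueError("h_max and a_hd must be positive")


pft = ValidatedPFT(name="broadleaf", h_max=25.0, a_hd=116.0)
```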
- How does time fit into this model, is it calculating the geometry, GPP etc at a given instant?
Really good question. The model basically fits the geometry and canopy for a given instant, given the diameter at breast height (DBH) of each individual in the cohorts/community. The GPP is estimated over a period of time and can then be used to calculate the increase in DBH to grow all the cohorts. At the moment, that is very much a stepped process (using, for example, GPP over a month), but ultimately it will be more accurate to also allow some kind of integration scheme to provide continuous growth. We have a similar implementation for that in the Plant-FATE model - but that is down the line.
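The stepped process described above might look roughly like this. The allocation function is purely a placeholder, not the real T model growth equations (which partition GPP into respiration, turnover and structural growth), and the trait values are examples:

```python
import numpy as np

h_max, a_hd = 25.0, 116.0  # example trait values
dbh = np.array([0.1, 0.2, 0.4])  # one entry per cohort (m)


def height(d: np.ndarray) -> np.ndarray:
    # T model height fitted at an instant, given DBH
    return h_max * (1 - np.exp(-a_hd * d / h_max))


def dbh_increment(gpp: np.ndarray) -> np.ndarray:
    # Placeholder allocation: NOT the real growth equations
    return 1e-4 * gpp


for month in range(12):
    h = height(dbh)  # geometry fitted for this instant
    gpp = 0.5 * h  # stand-in for a monthly GPP estimate
    dbh = dbh + dbh_increment(gpp)  # discrete monthly step
```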
- Have I understood correctly from the docs that the relationship between community and location is one-to-one? If so, allowing multiple locations within one community class may be confusing.
Yes. That is correct. I also agree with you. The downside is that some of the calculations can be run across communities without difficulty (just larger `np.arrays`), which will be more efficient, but then other parts are location specific (total canopy area, environmental conditions). I think by far the easiest place to start is to have multiple locations as a list of `Community` objects, each of which contains the cohorts for a single location.
If that proves to be a slow part of the process, we know some ways that we might accelerate it, but at some cost to computational simplicity and readability.
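A minimal sketch of the one-`Community`-per-location idea: cohort records (shown here as plain dicts with an illustrative `cell_id` key) are grouped so that each community holds a single location's cohorts:

```python
from collections import defaultdict

# Illustrative cohort records; keys mirror the Cohort sketch below
records = [
    {"cell_id": 1, "pft": "broadleaf", "dbh": 0.1, "n": 30},
    {"cell_id": 2, "pft": "conifer", "dbh": 0.4, "n": 5},
    {"cell_id": 1, "pft": "conifer", "dbh": 0.2, "n": 10},
]

# Group records by location so each Community gets one cell's cohorts
by_cell: dict[int, list[dict]] = defaultdict(list)
for rec in records:
    by_cell[rec["cell_id"]].append(rec)

# communities = [Community(flora, cohorts) for cohorts in by_cell.values()]
```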
I think if we were to use something like the existing types from virtual_ecosystem, this could potentially reduce the difficulty around documenting inputs (we could provide a json schema in the documentation, and the class structure pretty much does the documentation for you) and make validation a bit easier. If we were to use pydantic with those I think we would get quite a lot of validation and deserialisation for free. I've not used it before, but it seems to get very positive reviews and apparently works well with mypy.
As an experiment, I tried out using the calculate_geometry method and switching to using the classes from virtual_ecosystem and TModelTraits (copied over with an extra field or two) to see how it might play out as an initial suggestion. Personally, I think it improves the readability (disclaimer - I haven't tried running this yet). I'm not sure what the difference would be performance-wise, but I think most of the operations here are fairly cheap.
```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PlantFunctionalType:
    name: str
    a_hd: float
    ca_ratio: float
    h_max: float
    lai: float
    par_ext: float
    resp_f: float
    resp_r: float
    resp_s: float
    rho_s: float
    sla: float
    tau_f: float
    tau_r: float
    yld: float
    zeta: float


@dataclass
class Cohort:
    dbh: float
    pft: str
    n: int
    cell_id: str


@dataclass
class CalculateGeometryResult:
    cohort: Cohort
    height: float
    crown_area: float
    crown_fraction: float
    mass_stm: float
    mass_fol: float
    mass_swd: float


# Stand-in for the Flora class from the sketch: maps PFT names to traits
Flora = dict[str, PlantFunctionalType]


class Community:
    def __init__(self, flora: Flora, cohorts: list[Cohort]) -> None:
        self.flora = flora
        self.cohorts = cohorts

    def calculate_geometry(self) -> list[CalculateGeometryResult]:
        results = []
        for cohort in self.cohorts:
            pft = self.flora.get(cohort.pft)
            # Height of tree, Equation (4) of Li ea.
            height = pft.h_max * (1 - np.exp(-pft.a_hd * cohort.dbh / pft.h_max))
            # Crown area of tree, Equation (8) of Li ea.
            crown_area = (np.pi * pft.ca_ratio / (4 * pft.a_hd)) * cohort.dbh * height
            # Crown fraction, Equation (11) of Li ea.
            crown_fraction = height / (pft.a_hd * cohort.dbh)
            # Masses
            mass_stm = (np.pi / 8) * (cohort.dbh**2) * height * pft.rho_s
            mass_fol = crown_area * pft.lai * (1 / pft.sla)
            mass_swd = (
                crown_area * pft.rho_s * height * (1 - crown_fraction / 2) / pft.ca_ratio
            )
            results.append(
                CalculateGeometryResult(
                    cohort, height, crown_area, crown_fraction, mass_stm, mass_fol, mass_swd
                )
            )
        return results
```
I agree in terms of having multiple community objects, I think the improved readability should take priority in that case.
I think something like that could work really well. The canopy calculation is a property of a whole community, rather than an individual cohort, so I'm wondering if it could just be methods within `Community`. On balance, though, it is complex enough that having a `canopy` module that defines a `Canopy` class and then just having the attribute `Community.canopy: Canopy` is probably cleaner and better for unit testing.
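A rough sketch of that separation, with placeholder canopy logic (the real calculation would build canopy layers across the whole community; summing crown areas is just a stand-in, and all names here are illustrative):

```python
from dataclasses import dataclass


@dataclass
class Canopy:
    """Would live in its own canopy module for easier unit testing."""

    total_crown_area: float


def build_canopy(crown_areas: list[float]) -> Canopy:
    # Placeholder: a real implementation computes community-level layers
    return Canopy(total_crown_area=sum(crown_areas))


class Community:
    def __init__(self, crown_areas: list[float]) -> None:
        # The canopy is an attribute of the whole community
        self.canopy: Canopy = build_canopy(crown_areas)
```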
I'm leaning away from having a Cohort class at all. It's a neater abstraction of the individual rows in the community data but I can't immediately see that having that abstraction gains anything. Almost everything within a community either must be or can be applied using array calculations across all cohorts.
I think there's value in the abstraction of a cohort from the point of view of understanding what the input data needs to be like, particularly for someone who's new to the library. Whether or not the cohort class is used, a user will still need to understand conceptually what each row in the community array represents in order to use the library with their own data. I think representing that as a class is useful because it forces that concept to be solidified.
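The array-based alternative can be sketched like this: each cohort attribute becomes a numpy array, and the T model geometry from the example above runs vectorised across all cohorts in one pass (the trait and DBH values here are illustrative):

```python
import numpy as np

dbh = np.array([0.1, 0.2, 0.4])  # one entry per cohort (m)
# Trait values broadcast in from each cohort's PFT (example numbers)
h_max = np.array([25.0, 25.0, 15.0])
a_hd = np.array([116.0, 116.0, 116.0])
ca_ratio = np.array([390.0, 390.0, 390.0])

# Same equations as the per-cohort loop, applied across all cohorts at once
height = h_max * (1 - np.exp(-a_hd * dbh / h_max))
crown_area = (np.pi * ca_ratio / (4 * a_hd)) * dbh * height
crown_fraction = height / (a_hd * dbh)
```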
Description
This is a "not even a WIP" pull request to put up and describe a branch containing some sketchy material for the upcoming work. It includes:
- Example data files (`pfts.csv` and `community.csv`) in `pyrealm_build_data.community`
- A new module `pyrealm.community_sketch.py` that implements skeletal bits of a `Flora` and `Community` class using `pandas.DataFrame` as storage. Except for using `pandas.DataFrame.merge`, all the data could be converted into class attributes as `numpy.arrays`.
That code is actually usable for a basic operation:
https://pyrealm.readthedocs.io/en/demography_components/users/tmodel/canopy.html
Type of change
Key checklist
- `pre-commit` checks: `$ pre-commit run -a`
- `$ poetry run pytest`
Further checks