Seed dispersal in FATES

evalieungh commented 5 years ago

Is it possible to add seed dispersal in FATES?

At the CTSM 2019 tutorial, @rosiealice, @jkshuman, @rgknox, @ekluzek, others, and I started discussing options for integrating dispersal mechanisms in FATES. What is the best way to represent dispersal in FATES in accordance with ecological theory? What options are there for implementing this in FATES and/or CTSM/CESM?

Part of my PhD project is to look into modelling dispersal of ecosystems, and I'm hoping to use FATES to represent ecosystems somehow and maybe try to implement dispersal. @huitang-earth

As it stands, PFT dispersal is assumed to be perfect, i.e. when the environmental conditions and an opening exist, the PFT will grow anywhere.

rosiealice commented 5 years ago

@ekluzek sent me a great assessment of the software considerations of this over the weekend.

"At yesterday's tutorial, you asked me about MPI and a dispersal model for CESM/CTSM that would disperse various things like fire or contagions from one grid cell to the next. Because of how CTSM is set up this isn't an easy thing to do. But it also sounds like this is a scientific direction that's important for a variety of things. So having a general solution for this sounds like something important to think about and start planning for. So I'm going to list some of the most important things about this.

So here are my bullet points...

1. The goals with MPI are to: reduce the total communication, and divide up the work
2. The decomposition in CTSM is the worst possible way to do this
3. We need to decide if this is really a land process or an atmospheric one?
4. The only way that makes sense is to move from the CTSM decomposition to a decomposition used by the dispersal model and back and forth

Here's more details on those points:

The goals with MPI are to: reduce the total communication, and divide up the work

CTSM runs fast with MPI because we don't have to communicate between grid cells. So the communication it requires from that perspective is zero. And there is plenty of work to divide up because you can divide it up down to individual gridcells. The main problem we have with CTSM is that the actual work isn't perfectly divided up. Anyway, the point is that CTSM is great for MPI because communication is zero and you can divide the work quite well.

The decomposition in CTSM is the worst possible way to do this

Technically you could add extra infrastructure to CTSM to do MPI communication from one cell to the surrounding grid cells. But the CTSM decomposition is set up to randomize grid cells on processors. We intentionally try to put grid cells on the opposite side of the globe on the same processor, for example. This means that almost every grid cell is going to be communicating with other processors regarding its neighbor cells. So this solution maximizes required communication in order to divide the work.
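To make the communication argument concrete, here is a minimal sketch that counts how many neighbor links cross processor boundaries under two layouts. The round-robin assignment is only a stand-in for a scattered layout, not CTSM's actual decomposition algorithm, and the grid size, processor count, and function names are all assumptions made for illustration.

```python
# Hypothetical 12 x 12 grid divided among 9 processors (3 x 3 when tiled).
NX, NY, PX, PY = 12, 12, 3, 3
N_PROCS = PX * PY


def round_robin(row, col):
    """Scattered layout: consecutive cell indices go to consecutive processors."""
    return (row * NX + col) % N_PROCS


def square_blocks(row, col):
    """Tiled layout: each processor owns one contiguous square of cells."""
    return (row // (NY // PY)) * PX + (col // (NX // PX))


def remote_neighbor_links(owner):
    """Count adjacent cell pairs (4-neighbor, non-periodic) with different owners."""
    remote = 0
    for row in range(NY):
        for col in range(NX):
            here = owner(row, col)
            if col + 1 < NX and owner(row, col + 1) != here:
                remote += 1
            if row + 1 < NY and owner(row + 1, col) != here:
                remote += 1
    return remote


total_links = NX * (NY - 1) + NY * (NX - 1)
scattered = remote_neighbor_links(round_robin)
blocked = remote_neighbor_links(square_blocks)
print(f"links needing MPI: scattered {scattered}/{total_links}, "
      f"blocked {blocked}/{total_links}")
```

In this toy setup every neighbor link crosses a processor boundary in the scattered layout, while the tiled layout only communicates along block edges.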

We need to decide if this is really a land process or an atmospheric one?

CAM, for example, can advect tracers of various sorts through the atmosphere. And it does so with 3D fluid flow, so it can do the job right. So if the dispersal needs to take wind into account, we need to consider whether this is really an atmosphere process. Now, to run with CTSM standalone, that might mean we'd need to modify DATM to handle simpler surface dispersal.

I think the important point here is to decide what the list of requirements are for this. This could also be coupled directly into CESM as another "component", but obviously that's a bigger conversation.

The only way that makes sense is to move from the CTSM decomposition to a decomposition used by the dispersal model and back and forth

If we decide it's part of the land model, I think the only thing that makes sense is to do MPI communication from the CTSM decomposition to a simpler 2D grid where the decomposition is made up of squares. Now, a part of this is that the communication cost is only worth it, if you get a gain from dividing the work. If the work is smaller you might not want to divide it up into the same number of processors for CTSM, but a smaller subset. But, adding infrastructure to do this allows you to try different things.

Scatter/gather to one processor could be done now.

So for example you could gather all CTSM grid cells to a single processor and run the dispersal code on it. If the work is small but communication high it might be faster to do that than to spread it across more than one processor. And actually we do have the infrastructure in CTSM to do a gather and scatter to one processor already. So this could be implemented sooner. A problem with it is that then memory won't scale with processor count as it currently does. But if it's only done for a few fields that's not too big of a problem."
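The gather/scatter option can be sketched without any MPI at all, which may help when prototyping the science. The sketch below simulates it in plain Python: each "processor" is a dict of cell index to seed mass, the fields are gathered into one global grid, a toy dispersal step runs on it, and the result is scattered back. The dispersal rule (a fixed fraction shared equally among the four neighbors) and all names are assumptions for illustration, not CTSM infrastructure.

```python
def gather(local_fields, nx, ny):
    """Assemble per-processor {cell_index: value} dicts into one global grid."""
    grid = [[0.0] * nx for _ in range(ny)]
    for local in local_fields:
        for index, value in local.items():
            grid[index // nx][index % nx] = value
    return grid


def disperse(grid, export_fraction):
    """Move export_fraction of each cell's seed to its existing 4-neighbors.

    The exported mass is split equally among the neighbors that exist,
    so the global total is conserved.
    """
    ny, nx = len(grid), len(grid[0])
    out = [[0.0] * nx for _ in range(ny)]
    for r in range(ny):
        for c in range(nx):
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < ny and 0 <= c + dc < nx]
            exported = grid[r][c] * export_fraction if nbrs else 0.0
            out[r][c] += grid[r][c] - exported
            for rr, cc in nbrs:
                out[rr][cc] += exported / len(nbrs)
    return out


def scatter(grid, local_fields):
    """Return updated per-processor dicts holding the same cells as before."""
    nx = len(grid[0])
    return [{i: grid[i // nx][i % nx] for i in local} for local in local_fields]


# Hypothetical 4 x 3 grid, 3 processors, cells assigned round-robin.
NX, NY, N_PROCS = 4, 3, 3
seed = {5: 100.0}  # all seed starts in cell 5 (row 1, column 1)
local_fields = [{i: seed.get(i, 0.0) for i in range(NX * NY) if i % N_PROCS == p}
                for p in range(N_PROCS)]

global_grid = gather(local_fields, NX, NY)
updated = scatter(disperse(global_grid, 0.2), local_fields)
```

Only the fields that take part in dispersal need to be gathered, which is what keeps the memory cost on the single processor manageable.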

rosiealice commented 5 years ago

...and then @billsacks replied

"I worked with @slevisconsulting a couple of years ago on something similar: beetle dispersal. I sketched out a general algorithm that I think amounted to one of your last points in this email – basically, doing a global gather to the master proc – and Sam implemented this. Probably not ideal – particularly in terms of performance – but it got the job done, and something like that could be reused at least for initial prototyping and scientific development."

rgknox commented 5 years ago

Thanks for creating this thread @evaleriksen . And @ekluzek, you put a lot of good thought into this, thanks!

To reiterate @ekluzek's point 2: the current decomposition seems fairly random, so if we wanted to benefit from node-to-node communications, we would gain more if we changed the grid decomposition to something with spatial structure.

In response, a point about FATES and ED-like models. Since FATES uses dynamic allocation of cohorts, regions with lots of biodiversity and multilayered canopies (tropical forests) would potentially have many more (orders of magnitude) cohorts than places like deserts. In ED2 we made our domain decomposition scheme balance according to these expected cohort loads, which I'm guessing, given the frequency of communication needed for seed dispersal, may be more important for efficient runs.
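As a rough illustration of balancing by expected cohort load, the sketch below uses a simple greedy rule: heaviest gridcells first, each to the currently lightest processor. This is not the ED2 scheme, only a common heuristic, and the cell names and cohort counts are invented for the example. It also ignores spatial structure entirely.

```python
import heapq


def balance_by_cohorts(cohorts_per_cell, n_procs):
    """Greedily assign cells to processors so cohort counts are roughly even."""
    heap = [(0, proc) for proc in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    assignment = {}
    # Heaviest cells first; ties broken by cell name for reproducibility.
    for cell, n_cohorts in sorted(cohorts_per_cell.items(),
                                  key=lambda item: (-item[1], item[0])):
        load, proc = heapq.heappop(heap)
        assignment[cell] = proc
        heapq.heappush(heap, (load + n_cohorts, proc))
    loads = [0] * n_procs
    for cell, proc in assignment.items():
        loads[proc] += cohorts_per_cell[cell]
    return assignment, loads


# Hypothetical expected cohort counts per gridcell.
cohorts = {
    "tropical_a": 900, "tropical_b": 800, "temperate": 300,
    "boreal": 200, "desert_a": 10, "desert_b": 5,
}
assignment, loads = balance_by_cohorts(cohorts, 2)
print(loads)
```

Splitting the same six cells by count alone (three per processor, in the order listed) would put 2000 cohorts on one processor and 215 on the other.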

However, in a coupled simulation, wouldn't the land-grid decomposition be tiled, to more efficiently communicate with the atmosphere (which is tiled, right?)?

rosiealice commented 5 years ago

Eunjee Lee did implement something along these lines, I think during her PhD. I have a recollection that she did all the simulations on a single processor. There might be some useful stuff to build off scientifically here... https://dspace.mit.edu/handle/1721.1/69469

ekluzek commented 5 years ago

In the LMWG meeting today, Marje Prank talked about "Modeling the impacts of climate and land use change on the emission and transport of rust spores". From her plots she obviously hooked up the transport of the rust spores to the atmosphere model. Asking her what she did could be useful. She also noted that in a few days the spores could be transported across oceans. So the smaller the particle, the more important being properly hooked into the atmospheric flow will be. If transport across oceans is important, doing that may be a requirement.

slevis-lmwg commented 5 years ago

I don't think my beetle work made it into a branch, so you would need to talk to Jeff Hicke (U of Idaho) who owns the beetle model if you decided to go that route.

evalieungh commented 5 years ago

Thank you for all the useful input to this discussion. I have thought about this a bit more, and although I'm not sure I understand the technical bits, I have some thoughts on what we need to think about when choosing a solution, and sketched a rough first idea. Please take the suggestions with a grain of salt, but I think the principle of mechanistic solutions is important. Here goes, a list of things to think about before going forward:

Purpose of modeling: I assume most people who work with CTSM want to improve the quality of future climate predictions. With that in mind, the most important role of seed dispersal is indirect - to modify which PFTs grow where, affecting the properties of the land surface. Another purpose is to study vegetation dynamics in itself, with a goal of process understanding rather than more accurate climate predictions. In the long run, I think these two purposes coincide, but in the shorter term there is a possible conflict here. To improve climate predictions, the 'best' solution is the one that makes the model fit better to observations of climatic variables without being too expensive to run. To understand vegetation dynamics, the best model must be in line with ecological theory and physical mechanisms, perhaps even if that introduces artifacts that decrease the fit to observations (i.e., it's more interesting to study how/why the model fails than to make a 'good' model for the 'wrong' reasons). (My project definitely falls into the latter category of purposes, although I hope introducing dispersal will improve the predictions...)
Scale of the dispersal process: Most seeds fall close to the mother plant. The few seeds/propagules that travel far (by wind, birds, tourists etc...) are very important and can have a major impact on vegetation by invading new areas, but I think we might want to separate rare events from the dominating, common events somehow. The rare long-distance dispersal events are important for long time scales, whereas common short/medium-distance dispersal events along with environmental factors determine the movement of species and ecosystems across the landscape (e.g. treelines up the mountain sides). Importantly, dispersal limits the movement of plants across the landscape.
Gridcells and spatial scales: CTSM/CESM globally is run with big gridcells, whereas FATES could be run for much smaller areas. In the future, maybe even CESM can be run at much finer scales, or even with polygon-shaped 'gridcells'. I'm not sure if it is most important to include dispersal within or between gridcells, or both, and whether that should be done the same way or differently. Given that the grid cells are currently really big, and we don't know where in the grid cell a certain cohort is, we don't know when it is meaningful for seeds/propagules to cross gridcell boundaries. Maybe, with the current spatial resolution of the models, it is best to think of rare long-distance events across gridcells and common short/medium-distance events within gridcells. For between-gridcell dispersal, it might not be important which gridcells are geographically close to the 'mother' gridcell - broad patterns of human travel, bird migration and wind systems might be more important than distance from the source plant...
Time steps: It varies a lot between species when, how often and how many seeds are produced, and what is required for the seeds to grow. Could some sort of PFT-specific timing (annual, daily, other) of seed production/release based on growth success and allocation be reasonable?
Mechanistic representation of the process: To quote a smart person, "if we create a 'black box' model tuned to a particular time and site, a neural network could do better". This leads back to the purpose of modelling - even if dispersal is added as a means to improve predictions, I think it has to be done in accordance with ecological theory in order to be meaningful.
Coexistence: If I understand correctly, there are some problems with achieving coexistence of multiple PFTs within gridcells (one PFT will tend to dominate completely?). I know some people are working with disturbances like fire, but obviously that's not the only process that creates coexistence in real life. Small-scale environmental heterogeneity is perhaps the most important, but dispersal limitation is also a governing factor. This is a reason to focus on the common events of the apple not falling too far from the tree.

What I'm imagining right now as a first concept is that each cohort allocates a certain share of its growth to seed production. Seed dispersal distances are drawn (non-randomly, somehow, to avoid stochasticity) from a PFT-specific dispersal kernel*. If some seed travel distances are larger than a threshold**, they are sent outside FATES and land in another gridcell. The seeds that remain inside the gridcell limit how many new cohorts can appear in the next relevant time step.

*E.g. a Gaussian curve for wind-dispersed seeds; this could also be a multi-topped distribution because a lot of plants have multiple modes of dispersal. Finding realistic dispersal kernels for PFTs will probably be difficult, but I think there should be enough literature to make some rough approximations.
**An idea is to take the average distance of all points within the gridcell to its nearest edge.

Does this sound plausible to you at all? I'm sure it will take a lot of effort to get there, but do you think something like this is doable, sensible and at some point worth the computing power? Also, I'm trying to think of how this will not introduce stochasticity but I'm not sure if I've thought it through well enough.

TL;DR / summary: I think we should separate between-gridcell and within-gridcell dispersal of seeds, where the latter might be useful to enable coexistence of PFTs within gridcells and the former could be implemented outside of FATES somehow. We need to consider different modeling purposes and keep in line with ecological theory.
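One way to read the first concept above without introducing stochasticity is to skip drawing individual distances and instead split the seed pool by the expected fraction of the kernel lying beyond the threshold. The sketch below assumes a half-normal (one-sided Gaussian) distance kernel and a square gridcell, for which the mean distance from a uniformly placed point to the nearest edge is one sixth of the side length. The function names and parameter values are hypothetical.

```python
import math


def fraction_beyond(distance, sigma):
    """Fraction of seeds travelling farther than `distance`.

    Assumes a half-normal distance kernel with scale `sigma`
    (an illustrative choice, not a recommended PFT kernel).
    """
    return math.erfc(distance / (sigma * math.sqrt(2.0)))


def mean_distance_to_edge(cell_side):
    """Mean distance from a uniform random point in a square cell to its nearest edge."""
    return cell_side / 6.0


def split_seed(seed_mass, sigma, cell_side):
    """Deterministically split seed into (retained in gridcell, exported)."""
    threshold = mean_distance_to_edge(cell_side)
    exported = seed_mass * fraction_beyond(threshold, sigma)
    return seed_mass - exported, exported


# Hypothetical numbers: 100 km gridcell, two PFTs with different kernel scales.
for name, sigma_m in (("short_disperser", 50.0), ("long_disperser", 10_000.0)):
    retained, exported = split_seed(1.0, sigma_m, cell_side=100_000.0)
    print(f"{name}: retained {retained:.4f}, exported {exported:.4f}")
```

With gridcells this large, a kernel with a scale of tens of metres exports essentially nothing, which is consistent with treating between-gridcell dispersal as a separate, rare-event process.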

rgknox commented 5 years ago

@evaleriksen, I'm currently refactoring patch-level mass fluxes in the model to accommodate nitrogen and phosphorus. As part of the refactor I've added some terms (variables) to track external seed inputs (i.e., from other gridcells). (Now that I think of it, we need to have an outflux term too.) It is only a placeholder, but it should help provide a starting point to whoever adds the grid-level dispersal algorithm.

rosiealice commented 5 years ago

Hey @evaleriksen, I came across this and thought it might be of some interest...

https://academic.oup.com/aobpla/article/11/5/plz042/5559435

evalieungh commented 5 years ago

Thanks for the tip, Rosie. That paper is definitely worth reading. I've put FATES a bit on the shelf for a while but I'm still interested in looking at the dispersal code once I get a bit further in my PhD.

ekluzek commented 3 years ago

We had some more discussion of this in the context of the spring CESM LMWG meeting. There was a talk that included lateral flow of water between gridcells every time-step. That's actually a much higher bar than anything that FATES would want to do. But, I do want to encapsulate some ideas that we had in some of our discussions. This includes ideas from Bill Sacks...

Temporal frequency matters for computational efficiency -- once a year it probably doesn't much matter how you do it, every time step would.
The decomposition of the Host Land Model (HLM) matters. In CLM we actually make this kind of thing more difficult by placing grid points from opposite sides of the globe on the same processor, which means more communication is needed to reach nearby gridpoints. So you might want to be able to select a different decomposition if you have gridcell-to-gridcell communication going on.
To ensure the code is correct it would be good to have a mode that ensures answers are bit-for-bit identical on a different processor layout. Many methods would be roundoff different on a change in processor number and layout. Because of error growth you often can't tell that something is only roundoff different, though. So it would be good to either have a switch to make sure answers are identical with different PE layouts or just always require it.
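The layout sensitivity in the last point comes from floating-point addition not being associative. The sketch below contrasts a reduction whose result depends on how cells are split across processors with one that adds values in global index order. The values are deliberately extreme so the difference is visible, the round-robin ownership is an assumption, and the fixed-order approach shown is just one possible way to get reproducible sums.

```python
def ordered_add(values):
    """Add floats strictly left to right (no compensated summation)."""
    total = 0.0
    for value in values:
        total += value
    return total


def layout_dependent_sum(values, n_procs):
    """Add per-processor partial sums; the result can change with n_procs."""
    partials = [ordered_add(values[p::n_procs]) for p in range(n_procs)]
    return ordered_add(partials)


def layout_independent_sum(values, n_procs):
    """Gather (global_index, value) pairs, then add in global index order.

    The sequence of additions is the same for every layout, so the
    result is bit-for-bit identical regardless of n_procs.
    """
    gathered = []
    for p in range(n_procs):
        gathered.extend((i, values[i]) for i in range(p, len(values), n_procs))
    return ordered_add(value for _, value in sorted(gathered))


# Illustrative seed fluxes with very different magnitudes.
fluxes = [1.0e17, 1.0, -1.0e17, 1.0]

print(layout_dependent_sum(fluxes, 1), layout_dependent_sum(fluxes, 2))
print(layout_independent_sum(fluxes, 1), layout_independent_sum(fluxes, 2))
```

The fixed-order version costs a gather, so it fits naturally with a once-a-year dispersal step or a debugging switch rather than something run every time step.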

adrifoster commented 3 years ago

Hey all,

Just to put my two cents in from my work with seed dispersal in a much less complex model. I also use dispersal curves (as in https://esajournals.onlinelibrary.wiley.com/doi/abs/10.2307/2265633) but with a "fat tail" to model wind dispersal and to take into account low-probability long-distance events, so currently not taking into account eddies, etc.

Though I think it might be easy to modify the curve in different directions depending on the average (on whatever time step you are working on) prevailing wind direction/speed. Here's a paper that might be useful for incorporating wind https://www.nature.com/articles/s41558-020-0848-3.

The dispersal kernels in my model are species-specific (really just genus-specific right now due to lack of available information). I also determine a minimum dispersal density to actually consider, which would map to the number of surrounding gridcells you communicate with based on gridcell size and your dispersal equation. This way each gridcell only communicates with gridcells it could potentially seed or receive seed from.
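The mapping from a minimum dispersal density to a communication radius might look something like the sketch below. The kernel here is a generic two-dimensional inverse-power-law form chosen only because it has a fat tail and a closed-form inverse; it is not the kernel from the paper linked above, and the parameter values are made up.

```python
import math


def fat_tailed_kernel(distance, scale, shape):
    """Hypothetical 2-D inverse-power-law kernel (density per unit area).

    Normalised so it integrates to 1 over the plane when shape > 2.
    """
    norm = (shape - 1.0) * (shape - 2.0) / (2.0 * math.pi * scale ** 2)
    return norm * (1.0 + distance / scale) ** (-shape)


def cutoff_distance(min_density, scale, shape):
    """Distance beyond which the kernel density drops below min_density."""
    peak = fat_tailed_kernel(0.0, scale, shape)
    if min_density >= peak:
        return 0.0
    return scale * ((peak / min_density) ** (1.0 / shape) - 1.0)


def neighbor_radius_in_cells(min_density, scale, shape, cell_size):
    """Number of gridcells, in each direction, a cell needs to exchange seed with."""
    return math.ceil(cutoff_distance(min_density, scale, shape) / cell_size)


# Hypothetical values: 100 m kernel scale, 10 km gridcells,
# and densities below 1e-12 per square metre treated as negligible.
radius = neighbor_radius_in_cells(min_density=1e-12, scale=100.0,
                                  shape=3.0, cell_size=10_000.0)
print(radius)
```

A heavier tail (smaller `shape`) or a lower density cutoff widens the radius, which is the trade-off between capturing rare long-distance events and limiting gridcell-to-gridcell communication.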

slevis-lmwg commented 3 years ago

...and then @billsacks replied

"I worked with @slevisconsulting a couple of years ago on something similar: beetle dispersal. I sketched out a general algorithm that I think amounted to one of your last points in this email – basically, doing a global gather to the master proc – and Sam implemented this. Probably not ideal – particularly in terms of performance – but it got the job done, and something like that could be reused at least for initial prototyping and scientific development."

An update about this...

The work that I did is publicly available on github. You can see the code modifications with this link.

For the dispersal across grid cells, search for mpi_allreduce in subroutine dynProgBB in /biogeochem/dynHarvestMod.F90. All the relevant code modifications are in dynHarvestMod, decompMod, and decompInitMod.

I performed very rudimentary testing of the code at the time, so no guarantees that it works correctly. I checked in with Jeff Hicke a few moments ago, and he felt it should also be clear that this version of the beetle model is out of date.

NGEET / fates

Seed dispersal in FATES #471