bradduthie / resevol

BBSRC FAPESP Newton Funded Project to model how the spatial scale of heterogeneity in fungal isolate application and crop plant cultivation affects variation in the selection landscape for biopesticide resistance.
3 stars 0 forks source link

Working toy simulations #19

Closed bradduthie closed 3 years ago

bradduthie commented 6 years ago

New code is needed to run multiple iterations of the toy_simulate_resistance function across different parameter values. The key issue here is to get a range of parameter space for the heterogeneity of the landscape, which I plan to do by (at first) running the same set of parameter values, but building a new function run within the toy_initialise_land function. Currently, crops and pathogens are assigned to cells randomly, but some sort of toy_set_land_block could ensure that the whole landscape is separated into $N$ discrete blocks. This line of code from GMSE might be helpful to set blocks.

Once the blocks are set, we can then change $N$ to increase or decrease the scale at which heterogeneity of crops and fungal biopesticides are applied (i.e., high $N$ means more continuous blocks are created, low $N$ means few blocks are created -- N < 2 means that there's only one big block). Crop and fungal biopesticides would then be assigned based on some simple rules.

bradduthie commented 6 years ago

I have made progress addressing this issue with the following function set in commit 3b958cfe99182fc936c3dd24ecdd169bf425b3d7. It's not as efficiently written as it could be, but it also won't be the rate limiting step for simulations at all -- fine to be a bit slower to initialise (the triple four loop slows things down).

toy_block_land <- function(xdim, ydim, pathogens, crops, block_len){
    cells     <- xdim * ydim;
    cell_left <- cells %% (block_len * block_len);
    if(cell_left > 0){
        stop("Landscape cannot be divided into squares of size 'block_len'");
    }
    LAND      <- array(data = 0, dim = c(xdim, ydim, 3));
    tx0       <- 1;
    tx1       <- block_len;
    ty0       <- 1;
    ty1       <- block_len;
    block_num <- 1;
    while(tx0 <= xdim & ty0 <= ydim){
        LAND[tx0:tx1, ty0:ty1, 1] <- block_num;
        tx0                       <- tx1 + 1;
        tx1                       <- tx1 + block_len;
        if(tx0 > xdim){
            tx0 <- 0;
            tx1 <- block_len;
            ty0 <- ty1 + 1;
            ty1 <- ty1 + block_len;
        }
        block_num <- block_num + 1;
    }
    block_num  <- block_num - 1;
    for(block in 1:block_num){
        rand_path <- sample(x = 1:pathogens, size = 1);
        rand_crop <- sample(x = 1:crops, size = 1);
        for(xx in 1:xdim){
            for(yy in 1:ydim){
                if(LAND[xx, yy, 1] == block){
                    LAND[xx, yy, 2] <- rand_path;
                    LAND[xx, yy, 3] <- rand_crop;
                }
            }
        }
    }
    return(LAND);
}

I'll show the nice images of blocked landscapes produced by the function in the notebook later. Actually, if someone (@luc-bussiere or @RosieMangan) can give a good landscape size to consider (e.g., 1000 by 1000 km), then I could probably start real simulations quite quickly.

Update: The lab notebook has been updated with some progress.

bradduthie commented 6 years ago

A potential challenge here that I should have thought about. Under the current model assumptions, keeping pest resistance low is actually easier when block sizes are large -- mainly because genetic variation is harder for pests to maintain when there is environmental variation in time rather than space. The crops and fungal biopesticides rotate in time, and the pests cannot keep up or disperse to a refuge. Perhaps, to make it biologically realistic, the pest generation time needs to be much faster than the crop and biopesticide rotation time? Does that make sense, @RosieMangan and @luc-bussiere ?

luc-bussiere commented 6 years ago

Yes, good thinking Brad -- the pests will almost always have a generation time that is a fraction of the rotation (for helicoverpa and most of its hosts, maybe 3-5 generations per harvest). Will this resolve your concern?

luc-bussiere commented 6 years ago

Also 1000 x 1000 km seems a reasonable scale for starting. Thanks!

bradduthie commented 6 years ago

Hi @luc-bussiere @rozeykex @RosieMangan -- the good news is that there is a working toy model now that can run replicate simulations (up on the dev branch, have not merged with master yet). The bad news is that they take a long time -- like, at least overnight for a single run. This is primarily for two reasons: (1) the code is written in R instead of C (we'll get orders of magnitude more speed when I write this up properly with an R to C link in Phase II), and (2) the code was written quickly and probably with some inefficiencies in R (though I doubt enough to reduce simulation time by more than 50 percent with R alone).

I think that the best option now is for me to try to simulate what I have. I can set aside six cores on my Desktop, and might ask for some help once I'm sure that the results are good and we can start replicating them. If you want to try all this out for yourself, you can highlight all of the code in the working toy model, read it into R, and then run the replicate_toy_sims function with something like the following:

reps <- replicate_toy_sims(generations = 20, xdim = 100, ydim = 100, pathogens = 3, crops = 3,
                                           pest_init = 50000, crop_rotate = "rotate", path_rotate = "rotate", 
                                           fecundity = 900, cell_K = 200, block_len = 1, rep_num = 1, 
                                           crop_freq = 5, path_freq = 5);

Again, this will take a long time, and it might not get good data yet (I hope I will know by tomorrow). But the data that gets produced by all of this looks like this. While I'm fiddling with the simulations to make sure the parameters are okay (don't worries about the actual values yet), please let me know if you want anything else out of the simulations in terms of data.

The 100 by 100 scaling might be the best we can hope for at the moment (will check later) due to speed. The fecundity is not a huge slow down, and I set it to something high based on this paper, but let me know if that should change.

bradduthie commented 6 years ago

Here's a bit better example of the toy results. This is from the code below, which takes considerably less time to run (ca 10 minutes).

sim <- replicate_toy_sims(pathogens = 3, crops = 3, crop_rotate = "rotate", 
                                         path_rotate = "rotate", block_len = 4, xdim = 4, 
                                         ydim = 4, pest_init = 8000, generations = 40, 
                                         fecundity = 100);

You can see the percentage resistant rise over generations, then the fungal biopesticides and crops are rotated after 10 generations, causing resistance and ability to eat crops to drop suddenly (but then rapidly evolve resistance and eating ability again). This is a good baseline -- it's what we expect the model to do.

bradduthie commented 6 years ago

And here's an example when we add the heterogeneity and rotation.

sim <- replicate_toy_sims(pathogens = 3, crops = 3, crop_rotate = "rotate", 
                                         path_rotate = "rotate", block_len = 1, xdim = 4, 
                                         ydim = 4, pest_init = 8000, generations = 40, 
                                         fecundity = 100);

All that's changed is block_len = 1 instead of 4, so there are 16 different blocks on the landscape operating independently to rotate their pathogens and crops. Resistance and eating ability stabilises around 50 percent instead of spending most of the time over 90 percent.

RosieMangan commented 6 years ago

Hey @bradduthie I ran the simulation using sim <- replicate_toy_sims(pathogens = 3, crops = 3, crop_rotate = "rotate", path_rotate = "rotate", block_len = 1, xdim = 4, ydim = 4, pest_init = 8000, generations = 40, fecundity = 100);

RosieMangan commented 6 years ago

and sim <- replicate_toy_sims(pathogens = 3, crops = 3, crop_rotate = "rotate", path_rotate = "rotate", block_len = 4, xdim = 4, ydim = 4, pest_init = 8000, generations = 40, fecundity = 100);

RosieMangan commented 6 years ago

Happy to run the longer simulation if thats helpful?

bradduthie commented 6 years ago

@RosieMangan Awesome! Thanks! I'm currently running six versions of the following simulation:

sim <- replicate_toy_sims(pathogens = 3, crops = 3, crop_rotate = "rotate", 
                                          path_rotate = "rotate", block_len = 1, xdim = 100, 
                                          ydim = 100, pest_init = 400000, generations = 20, 
                                          fecundity = 100, print_file = "test_block_len_1.csv");

In each of the six, block_len varies. I should know by the end of the day whether or not this looks good. If it does (and you like the parameter values and output), then we can all coordinate running replicate simulations and using all of the results for the figure in the Phase II proposal.

bradduthie commented 6 years ago

@luc-bussiere @RosieMangan @rozeykex Progess update: I came in this morning to find that several simulations have been killed mid-way through. This isn't an R thing because the terminal cut out of R and into the command line, meaning that a processor was probably overloaded. I think I'm going to have to adjust the scale here.

bradduthie commented 6 years ago

Hey @luc-bussiere -- how much time do we have on these? I want to try one more thing before taking the more drastic step of scaling down simulations.

luc-bussiere commented 6 years ago

@bradduthie We have to submit the final version Oct. 11, which should give you a bit of play time as needed? I hope to have figures etc in near final version by Oct 9 or so.

bradduthie commented 6 years ago

@luc-bussiere Yeah, that should give me time to fiddle with this and figure it out. I think that the amount of processing and memory that R is doing is making the computer angry, so I'm just going to refactor the code so that R writes things to the hard drive and removes them after they're no longer needed.

bradduthie commented 6 years ago

A quick explanation of what's going on with the code while I read the updated draft 2. Because the simulations are stretching the 'toy' part of toy model, some massive amounts of memory were being needed to hold all of the simulation data, causing crashes. To fix this, the simulation now prints off results from each generation as a CSV file, then removes and garbage collects memory from old variables to avoid things getting out of hand. To summarise the results, some tweaks to the data summary functions needed to be made and a new toy_collect_file_output is also needed to grab the files from the saved directory. I've tested these, and they seem to work just fine. What I am now waiting on are the results from six simulations run in parallel at different scales of crop and fungal biopesticide heterogeneity (blocks of 1, 5, 10, 25, 50, and 100 cells on a 100 cell landscape). Twenty generations of each should be available by tomorrow morning. Simulations being run look like this:

sim <- toy_simulate(pathogens = 3, crops = 3, crop_rotate = "rotate", 
                                path_rotate = "rotate", block_len = 1, xdim = 100, 
                                ydim = 100, pest_init = 50000, generations = 50, 
                                fecundity = 200, cell_K = 400, 
                                print_dir = "../data/block_len_1/");

A version of the toy_simulate function is on the developing branch, but I've not pushed anything while the simulations are running. Note that I'm going to cut the simulations off at 20 generations (at least for now) for the sake of time. Also note that if these parameters don't look good, we can change them, but we want to be careful with how much more data will be needed to simulate (i.e., bigger landscapes, more pests). This shouldn't be a problem with the full model, in which I'll plan the memory management with much more detail and write in C.

Based on the speed at which everything is running, I think that our intuitions about what should happen is actually happening, which is good. The simulations in which block_len is low are running much faster, meaning that population sizes are probably lower when landscapes are more heterogeneous. Smaller scale simulations also seem to be consistent with this.

rosemckeon commented 6 years ago

@bradduthie I'm enjoying the details on how you go about speeding the code up. Sounds like the current method, with data for every generation, will set us up nicely for visualizing the whole thing as a slideshow too.

bradduthie commented 6 years ago

Thanks @rozeykex -- Simulations are still running on six cores today, though after a brief discussion with @luc-bussiere and Matt this morning, I think they might already be a bit out of date.

Thinking of fecundity as important, I massively increased the fecundity (to ca 500) of females based on results from a recent paper, but this slowed things down much more than I had initially realised. I've gone back to a lower fecundity of 10, with @luc-bussiere and Matt assuring me that this should be okay for the proof of concept (I agree). This makes it much easier to scale things appropriately and have more realistic dispersal.

So here's what I'm now going to try now:

  1. Cut all running simulations off after 20 generations (all should be done by mid afternoon). I'll look at the results from these but probably not use them.
  2. Figure out if I need to use a 100 by 100 landscape, or if I can get away with a 200 by 200 or even 1000 by 1000 landscape if fecundity = 10, with the criteria being whether or not a round of simulations can finish overnight -- if so, good to go.
  3. Start running six simulations of different block_len by the end of the day, with the idea that they'll be ready tomorrow morning to send.

I think I also need to do a bit of exploring with crop_freq and path_freq -- the number of generations a pest cycles through before the crop or pathogen applied changes.

bradduthie commented 6 years ago

@luc-bussiere @RosieMangan @rozeykex Quick update. The first round of replicate simulations are done, but my home computer cannot seem to process the files without crashing. I should have the first round of results plotted (I hope) by the end of the day Monday though.

luc-bussiere commented 6 years ago

You're a legend, Brad! No huge rush on this -- I appreciate all you're doing to work through things.

RosieMangan commented 6 years ago

Deadly!