Questions about generating PEcAn allometries

SimonGoring commented 8 years ago

I have a suspicion I know the answer to this, but thought I'd ask anyway.

There are 301636 unique trees in the FIA dataset being run through PEcAn's allometry package. This takes a very long time, somewhere over 6 hours at least.

Interestingly, there are only 2035 unique dbh/pft pairs. How acceptable would it be to run the PEcAn allom.predict function on each of the unique pairs, and then use those values across the dataset?

Using one example, the SD of the first 155 records for a unique dbh/pft pair is 0.026 Mg/ha within a prediction interval with a range of ~14.3 Mg/ha (and a CI width of about ~2.37).

@mdietze @paciorek?

I'll keep running this, but I was just wondering how we'd feel about speeding up the code in this way.

crollinson commented 8 years ago

I'm having a hard time visualizing the workflow off the top of my head. Can you point me to where/how the allometry package fits into your workflow?

I don't know if this helps, but what @alexanderm10 & I did in the past is generate & store a bunch of betas for each PFT and then pull from that distribution, which was pretty quick. If for some reason this is still slow, I would think about parallelizing the operation by tree/PFT/cell, etc.

SimonGoring commented 8 years ago

It's in: https://github.com/PalEON-Project/PalEON-FIA/blob/master/R_scripts/PEcAn_biom.R

Lines 63 - 112 in particular. You can see we're looping through each tree. I'm worried about parallelizing because you still need to assemble the 301636 row object at the end. This way, if we hit a failure at some point we're still okay. Also, I wasn't sure how well this would run, so a for loop helps with initial debugging :)

mdietze commented 8 years ago

So the thing to think through in how you're using the uncertainties is how you're handling correlations among trees (if at all) in how you're aggregating uncertainties up from tree to plot to grid cell. On the parameter side (CI), using binned diameters means that you're assuming, for every draw, that all trees are on the the same curve -- that might be fine if you believe there is a 'true' curve and you just don't know it's true slope and intercept (which is the standard assumption). For the PI, on the other hand, it's much more tenuous to assume that all the residual deviations among trees are also identical.

That said, I wouldn't expect my code to take 6+ hr to do 300k draws, so the underlying code could probably benifit from a bit of refactoring. Question is whether that refactoring would take more or less than the 6 hr to do the run by brute force (my guess is more)

crollinson commented 8 years ago

I think what I'd recommend doing is instead of looping through each tree in 89, go by PFT and manually do what I think allom.predict does. This means you'll load the mcmc output (saved as a .RData file -- this may already be your allom.fit object), sample from the betas in that output, and then plug them into the generalized allometry equation (allom.eq <- function(mu0, mu1, DBH) { exp(mu0 + mu1 * log(DBH) )})

If you're worried about memory, loading the mcmc output one pft at a time should help with that, but I without digging into your object structure too much, 302000 rows shouldn't be a problem at all with R.

Does that help?

SimonGoring commented 8 years ago

Hi @crollinson thanks. This makes sense.

@mdietze you have no idea how slow my laptop is :)

SimonGoring commented 8 years ago

@crollinson it's not clear to me which parts of the loaded Rdata file I'd need. Any ideas?

crollinson commented 8 years ago

If you load "Allom.[SPECIES].[COMPONENT].Rdata, you'll get an mcmc object where each layer of the mc list is one of the chains. You'll then want to sample the coefficients from those chains, taking both allometric parameters from the same row at once to preserve their covariance. Ross and I used the Bg0 and Bg1 parameters because the mu values were doing very odd things.

Below is a snippet of how we sampled from the mcmc from Ross's stuff. Just for the example we sampled from the 3rd chain, but you can add a step to randomly pick the chain or pull evenly from the 3 chains.

allometries<- list()

# Sampling 500 random rows from the last 5000 runs of the MCMC
# Set the random  number seed so we always get the same results
set.seed(510)

# need to not pull rows with a negative mu1
# loading in the MCMC data from PEcAn for PIPO
load(file.path(allom.base, "Allom.PIPO.2.Rdata"))
allometries[["PIPO"]] <- mc[[3]][sample(which(mc[[3]][,"mu1"]>=0), size=500, replace=T),]

mdietze commented 8 years ago

@crollinson @SimonGoring now that there is an explicit allom.predict function in the package, I would discourage trying to homebrew the equivalent functionality in different ways

andydawson commented 7 years ago

@SimonGoring Just following up- is this resolved now? Trying to tie up loose ends as we make progress on the biomass manuscript.

PalEON-Project / PalEON-FIA

Questions about generating PEcAn allometries #5