GLEON / MilleLacsMystery

Bass/Walleye modeling of Mille Lacs, MN

Configure day-specific Kd feeder data for GLM #3

Open jordansread opened 7 years ago

gjahansen2 commented 7 years ago

Seasonal pattern plus annual variability. How do we balance being true to the observed data against tracking noise?

jordansread commented 7 years ago

My suggestion is that we create a characteristic seasonal curve for the lake and do a simple parameterization of that curve that stretches or amplifies it to fit the year. Maybe we could end up parameterizing it with just the annual median/mean, or perhaps we would need a second parameter for timing.

lawinslow commented 7 years ago

Ok, quick crack at this. I'm not seeing any crazy seasonal variability. It generally looks like Secchi declines throughout the year, but I don't see the big clear-water-phase "bump" you sometimes see. Overall, we also see a bit of an increase towards the end of the year.

Looked at seasonal variability for pre- and post-1997 (the year that splits the data into equal halves).

[image: seasonal variability, pre-1997]

[image: seasonal variability, post-1997]

My vote is to use the empirical seasonal variability. Basically, use the median of each bi-weekly bin for the overall pattern. Then use a multiplier to bump it down early in the season and bump it up later in the season (tie it together with a winter linear interp, doesn't matter much in the winter). This will amplify the annual change a bit, which lines up with the trends we see (which are steeper in the clearer periods, and muted in the less clear periods).

[image: empirical seasonal pattern from bi-weekly medians]
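
A rough sketch of that bi-weekly-median approach (the `secchi` data frame and its `date`/`secchi_m` columns are placeholder names, not actual project variables):

```r
library(dplyr)

# Hypothetical input: data frame `secchi` with columns `date` (Date) and
# `secchi_m` (numeric Secchi depth, m); both names are assumptions.
secchi <- secchi %>%
  mutate(doy = as.numeric(format(date, "%j")),
         biweek = floor(doy / 14))

# median of each bi-weekly bin = the characteristic seasonal pattern
pattern <- secchi %>%
  group_by(biweek) %>%
  summarise(med_secchi = median(secchi_m, na.rm = TRUE))

# expand to daily values; approx() supplies the winter linear interp,
# and rule = 2 holds the curve flat beyond the first/last sampled bin
daily <- approx(x = pattern$biweek * 14 + 7,  # bin midpoints as DOY
                y = pattern$med_secchi,
                xout = 1:365, rule = 2)

# a year-specific multiplier then bumps the whole pattern up or down
secchi_scaled <- daily$y * 1.2  # e.g., 20% clearer than the typical year
```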

The trend is somewhat harder. Do we want to capture the big bump we see in 1997? If so, we can't just apply a monotonic trend. We could use a more empirical approach where we try to reconstruct the overall long-term variability and use the seasonal variability reconstruction as above.

[image: long-term Secchi variability]

Thoughts?

gjahansen2 commented 7 years ago

I don't want to impose a linear trend over time (year to year) - the peak in 1997 is real, I think (many reports from that time reference the unusually high water clarity). There also seems to be a peak in 2013.

As for seasonal patterns, I agree that the general trend appears to be the same from year to year (declines later in the year). I am not sure how much credence we should put in the seasonal patterns early and late in the year, since they are so infrequently sampled. So I don't want to tie it too strongly to the bi-weekly data in those parts of the year where we only have one or two sampling dates.

So Jordan's suggestion of a single seasonal curve, parameterized by the annual median or mean, seems reasonable to me. But then how do we link Dec 31 of one year to Jan 1 of the next? Or does it not matter in ice-covered periods, and we can just have clarity jump up or down to the year-specific pattern after the ice goes off?

lawinslow commented 7 years ago

> I am not sure how much credence we should put in the seasonal patterns early and late in the year, since they are so infrequently sampled.

Yeah, sure, the far edges are uncertain, but the lake seems to have reasonable coverage for much of the open-water season.

> So Jordan's suggestion of a single seasonal curve, parameterized by the annual median or mean, seems reasonable to me.

The question here, though, is: what is the parameterizable "single seasonal curve"?

> But then how do we link Dec 31 of one year to Jan 1 of the next? Or does it not matter in ice-covered periods, and we can just have clarity jump up or down to the year-specific pattern after the ice goes off?

I'm not worried about a Dec 31 to Jan 1 jump. It won't affect much at that time of year. We can pick something reasonable for when there is no observation coverage. I think a high value during that time makes sense - a Secchi of 5 m or so.

jordansread commented 7 years ago

The curve I was referring to is a somewhat complex shape (maybe some loess or something) boiled down to two parameters: 1) the mean, and 2) the amplitude. A third would be stretching horizontally (timing), but that seems harder, since you'd need to know a priori how long the season will be. Mean and amplitude are akin to translating the shape vertically and stretching it vertically, respectively.
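
A rough sketch of what that could look like (here `alldata`, `doy`, and `secchi_m` are placeholder names, and loess is just one choice for the characteristic shape):

```r
# Characteristic shape: loess of Secchi vs DOY, pooled across all years
# (`alldata` and its columns are placeholders for this sketch)
shape_fit <- loess(secchi_m ~ doy, data = alldata)
shape <- predict(shape_fit, newdata = data.frame(doy = 1:365))  # NA outside sampled DOYs

# two parameters: vertical translation (mean) and vertical stretch (amplitude)
season_curve <- function(year_mean, amplitude) {
  year_mean + amplitude * (shape - mean(shape, na.rm = TRUE))
}

muted <- season_curve(year_mean = 3.0, amplitude = 0.5)  # flatter-than-typical year
sharp <- season_curve(year_mean = 4.2, amplitude = 1.5)  # clear year, exaggerated seasonality
```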

lawinslow commented 7 years ago

Ok, do you want to take the first crack at that? Seems like you have a clear vision.

gjahansen2 commented 7 years ago

What about some kind of moving average?

jordansread commented 7 years ago

A moving average of that year's data, or the whole thing flattened to DOY?

jordansread commented 7 years ago

I was thinking something simple like this, which we can adjust w/ two params:

[image: sketched two-parameter seasonal curve]

lawinslow commented 7 years ago

Hmmm, do we not want to fit to that portion of the year? There are very few obs then.

jordansread commented 7 years ago

Or something like this, which flattens out to an early-spring mean and flattens to a fall mean:

[image: sketched seasonal curve with spring and fall plateaus]

The difference being what I can quickly code vs. draw in PowerPoint...

lawinslow commented 7 years ago

Yeah, got an equation for that? The challenging part there can be finding something well-behaved with the right number of parameters (hence my advocacy for a more empirical approach).

jordansread commented 7 years ago

Empirical sounds good here, but we often may not have enough Secchi data and may need to pick a curve off the shelf (e.g., dystrophic, eutrophic) and apply it.

jordansread commented 7 years ago

The first plot is just a Weibull distribution.
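
Something like the following reproduces that kind of shape (the shape/scale values are illustrative guesses, not fitted to Mille Lacs):

```r
# Weibull density over the year, rescaled into a plausible Secchi range;
# shape = 1.5 / scale = 0.4 put the peak in spring (~DOY 70), then a decline
doy <- 1:365
w <- dweibull(doy / 365, shape = 1.5, scale = 0.4)
secchi <- 1.5 + 2.5 * w / max(w)  # map the density onto ~1.5-4 m Secchi
plot(doy, secchi, type = "l", xlab = "Day of year", ylab = "Secchi (m)")
```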

gjahansen2 commented 7 years ago

Or if we have a year with no/little data, use an adjacent year? Create an 'average' year to use when year-specific data are missing? Like we did for Wilma Secchi.

lawinslow commented 7 years ago

For future reference, here is the best simple model I could come up with for seasonal variability of Secchi in the LTER lakes:

[image: ratio model for seasonal Secchi variability in the LTER lakes]

The ratio is defined as `median(cw_phase_secchi)/median(summer_secchi)`, though the windows aren't defined by season; I just eyeballed the time window of the clear-water phase in those lakes (cw_phase: weeks 22-23, summer: weeks 27-42). Adding DOC into the linear model improves it (R^2 ~ 0.8, with lower AIC), so there's that.
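
The fitting step might look roughly like this (the `weekly_secchi` and `lake_doc` data frames and all their columns are hypothetical, and the original model's other predictors aren't shown in this thread):

```r
library(dplyr)

# hypothetical lake-week data: columns lakeid, week, secchi_m
ratios <- weekly_secchi %>%
  group_by(lakeid) %>%
  summarise(cw_phase = median(secchi_m[week %in% 22:23], na.rm = TRUE),
            summer   = median(secchi_m[week %in% 27:42], na.rm = TRUE),
            ratio    = cw_phase / summer)

# add DOC (hypothetical `lake_doc` table with lakeid, doc) to the linear
# model, per the note above about the improved R^2 and AIC
fit <- lm(ratio ~ doc, data = left_join(ratios, lake_doc, by = "lakeid"))
summary(fit)
```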

jordansread commented 7 years ago

interesting...cool

gjahansen2 commented 7 years ago

I am reading some old (early 1990s) PCA reports about Mille Lacs. They describe Secchi as follows: "Under ice readings range from 4.5-5.5m with very low levels of algae and suspended solids. Following ice out, Secchi transparency declines quickly to 3 m or less. Transparency continues to decline, coincident with peak algae concentrations, until late August with a minimum reading of 1.3 m".

Given the paucity of under-ice data, I think the shape should be something like a flat line (with high Secchi) until ice-out, then a fairly steady decline from right around ice-out through the end of August, then flat until ice-on. The actual values of the starting and ending points could be parameterized based on observed open-water medians for the year (where available). When we look at annual means/medians, it is probably important to remove samples from winter, since they exist for only a few years and will skew those years toward appearing clearer than if only the open-water season had been sampled. The figure below has data only from April-October.

[image: mille_lacs_secchi_timeseries_april_oct]

gjahansen2 commented 7 years ago

OK, I played around with some simple logistic functions, parameterized with year-specific highs and lows.

[image: mille_lacs_secchi_seasonal_april_oct_logistic]

This might be close to what we want, but I don't like how (potentially) bad measurements (or few measurements) in a year can drag the values up or down. I think a better way would be to construct a mixed-effects nonlinear model of this same form, with a random year effect that lets the max and min thresholds vary by year. BUT those year-specific estimates will be influenced by all measurements, so if a year is not well-sampled (or not sampled at all), its estimates will be dragged closer to the overall mean. What do you think?

gjahansen2 commented 7 years ago

The logistic function is just `seasonal = function(x, max, min) { min + (max - min) / (1 + exp(0.05 * (x - 200))) }`, where I chose the rate of change as 0.05 and the midpoint of the transition period as DOY 200. We could fit those to the data as well if we wanted to.
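
For a quick look at that shape:

```r
seasonal <- function(x, max, min) { min + (max - min) / (1 + exp(0.05 * (x - 200))) }

# starts near `max` early in the year and relaxes to `min` past the DOY-200 midpoint
curve(seasonal(x, max = 4, min = 1.5), from = 100, to = 320,
      xlab = "Day of year", ylab = "Secchi (m)")
```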

lawinslow commented 7 years ago

Yeah, I think that would be a decent approach, though my only concern would be the big jump in clarity at the new year transition. It would be a nice fusion of model and empiricism.

jordansread commented 7 years ago

Maybe we apply this to a certain DOY range (100-320?) and then interpolate between the years so that the transitions aren't abrupt?

gjahansen2 commented 7 years ago

^^ This is what I was thinking too - apply the modeled curve throughout the open-water season and then interpolate between ice-on and ice-off to get from the ending value of year x to the starting value of year x+1. I fit the nonlinear mixed-effects model and estimated year-specific maxima and minima. Looks like some of the years are kind of funky. The black line is the generic fit.

[image: secchi_function_mixed_model]

gjahansen2 commented 7 years ago

`customLogitModel = function(x, max, min) { min + (max - min) / (1 + exp(0.046 * (x - 192))) }`

where 0.046 and 192 were estimated from a simple nonlinear model, and `max` and `min` are estimated as year-specific random effects.

jordansread commented 7 years ago

Yes, this seems doable.

lawinslow commented 7 years ago

Can someone create daily Kd values based on that? Then I could start running the model.

gjahansen2 commented 7 years ago

Are we just doing kd=secchi/something? What is the 'something' again?

Here is what optical habitat looks like (3-year moving average) with the new seasonal pattern included (note: generic parameters were used for much of the 1980s, but word on the street is that the Mille Lacs Band of Ojibwe might have data from that time).

[image: mille_lacs_oha_timeseries_seasonal]

jordansread commented 7 years ago

I think it is Secchi = 1.7/Kd.
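
So converting the modeled Secchi to the Kd values GLM needs is just the inverse (using that 1.7 constant):

```r
# Secchi ≈ 1.7 / Kd, so Kd = 1.7 / Secchi (1.7 per the suggestion above)
secchi_to_kd <- function(secchi_m) 1.7 / secchi_m
secchi_to_kd(2)  # ~0.85 m^-1 for a 2 m Secchi reading
```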

gjahansen2 commented 7 years ago

OK, so we want daily values for all years, with linear interpolation between the end of the season in year x and the beginning of the season in year x+1. I can create this.

Question: do we want the end and beginning of each year to be based on some arbitrary day numbers, or on ice-on and ice-off? I just used DOY 110-310 in the figure shown above, but it seems reasonable to use the modeled ice-on/ice-off dates as those cutoffs - and that is not something I can incorporate at this point. It is also perhaps a bit of a chicken-vs.-egg situation: I need modeled outputs to calculate daily Kd, and you need daily Kd to calculate modeled outputs...

lawinslow commented 7 years ago

I'd just say arbitrary. You can be conservative if you want. I really don't think the model will be terribly sensitive to these numbers. You're right that we don't have ice-on/off dates, so it would be hard to pick them now.

jordansread commented 7 years ago

I'd suggest starting out with the system you have above, w/ an easy-to-change set of parameters that we can use to tell us how sensitive the model is to that time range (if we choose to look into it).
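
A sketch of that parameterization (the `make_open_water_secchi` wrapper is hypothetical; it reuses the `customLogitModel` defined above):

```r
# wrap the seasonal curve so the open-water window is an explicit parameter,
# making the DOY 100-320 choice trivial to vary for sensitivity checks
make_open_water_secchi <- function(year_max, year_min,
                                   doy_start = 100, doy_end = 320) {
  doy <- seq(doy_start, doy_end)
  data.frame(doy = doy,
             Secchi = customLogitModel(doy, max = year_max, min = year_min))
}

base_run   <- make_open_water_secchi(year_max = 4, year_min = 1.5)
longer_run <- make_open_water_secchi(year_max = 4, year_min = 1.5,
                                     doy_start = 90, doy_end = 330)
```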

gjahansen2 commented 7 years ago

Hmm... okay, here is a stab at it. I modeled Secchi as a function of DOY from DOY 100-320, then linearly interpolated across missing values: daily_kd_interpolated.txt

@lawinslow data are there ^^^

[image: secchi_seasonal_interpolated]

[image: kd_time_series_interpolated]

gjahansen2 commented 7 years ago

```r
library(ggplot2)
library(dplyr)
library(lme4)
library(reshape2)
library(readr)
library(ggrepel)
library(RColorBrewer)
library(zoo)  # for na.approx()

# data from Heidi
mydata = read_csv("ML_secchi.csv")
mydata$date = as.Date(mydata$date, format = "%m/%d/%Y")
mydata$doy = as.numeric(format(mydata$date, format = "%j"))

year.medians = summarise(group_by(mydata, year), median.secchi = median(secchi_m))

# exclude data outside window used for TOHA
toha.secchi = subset(mydata, doy > 100 & doy < 320)
year.medians.toha = summarise(group_by(toha.secchi, year), median.secchi = median(secchi_m))
year.medians.toha$date = as.Date(paste(year.medians.toha$year, "-07-01", sep = ""))

# create function for seasonal pattern
toha.secchi$week = as.numeric(format(toha.secchi$date, "%U"))
toha.secchi$year.f = factor(toha.secchi$year)

# k = 0.046, midseason = 192 as estimated from the nls fit below
customLogitModel = function(x, max, min) { min + ((max - min) / (1 + exp(0.046 * (x - 192)))) }

# gradient (w.r.t. max and min) needed by nlmer
customLogitModelGradient <- deriv(body(customLogitModel)[[2]],
                                  namevec = c("max", "min"),
                                  function.arg = customLogitModel)

# simple nls to estimate k and mid.season; needs a version of the logistic
# with all four parameters free
customLogitModelFull = function(x, max, min, k, mid.season) {
  min + ((max - min) / (1 + exp(k * (x - mid.season))))
}
secchi.nls = nls(secchi_m ~ customLogitModelFull(doy, max, min, k, mid.season),
                 data = toha.secchi,
                 start = c(max = 4, min = 1.5, mid.season = 200, k = 1))

# fit the mixed model; random year effects on max and min come after the second ~
model <- nlmer(secchi_m ~ customLogitModelGradient(x = doy, max, min) ~
                 (max | year.f) + (min | year.f),
               data = toha.secchi,
               start = c(max = 4, min = 2.3))

summary(model)
year.estimates = coef(model)$year.f
general.estimates = fixef(model)
write.csv(year.estimates, "year_secchi_max_min_estimates.csv", row.names = TRUE)
write.csv(data.frame(general.estimates), "generic_secchi_max_min_estimates.csv", row.names = FALSE)

# create time series - if a year is missing or has a bad estimate, use the generic fit
yearly.secchi = data.frame(year.estimates)
yearly.secchi$year = row.names(yearly.secchi)

secchi.params = data.frame(year = seq(1974, 2016))
secchi.params = merge(secchi.params, yearly.secchi, by = "year", all = TRUE)

# estimates are bad if min is bigger than max; flag first, then blank both
bad = !is.na(secchi.params$max) & !is.na(secchi.params$min) &
  secchi.params$max < secchi.params$min
secchi.params$max[bad] = NA
secchi.params$min[bad] = NA
secchi.params$max[is.na(secchi.params$max)] = general.estimates[1]
secchi.params$min[is.na(secchi.params$min)] = general.estimates[2]

# create data frame of daily Secchi depths
day.list = data.frame(date = seq(as.Date("1974-03-31"), as.Date("2016-12-31"), by = "1 day"))
day.list$doy = as.numeric(format(day.list$date, "%j"))
day.list$year = as.numeric(format(day.list$date, "%Y"))
daily.kd = data.frame(doy = 0, year = 0, date = as.Date("1970-01-01"), Secchi = 0)

# apply each year's curve over DOY 100-320; other days stay NA for now
for (i in 1:nrow(secchi.params)) {
  this.year = subset(day.list, year == secchi.params$year[i])
  open.water = data.frame(doy = seq(100, 320),
                          Secchi = customLogitModel(x = seq(100, 320),
                                                    max = secchi.params$max[i],
                                                    min = secchi.params$min[i]),
                          year = as.numeric(secchi.params$year[i]))
  this.year = merge(this.year, open.water, by = c("doy", "year"), all = TRUE)
  daily.kd = rbind(daily.kd, this.year)
}
daily.kd = daily.kd[2:nrow(daily.kd), ]  # drop the dummy first row

# linearly interpolate Secchi across the ice-covered gaps, then convert to Kd
daily.kd$Secchi = na.approx(daily.kd$Secchi, na.rm = FALSE)
daily.kd$kd = 1.7 / daily.kd$Secchi
write_tsv(daily.kd, "daily_kd_interpolated.txt")

ggplot() + geom_path(data = daily.kd, aes(date, kd)) + theme_classic()
ggsave("kd_time_series_interpolated.png", height = 5, width = 8, units = "in")

ggplot() + geom_path(data = daily.kd, aes(doy, Secchi, group = year, colour = factor(year))) + theme_classic()
ggsave("secchi_seasonal_interpolated.png", height = 5, width = 8, units = "in")
```