Interfacing ecodata with stock assessments (WHAM)

brianstock-NOAA commented 4 years ago

It's awesome that you've updated the CPI and made it available via ecodata. However, the CPI in ecodata doesn't match the data file I have from Miller et al. (2016), which is distributed here in the WHAM package. It'd be nice for WHAM to just get the CPI (and other environmental data) from ecodata, but first there are a couple of issues to work out:

1. The Miller et al. (2016) version starts in 1973; ecodata's starts in 1977.
2. The estimated CPIs differ. I'm assuming this is because the anomalies are calculated relative to means over different time periods. Going forward, it would be nice if previous index values didn't change as new years are added. One solution would be to fix the averaging period to the one used in Miller et al. (2016), 1973-2011.
3. The ecodata version doesn't have SE estimates. These are important for the state-space assessment model, since the SE determines how much weight to assign to the yearly CPI changes. It looks like the MATLAB code contributed by Chris Melrose does calculate a CP_year_r_se, saved as the SE_Cold_Pool_Index column in the output csv file. Note that we can estimate the SE of an environmental covariate within WHAM (example using the GSI), but that adds one or more parameters, so it's better to use the calculated SE.

Chris's note on lines 155-165 seems relevant to both (1) and (2).
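To illustrate why the SE matters (the third point above), here is a minimal sketch assuming a Gaussian observation equation, x_obs[t] ~ Normal(x_latent[t], se[t]^2) with se treated as known. This is a common way to bring a covariate into a state-space model, but it is an illustrative assumption, not a description of WHAM's internal code; the function name is hypothetical. The data-fit term scales with 1/se^2, so years with a small SE pull the latent covariate much harder than years with a large SE.

```python
import math


def gaussian_obs_nll(x_obs, x_latent, se):
    """Negative log-likelihood of observed covariate values given latent
    values, assuming x_obs[t] ~ Normal(x_latent[t], se[t]**2) with se known."""
    nll = 0.0
    for obs, lat, s in zip(x_obs, x_latent, se):
        z = (obs - lat) / s  # residual scaled by that year's SE
        nll += 0.5 * z * z + math.log(s) + 0.5 * math.log(2.0 * math.pi)
    return nll


# The data-fit term 0.5 * z**2 scales as 1 / se**2: a 1-unit departure from
# the latent value costs 2.0 when se = 0.5 but only 0.125 when se = 2.0.
precise = gaussian_obs_nll([1.0], [0.0], [0.5])
noisy = gaussian_obs_nll([1.0], [0.0], [2.0])
```

If the SE is not supplied, a model of this form has to estimate it, which is the extra parameter mentioned above.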

andybeet commented 4 years ago

@brianstock @kimberly-bastille The tech doc describing the cold pool index methodology probably needs a more detailed explanation to address some of Brian's comments, especially since it cites the same literature that Brian has.

sgaichas commented 4 years ago

Thanks @brianstock-NOAA and @andybeet for pointing this out. I'd like to have a more standardized approach to which years are included for anomaly calculation for as many indices within ecodata as possible. Since the 2020 SOEs are out, I think we should have this discussion to see how much we can standardize for 2021.

kimberly-bastille commented 4 years ago

The CPI data we receive for ecodata is preprocessed so it may be a good idea to loop in Chris Melrose to answer the specific questions about how he derived this data.

khyde commented 4 years ago

I forwarded the issue to Chris.

khyde commented 4 years ago

From Chris...

On item 1: I can push it back to 1973 if desired. The 1977 cutoff was just to match the start of MARMAP, when higher-quality data became available.

On item 2: Yes, I believe Brian is correct about the main reason for the differences from the old Miller version. I was able to reproduce the Miller paper figures with Jon's code when I used the same input data and time period, so the math itself is definitely the same. I made no changes to the calculations in Jon's code; I essentially only changed the input data, so any difference comes from differences in the input data and time period. We could probably alter how the code computes the anomalies to keep a consistent baseline, so that prior years stay the same as new ones are added.

On item 3: I think it would be pretty easy to add the SE to the output.

All that said, I think it would be nice to do something more sophisticated than this index in the future as we have discussed before. This was just the easy way since we had the code from Jon.
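The mechanism behind item 2 can be shown with toy numbers (made up for illustration, not CPI data). In this minimal sketch, anomalies computed against the mean of the whole series shift for every past year when a new year is appended, while anomalies against a fixed reference period, such as the 1973-2011 window suggested above, stay put. The helper name `anomalies` is hypothetical.

```python
def anomalies(values, years, ref_start=None, ref_end=None):
    """Return {year: value - baseline}. The baseline is the mean over
    ref_start..ref_end (inclusive) if given, otherwise over all years."""
    if ref_start is None:
        ref = values
    else:
        ref = [v for v, y in zip(values, years) if ref_start <= y <= ref_end]
    base = sum(ref) / len(ref)
    return {y: v - base for y, v in zip(years, values)}


years = [2000, 2001, 2002, 2003]
vals = [1.0, 2.0, 3.0, 6.0]  # toy values; 2003 is the newly added year

# Whole-series baseline: mean is 2.0 before 2003 is added, 3.0 after,
# so the 2000 anomaly moves from -1.0 to -2.0.
old_full = anomalies(vals[:3], years[:3])
new_full = anomalies(vals, years)

# Fixed 2000-2002 baseline: mean stays 2.0, so past anomalies are unchanged.
old_fixed = anomalies(vals[:3], years[:3], 2000, 2002)
new_fixed = anomalies(vals, years, 2000, 2002)
```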

brianstock-NOAA commented 4 years ago

Thanks everyone for the quick feedback. ecodata is looking good, I'm excited to use it!

I'm not concerned about replicating the CPI from Miller et al. (2016), so no problem if you want to change the time period used for the mean in the anomaly calculation so that it's standardized across ecodata. It's also ok if you change the method to something more sophisticated, as long as it's documented :)

brianstock-NOAA commented 4 years ago

(Posting here because it's similar enough, I think.)

In trying to use environmental covariates from ecodata in stock assessments via WHAM, we run into the same issues with the Gulf Stream Index (GSI) as with the CPI:

GSI

These questions apply more generally than just to the CPI or GSI:

Detrend or not?
Anomalies calculated using the mean from what time period?
How to calculate the SE?
Provide more than one version? E.g., for the GSI, one based on SSH (2020 SOE) vs. one based on bottom temperature (2019 SOE). My guess is that you made the switch because the SSH-based GSI is preferred from a physical oceanography point of view? Different datasets have different temporal coverage, and it's worth thinking about the pros and cons of different flavors of these indices for stock assessment applications. Re: temporal coverage, most of the NEFSC groundfish assessments start in the 1970s-80s, but some start much earlier (e.g., GB haddock starts in 1931). In that case, a "less good" but much longer version of an index may be preferred. Also relevant: Nye et al. (2011) and O'Leary et al. (2019) used the spring GSI.
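For the "detrend or not?" question, a minimal sketch of one option: removing an ordinary least-squares linear trend and keeping the residuals. Treating "detrend" as linear OLS detrending is an assumption here (other trend models exist), and the function name is hypothetical. Note that the residuals no longer carry any long-term change, which may be exactly the signal an assessment covariate is meant to capture.

```python
def linear_detrend(years, values):
    """Remove an ordinary least-squares linear trend; return the residuals."""
    n = len(years)
    mx = sum(years) / n
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in years)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(years, values)]


# A series that is pure linear trend detrends to all zeros.
resid = linear_detrend([1, 2, 3, 4, 5], [0.5, 1.0, 1.5, 2.0, 2.5])
```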

For discussion. I know the primary purpose of ecodata is not to interface with stock assessments or WHAM, but it seems like it could, with some relatively minor changes.

Nye et al. 2011. https://www.nature.com/articles/ncomms1420
O'Leary et al. 2019. https://www.nrcresearchpress.com/doi/abs/10.1139/cjfas-2018-0092
Xu et al. 2018. https://onlinelibrary.wiley.com/doi/full/10.1111/fog.12236

kimberly-bastille commented 4 years ago

This might be a good discussion to have with the larger LTL/habitat group at the SOE debrief on Wednesday (5/20/20). I sent you an invite @brianstock. One of the requests from the council is to expand the CPI, and I think a discussion of the GSI could also be useful.

slarge commented 4 years ago

@brianstock -- thank you for challenging the purpose of ecodata. Your feedback is excellent and will definitely help us improve the product. I think it will be good to define whether its purpose is just the SOE reports or a broader ecological indicator hub.

@slucey -- do you recall the CRAN package that DFO produced and highlighted during WGNARS? It might help guide our thinking, here.

Also, since the issue of reference periods for anomalies has come up before: do the pros outweigh the cons of making it a user-defined argument (e.g., ecodata::as_anomaly(data = "CPI", reference_period = c(1980, 1989)))? The calculation obviously isn't the hard part, but it would require us to fundamentally change the structure of the package.
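A Python analogue of the proposed user-defined reference period might look like the sketch below. `as_anomaly` here only mirrors the hypothetical ecodata::as_anomaly() named above; it is not an existing ecodata function, and the dict-of-years input is an assumption. One practical detail it surfaces: the requested period has to be covered by the data, so a 1973-2011 baseline cannot be computed from a series that starts in 1977.

```python
def as_anomaly(series, reference_period):
    """Hypothetical analogue of the proposed ecodata::as_anomaly().

    series: dict mapping year -> index value.
    reference_period: (first_year, last_year), inclusive.
    """
    first, last = reference_period
    if first > last:
        raise ValueError("reference_period must be (first_year, last_year)")
    ref_years = range(first, last + 1)
    missing = [y for y in ref_years if y not in series]
    if missing:
        # e.g. a 1973-2011 baseline requested from a series starting in 1977
        raise ValueError(
            f"no data for {len(missing)} reference year(s), first missing: {missing[0]}"
        )
    base = sum(series[y] for y in ref_years) / len(ref_years)
    return {y: v - base for y, v in sorted(series.items())}


cpi = {1977: 1.0, 1978: -2.0, 1979: 4.0}  # made-up values, not real CPI data
anoms = as_anomaly(cpi, (1977, 1979))      # baseline mean is 1.0
```

Under this design the package would store raw values once and compute anomalies on request, which is the structural change mentioned above.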

slucey commented 4 years ago

I don't recall off the top of my head but will check with Jamie.

khyde commented 4 years ago

@slarge I think the idea of having a broader ecological indicator hub is exactly why we need a database and ERDDAP. This way the SOE and others can pull from the exact same database.

I also think we need to discuss how we define our anomaly calculations, because I don't believe there is a one-size-fits-all solution; it depends on what you want the anomaly data to portray. Do you want to compare the anomaly to a baseline, or look at inter-annual variation?
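The two framings in that question can give quite different pictures of the same series. In this minimal sketch (made-up values, hypothetical helper names), a baseline anomaly says how each year compares with a reference mean, while a first difference says how much the index changed from the previous year. A persistent anomaly shows up as positive baseline anomalies but zero year-to-year change.

```python
def baseline_anomaly(values, baseline_mean):
    """Departure of each value from a fixed baseline mean."""
    return [v - baseline_mean for v in values]


def interannual_change(values):
    """Year-to-year first differences (one fewer value than the input)."""
    return [b - a for a, b in zip(values, values[1:])]


vals = [2.0, 3.0, 3.0, 5.0]  # made-up index values for four consecutive years
print(baseline_anomaly(vals, 2.0))  # [0.0, 1.0, 1.0, 3.0]
print(interannual_change(vals))     # [1.0, 0.0, 2.0]
```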

brianstock-NOAA commented 4 years ago

I'm happy to stoke discussion, that's the easy part! The idea of an EDAB-curated package with ecological indicators that interfaces with a PopDy-curated package that fits stock assessments using these indicators is attractive. And we're 95% of the way there already. It seems silly for me to distribute CPI.csv and GSI.csv files in WHAM when ecodata also has them, in addition to dozens of other variables that could be used as covariates in assessment models.

slucey commented 4 years ago


The package is called marindicators: https://cran.r-project.org/web/packages/marindicators/index.html

NOAA-EDAB / ecodata

Interfacing ecodata with stock assessments (WHAM) #33