EcoClimLab / growth_phenology

Cameron Dow's growth phenology project
Creative Commons Attribution 4.0 International

use ForestGEO weather station data #6

Closed teixeirak closed 3 years ago

teixeirak commented 4 years ago

https://github.com/forestgeo/Climate/tree/master/Climate_Data/Met_Stations/SCBI

teixeirak commented 4 years ago

@camerondow35 (and @rudeboybert), a reminder that we'd still like to switch to the SCBI weather station data. The current data should give us a close approximation, but windows could shift slightly when we change sources.

camerondow35 commented 4 years ago

Two things about the weather tower:

1.) I worked on switching over to our tower data but found that it was a bit of a mess. There were quite a few random spikes that made things difficult, and the fact that it records observations every 5 minutes makes it even harder. Do you think it would be worth the effort to get it in working order right now?

I was planning to ask whether you'd like me to work on a script that could extract daily, weekly, and monthly means of the variables after we are done with this paper. That being said, I'd be happy to try to get it working now. I think I'm getting fairly believable results already; it just seems like it'll take a while to be 100% certain.

For example, here are the average temperatures for the period of 3/15 - 4/23 according to our tower, with slightly cleaned data:

Year | Average temp (°C) | Average temp (°F)
2011 | 9.524257 | 49.14366313
2012 | 11.23162 | 52.21691131
2013 | 7.197041 | 44.95467426
2014 | 7.325073 | 45.18513211
2018 | 12.72465 | 54.90437412

2.) If we bring in other sites, wouldn't we want to compare them with similar climate data sources? Or does that not matter?
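The 3/15 - 4/23 averages in the comment above involve two steps: collapsing 5-minute readings into daily means, then averaging those over a fixed seasonal window each year. The sketch below shows one way to do that in plain Python. It assumes readings arrive as `(datetime, value)` pairs with `None` for missing values, and the 80% daily-coverage rule and all function names are hypothetical choices, not the project's actual script. The `c_to_f` helper also shows that the table's first temperature column is the same mean in °C: 9.524257 × 9/5 + 32 ≈ 49.1437, matching 49.14366313 to within rounding.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple

OBS_PER_DAY = 288  # 24 h x 12 five-minute readings per hour


def daily_means(
    records: Iterable[Tuple[datetime, Optional[float]]], min_fraction: float = 0.8
) -> Dict[date, Optional[float]]:
    """Average 5-minute readings into daily means.

    Missing readings are None. A day with fewer than
    min_fraction * OBS_PER_DAY valid readings gets None instead of a mean.
    """
    by_day = defaultdict(list)
    for ts, value in records:
        day_values = by_day[ts.date()]  # registers the day even if every reading is missing
        if value is not None:
            day_values.append(value)
    return {
        day: mean(vals) if len(vals) >= min_fraction * OBS_PER_DAY else None
        for day, vals in sorted(by_day.items())
    }


def window_mean(daily, year, start=(3, 15), end=(4, 23)):
    """Mean of the daily means between start and end (month, day), inclusive."""
    lo, hi = date(year, *start), date(year, *end)
    vals = [v for d, v in daily.items() if lo <= d <= hi and v is not None]
    return mean(vals) if vals else None


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32
```

Weekly or monthly means would follow the same pattern with a different grouping key (for example `(ts.year, ts.month)`) and a matching expected-count threshold.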

teixeirak commented 4 years ago

It would be really great to get this done. We could use the NOAA data to check for unreasonable values and gap fill.

This may be useful, although I'm not sure if it does any standardization or gap filling.

I'll have to come back to this.
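Using the NOAA data to check for unreasonable values and gap fill could look roughly like the sketch below. It assumes both series are already aggregated to daily values keyed by the same date labels; the 5-degree tolerance and the function name are hypothetical. It also ignores any systematic offset between the tower and the NOAA station, which a real check would probably remove first.

```python
from typing import Dict, Hashable, Optional, Tuple


def check_against_reference(
    tower: Dict[Hashable, Optional[float]],
    reference: Dict[Hashable, Optional[float]],
    max_diff: float = 5.0,
) -> Tuple[Dict[Hashable, Optional[float]], Dict[Hashable, str]]:
    """Compare tower values with a reference series (e.g. a nearby NOAA station)
    keyed by the same dates, replacing suspect or missing tower values.

    flags[key] is one of:
      'ok'       tower value kept
      'suspect'  tower value differed from reference by > max_diff; reference used
      'filled'   tower value missing; reference used
      'missing'  neither series has a value
    """
    cleaned, flags = {}, {}
    for key, t in tower.items():
        r = reference.get(key)
        if t is None:
            cleaned[key] = r
            flags[key] = "filled" if r is not None else "missing"
        elif r is not None and abs(t - r) > max_diff:
            cleaned[key] = r
            flags[key] = "suspect"
        else:
            cleaned[key] = t  # close to the reference, or no reference available
            flags[key] = "ok"
    return cleaned, flags
```

Keeping the flags alongside the cleaned values makes it easy to report how much of a record was substituted.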

teixeirak commented 4 years ago

@camerondow35 , we definitely don't want to re-invent the wheel here! There must be some existing scripts to handle weather data standardization and gap filling.

This looks like it could be what we need.

The link given in the previous comment (PEcAn functions for processing meteorological data) is part of the PEcAn project, led by Mike Dietze. Mike had previously mentioned to me that his group had developed some scripts for climate data standardization. I'm not sure if this includes gap filling, and I'm also not sure exactly what the linked PEcAn function does, as the links to documentation seem to be broken. I just wrote Mike to see if they have anything more.

teixeirak commented 4 years ago

For the record, here's email correspondence from Mike Dietze and Christy Rollinson:


On Sep 4, 2020, at 7:27 AM, Michael Dietze dietze@bu.edu wrote:

Krista,

So all the met tools are within pecan/modules/data.atmosphere

https://github.com/PecanProject/pecan/tree/develop/modules/data.atmosphere/R

The default gapfilling, metgapfill.R, was written by Ankur Desai and leverages the Ameriflux MDS approach with a few extra special cases. It is generally good for filling small gaps (e.g. QC flagged data), but not for filling large gaps (e.g. months) when systems are down.

There’s an even simpler version based on splines and linear models; again, it’s really only good for small gaps.

For larger gaps I’d use the tdm_* scripts, which is something that Christy Rollinson built with help from one of Ankur’s grad students (CCing Christy as an excuse to check in on how the paper on this is coming). This code is really for downscaling, not gapfilling, but for large gaps what I’d do is use the downscaling code to downscale a spatially-coarser reanalysis product (I’m particularly fond of ERA5, and there’s code for downloading that in the data.atmosphere module). Christy’s code needs a training data set at both scales (local and coarse) and builds a complex series of GAMs across variables and across a moving average through day-of-year. It can also produce ensembles of outputs to capture the uncertainty associated with downscaling/gapfilling. It’s more computationally demanding, but more sophisticated than anything else I’ve got. Once you do the downscaling it should be pretty easy to just substitute these values for any NAs in your original time-series.

— Mike
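Neither metgapfill.R nor the spline/linear-model version is reproduced here. As a hedged illustration of why such methods suit short gaps only, below are two simple stand-ins with hypothetical names: linear interpolation capped at a maximum gap length, and a fill from the mean of readings at the same time of day on neighbouring days, loosely in the spirit of the mean-diurnal-course step used in MDS-type gap filling. Both assume an evenly spaced series with `None` for missing readings. Neither can recover anything meaningful across a gap of weeks or months.

```python
from typing import List, Optional

Series = List[Optional[float]]


def interpolate_short_gaps(values: Series, max_gap: int = 6) -> Series:
    """Linearly interpolate runs of None that are at most max_gap steps long.

    Longer runs, and runs touching either end of the series, stay None.
    """
    out = list(values)
    n, i = len(values), 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        start = i  # first missing index of this run
        while i < n and values[i] is None:
            i += 1
        gap = i - start  # i is now the first valid index after the run, or n
        if start > 0 and i < n and gap <= max_gap:
            left, right = values[start - 1], values[i]
            for k in range(start, i):
                frac = (k - start + 1) / (gap + 1)
                out[k] = left + frac * (right - left)
    return out


def fill_mean_diurnal(values: Series, steps_per_day: int = 288, window_days: int = 7) -> Series:
    """Fill None with the mean of readings taken at the same time of day
    on up to window_days days before and after. Only original readings are
    used as donors, never previously filled ones."""
    out = list(values)
    n = len(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        donors = [
            values[j]
            for k in range(1, window_days + 1)
            for j in (i - k * steps_per_day, i + k * steps_per_day)
            if 0 <= j < n and values[j] is not None
        ]
        if donors:
            out[i] = sum(donors) / len(donors)
    return out
```

With 5-minute data, the default `max_gap=6` covers gaps of up to half an hour.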


Hi Krista and Mike,

Like Mike says, the TDM scripts weren’t built for gap filling, but what Mike suggests with downscaling a coarser product and inserting the values into your gaps should work. However, right now this could be a bit buggy since the code was developed with producing ensembles that propagate uncertainty in mind. In the work for a different project (MANDIFORE), I discovered that the single-use instance where you don’t produce ensembles appears to be buggy and needs more than the quick hack I had put in there.

I was beginning to track this down when COVID kicked me out of my office and my home internet connection was making debugging hard. I’m planning to spend 1-2 days in my office starting next week to knock out modeling stuff, so hopefully I’ll be able to finally polish off the TDM paper and a forest management manuscript this fall.

One key thing that might be important, depending on what kind of weather station variables you need, is that I spent a lot of time playing with the equations in my scripts to at least try to preserve met variable covariance. I suspect this is more focused on variables/resolutions you may not care about, like sub-daily long/shortwave radiation, temperature, and wind, but if that is important, my workflow might be worth the trouble.

I’ll bump this higher on my to-do list and make polishing off some of the TDM use cases a priority for end of September. Krista, let me know if you want to be kept in the loop with that progress and I can let you know when it’s ready.

Christy

camerondow35 commented 4 years ago

To be more specific and document the problems I'm having with the met tower:

  1. We have two sensors, and many times the two aren't even within 2 degrees C of each other; I'm not sure which to trust.

  2. Sometimes the two sensors are close, but far enough apart to cause doubt. We might be able to average the two together, or just commit to one?

  3. Sometimes a sensor stops reading and leaves large blocks of NAs. Reading your suggestions above, it seems like PEcAn can help here. It is usually the second sensor, which makes me wonder about its accuracy.

  4. In 2019, sensor 1 is fried. In 2020 it's a bit better in spots but overall unusable. [Screenshot: 2019 data]

  5. There are often random spikes in one of the sensors, which are difficult to catch with an automated quality check, but I'm still trying to work out a way to do it. For example, on 3/7/2011 temps go from 12 to 31 to 5 within about an hour (~1pm-2pm) according to sensor 1, while sensor 2 reports a 10-degree drop in the same period. [Screenshot: 3/7/2011]
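Problems 1, 2 and 5 in the list above can be sketched as two small checks, assuming both sensors report °C on the same 5-minute time steps with `None` for missing readings. The 2-degree agreement tolerance echoes item 1; the 5-degree spike threshold and both function names are hypothetical. The spike check only catches single-reading excursions. A multi-reading event like the one on 3/7/2011 would need a wider window or a comparison against the other sensor.

```python
from typing import List, Optional, Tuple

Series = List[Optional[float]]


def combine_sensors(s1: Series, s2: Series, max_diff: float = 2.0) -> Tuple[Series, List[str]]:
    """Merge two co-located temperature series reading by reading.

    Averages the sensors when they agree within max_diff, uses whichever one
    is present when the other is missing, and leaves None when they disagree.
    """
    combined: Series = []
    flags: List[str] = []
    for a, b in zip(s1, s2):
        if a is None and b is None:
            combined.append(None)
            flags.append("both_missing")
        elif a is None or b is None:
            combined.append(a if b is None else b)
            flags.append("one_sensor")
        elif abs(a - b) > max_diff:
            combined.append(None)
            flags.append("disagree")
        else:
            combined.append((a + b) / 2)
            flags.append("averaged")
    return combined, flags


def flag_spikes(values: Series, max_step: float = 5.0) -> List[bool]:
    """Flag single readings that jump more than max_step away from BOTH
    neighbours in the same direction (up then down, or down then up)."""
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if prev is None or cur is None or nxt is None:
            continue
        rise, fall = cur - prev, cur - nxt
        if abs(rise) > max_step and abs(fall) > max_step and (rise > 0) == (fall > 0):
            flags[i] = True
    return flags
```

Requiring the jump to reverse on the next reading is what separates a spike from a genuine step change, such as a front passing through.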

camerondow35 commented 4 years ago

Have you heard of climpact? At first glance the quality control outputs seem useful.

camerondow35 commented 4 years ago

[Image: climpact QC plot of max and min temperatures from sensor 1]

Here's an example of the type of plot climpact can produce. This uses max and min temps from sensor 1 of the met tower. There are obvious problems with the data; I'll look into it further to understand exactly what's wrong. Other QC graphs can be found in this folder. Let me know if you think climpact is worth continuing with.

camerondow35 commented 4 years ago

It also produces a CSV of outliers, which seems handy.

teixeirak commented 4 years ago

Thanks for working on this @camerondow35! It would actually be great if you could move this to the climate data portal, as I think that your codes/ figures/ learning experience will be helpful for others.

teixeirak commented 4 years ago

Does sensor 2 stay good until the present? If so, I'd just drop sensor 1 completely. Looks like we need to replace it.
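Whether a sensor "stays good until the present" can be partly answered by counting, year by year, how many of its readings are present. Below is a minimal sketch, assuming `(datetime, value)` records with `None` for missing readings; the function name is hypothetical. Completeness alone will not catch a sensor that reports wrong-but-present values, as sensor 1 did in 2019, so it needs to be paired with range or cross-sensor checks.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple


def completeness_by_year(records: Iterable[Tuple[datetime, Optional[float]]]) -> Dict[int, float]:
    """Fraction of readings per calendar year that are not None."""
    total: Dict[int, int] = defaultdict(int)
    valid: Dict[int, int] = defaultdict(int)
    for ts, value in records:
        total[ts.year] += 1
        if value is not None:
            valid[ts.year] += 1
    return {year: valid[year] / total[year] for year in sorted(total)}
```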

camerondow35 commented 4 years ago

Here's sensor 2: [Image: sensor 2 data]

Better! But there are still some weird points. Where is the climate data portal? Do you mean the Climate GitHub repo? I want to do some major cleaning and probably write up a 'user guide' for climpact before I move too much, if that's OK.

teixeirak commented 4 years ago

Yes, much better!

camerondow35 commented 4 years ago

Worth noting that climpact only flags 1 point here as an outlier, while I could argue that many more than 1 are outliers or incorrect.
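One plausible reason a check flags so few points (not verified against climpact's internals, whose exact rule may differ): under a "mean ± k standard deviations" rule, the bad readings themselves inflate the standard deviation, so other bad readings end up inside the threshold. A median-based rule is much less sensitive to this. The sketch below compares a 3-SD rule with the modified z-score, 0.6745 × (x − median) / MAD, using the commonly cited cutoff of 3.5. The function names and the toy values in the test are made up for illustration and are not tower data. On those toy values, two obviously bad readings (30 and 35 among values near 10) escape the 3-SD rule but are caught by the MAD rule.

```python
from statistics import mean, median, stdev
from typing import List


def sd_flags(values: List[float], n_sd: float = 3.0) -> List[bool]:
    """Flag values more than n_sd sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [abs(v - m) > n_sd * s for v in values]


def mad_flags(values: List[float], threshold: float = 3.5) -> List[bool]:
    """Flag values whose modified z-score, 0.6745 * (x - median) / MAD, exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the deviations are zero; crude fallback flags anything off the median.
        return [v != med for v in values]
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]
```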

teixeirak commented 4 years ago

Here's the repo where we host those data: https://github.com/forestgeo/Climate/tree/master/Climate_Data/Met_Stations/SCBI/ForestGEO_met_station-SCBI. I'd put processed data and plots there.

It would be best to put the script here: https://github.com/forestgeo/Climate/tree/master/Climate_Data/Met_Stations/scripts, as it would be valuable for other ForestGEO met station data (including a few stations whose format exactly matches that of this station).

Committing to that repo will also give you credit in the citation for the repo.

teixeirak commented 3 years ago

We're now using the ForestGEO station data; closing this.