forestgeo / Climate

Climate data for ForestGEO sites
https://forestgeo.github.io/Climate/
Creative Commons Attribution 4.0 International

CRU annual data summaries #39

Closed teixeirak closed 4 years ago

teixeirak commented 4 years ago

@biancaglez , could you please write a script to generate monthly and annual summaries for the CRU data? (I assume this should be easy). In particular, I'd like Jan and July T, annual precip (for MEE paper Table 1), and mean annual temperature (might be needed for another paper). We'll want this for all the ForestGEO sites.

Let's go with a time range of 1950-present, but keep it easily adjustable in the code.
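One way to keep the time range adjustable is to hold both bounds in named settings and filter on them. This is a minimal Python sketch (the repository script itself is in R); `START_YEAR`, `END_YEAR`, `in_window`, and the records are hypothetical.

```python
from datetime import date

# Hypothetical settings: change these two values to shift the summary window.
START_YEAR = 1950
END_YEAR = date.today().year  # "present"; replace with a fixed year if preferred


def in_window(year, start=START_YEAR, end=END_YEAR):
    """Return True when a record's year falls inside the summary window."""
    return start <= year <= end


# Made-up records of (year, month, value), for illustration only.
records = [(1949, 12, 1.0), (1950, 1, 2.0), (2019, 7, 3.0)]
kept = [r for r in records if in_window(r[0])]
```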

Note that for the annual summaries, some variables should be summed across months and others averaged.

Average: TMP, TMN, TMX, CLD

Sum: PRE, WET, FRS

Special: PET - convert daily average to monthly sum (mm/mo), then sum across months for units of mm/yr.
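The average-versus-sum rule for a single year can be sketched as a lookup table plus one function. This is a Python illustration with made-up numbers, not the repository's R code; `ANNUAL_RULE` and `annual_value` are hypothetical names, and PET is left out because it needs the separate conversion described above.

```python
from statistics import mean

# Aggregation rule per CRU variable, as listed in the comment above.
ANNUAL_RULE = {
    "TMP": "mean", "TMN": "mean", "TMX": "mean", "CLD": "mean",
    "PRE": "sum", "WET": "sum", "FRS": "sum",
}


def annual_value(variable, monthly_values):
    """Collapse one year's 12 monthly values into a single annual value."""
    if len(monthly_values) != 12:
        raise ValueError("expected 12 monthly values")
    if ANNUAL_RULE[variable] == "sum":
        return sum(monthly_values)
    return mean(monthly_values)


# Made-up numbers for illustration only.
annual_pre = annual_value("PRE", [100.0] * 12)      # annual total
annual_tmp = annual_value("TMP", [20.0, 22.0] * 6)  # annual average
```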

biancaglez commented 4 years ago

> I'd like Jan and July T, annual precip (for MEE paper Table 1), and mean annual temperature (might be needed for another paper).

Just clarifying, do you mean January temp and July temp mean summaries of CRU vars? This would be covered in the monthly summaries you're requesting, right? && for annual precip for MEE paper -- do you just mean the average here? Confused by this line ...

> Jan and July T, annual precip (for MEE paper Table 1)

but generally, I think I just need to get annual and monthly summary tables (mean, mode, max, min), correct? Or would you only like means?

teixeirak commented 4 years ago

> Just clarifying, do you mean January temp and July temp mean summaries of CRU vars? This would be covered in the monthly summaries you're requesting, right?

Yes.

teixeirak commented 4 years ago

> && for annual precip for MEE paper -- do you just mean the average here? Confused by this line ...

Yes, mean annual. You have to sum all the months per year, then average.

teixeirak commented 4 years ago

> but generally, I think I just need to get annual and monthly summary tables (mean, mode, max, min), correct? Or would you only like means?

That would be perfect (and the more stats the better)-- maybe 5th and 95th percentiles, too.

biancaglez commented 4 years ago

Hi @teixeirak !!

So for the sake of coding simplicity, I calculated the sum, min, max, median, mean, and sd for all variables. Since you were most interested in the SUM (for PRE, WET, FRS) and the MEAN (for TMP, TMN, TMX, CLD), I put SUM and MEAN at the beginning of the CSVs. Find those CSVs -- mean and annual stats here. There will be columns like PRE_mean that you can ignore (the script generates all stats).

---> still working on the special PET case and will get that up tomorrow (as well as 5th and 95th percentiles!)

biancaglez commented 4 years ago

> Special: PET - convert daily average to monthly sum (mm/mo), then sum across months for units of mm/yr.

Morning, Krista! I'm a little confused with the above special instructions for computing PET. When I read in CRU data - I don't get a daily average (I get a single value -- which I guess I could assume is inherently the average) but I don't understand how to convert this to a monthly sum (because there is only a single value I can add in say January of 1950) --

So, wouldn't this just be the sum across months for units of mm/yr?

[Image: Screen Shot 2020-09-23 at 7 55 14 AM]

teixeirak commented 4 years ago

@biancaglez , hold on... we have a script to create PET sums. @ValentineHerr wrote it. Hang on while I find it.

teixeirak commented 4 years ago

Okay, we just need to use the PET_sum (mm/mo) variable here, as opposed to PET (mm/day).
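For reference, the mm/day to mm/mo step can be sketched as follows. This assumes the CRU PET value is the mean daily rate for the month, so the monthly total is that rate times the number of days in the month; it is a Python illustration with a hypothetical `pet_monthly_sum` helper, not a description of how @ValentineHerr's script works.

```python
import calendar


def pet_monthly_sum(pet_mm_per_day, year, month):
    """Convert a month's mean daily PET (mm/day) to a monthly total (mm/mo)."""
    days_in_month = calendar.monthrange(year, month)[1]
    return pet_mm_per_day * days_in_month


# Made-up rate of 3.0 mm/day, for illustration only.
jan_1950 = pet_monthly_sum(3.0, 1950, 1)  # January has 31 days
feb_1952 = pet_monthly_sum(3.0, 1952, 2)  # 1952 is a leap year: 29 days
```

Summing the 12 monthly totals for a year then gives mm/yr.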

teixeirak commented 4 years ago

> So for the sake of coding simplicity, I calculated the sum, min, max, median, mean, and sd for all variables. Since you were most interested in the SUM (for PRE, WET, FRS) and the MEAN (for TMP, TMN, TMX, CLD), I put SUM and MEAN at the beginning of the CSVs. Find those CSVs -- mean and annual stats here. There will be columns like PRE_mean that you can ignore (the script generates all stats).

I see the script but not the .csvs. I do think it would be helpful to post the .csvs so that users don't have to run the script. Also, if it's not too much of a pain, please output just the variables of interest. Something like an annual temperature sum is meaningless and would just confuse people (plus, presenting it would look a bit unprofessional).

biancaglez commented 4 years ago

annual_stats.csv and monthly_stats.csv should be here -- BUT yes, you're right, I will clean it up a bit and select the variables of interest.

And good note on the pet_sum 👍

biancaglez commented 4 years ago

Finished this ... here is the annual_stats.csv && the monthly_stats.csv along with a script monthly_annual_CRUsummaries.R

teixeirak commented 4 years ago

Thanks @biancaglez! But I'm afraid my directions weren't clear. We want to average over the whole 1950-present time period, so we'd expect just one mean value per site (for each variable/month). Does that make sense?

biancaglez commented 4 years ago

Maybe. Do you mean that TMP_Mean for the time period 1950-2019 was, say, 27?

An example would help.

teixeirak commented 4 years ago

Amacayacu mean temp for 1950-2019 would be 25.9.

Output file should have only one row per site, and the min/max/sd would be calculated across years (not months within a year).

Does that help?

biancaglez commented 4 years ago

That does make sense. So your example above would cover the annual summaries...

How about monthly summaries? Would the example then be: Amacayacu mean temp for 1950-2019 in JAN is 22.1.

So 12 values per site for the 1950-2019 time range?

teixeirak commented 4 years ago

Correct.
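The "12 values per site" layout amounts to grouping by calendar month and averaging across years. This is a minimal Python sketch with made-up temperatures; `monthly_climatology` and the `(year, month, value)` record shape are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def monthly_climatology(records):
    """Average each calendar month across years.

    `records` is an iterable of (year, month, value) for one site and variable.
    Returns {month: mean across years}.
    """
    by_month = defaultdict(list)
    for _year, month, value in records:
        by_month[month].append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}


# Made-up January and July temperatures for three years, illustration only.
records = [
    (1950, 1, 22.0), (1951, 1, 22.5), (1952, 1, 21.5),
    (1950, 7, 25.0), (1951, 7, 26.0), (1952, 7, 27.0),
]
clim = monthly_climatology(records)
```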

biancaglez commented 4 years ago

Okay, I think I'm done. Let me know if it's what you're looking for. I left the code that generated those other CSVs in there too, just in case it's needed later.

For now, here is the folder with it all.

teixeirak commented 4 years ago

Awesome, thanks @biancaglez ! Except that the names were switched. I renamed them, and also made the names more descriptive.

It would be easier to work with if we were to add a column with the variable name and then present mean, min, std, etc. as columns (so each site has a row for each variable). But this works, and will be very helpful!
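The suggested layout, one row per site and variable with the statistics as columns, could be produced from a wide row along these lines. This is a Python sketch; the `<variable>_<stat>` column naming and the `wide_to_long` helper are assumptions, and the numbers are made up.

```python
def wide_to_long(wide_row, id_column="site"):
    """Turn one wide row into one row per variable, with stats as columns."""
    long_rows = {}
    for column, value in wide_row.items():
        if column == id_column:
            continue
        variable, stat = column.rsplit("_", 1)  # split on the last underscore
        row = long_rows.setdefault(
            variable, {id_column: wide_row[id_column], "variable": variable}
        )
        row[stat] = value
    return list(long_rows.values())


# Made-up wide row, for illustration only.
wide = {"site": "A", "tmp_mean": 1.0, "tmp_min": 0.5, "pre_mean": 10.0, "pre_min": 8.0}
long_rows = wide_to_long(wide)
```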

teixeirak commented 4 years ago

When you get a chance, could you please add a description of the script and results files to the Readme? Or maybe @forestgeoadm can help?

biancaglez commented 4 years ago

Done! Added a sentence - you can check it out here.

Thanks!

forestgeoadm commented 4 years ago

Hi @teixeirak,

I could have sworn that you asked me a week or so ago to create a README about documentation (a README for creating READMEs?), but now that I have a moment to look at it more closely I can't find your e-mail/GitHub issue/whatever means of communicating your request that you used. Would you let me know if this is something that you want me to do?

Take care, Caly

teixeirak commented 4 years ago

@biancaglez , sorry to only be pointing this out now, but I just looked at these files and noticed that the variables that are summed appear to be summed, as opposed to averaged, across years. What we want is to sum the months (e.g., to get total annual precipitation) and then average across years. So, in the image below, precip for Amacayacu should be 189442/ (2019-1950+1)= 1278. Is this something that you can easily fix?

[Image]

teixeirak commented 4 years ago

I put a warning message on the READMEs. This needs to be deleted once fixed.

biancaglez commented 4 years ago

I'm not sure. I think it's something Cam could easily fix. I'm quite busy already and a little overwhelmed at my new job. Perhaps we can ask him and if he can quickly do it, then great. If not, I can look this weekend.

@camerondow35 @teixeirak

biancaglez commented 4 years ago

I think I forgot to add a mean_precip column ---

camerondow35 commented 4 years ago

@teixeirak It looks to me like the pre_sum column is the total precip over the month*year period. Did you want that column removed and replaced with an average? Or keep total while adding average column?

biancaglez commented 4 years ago

Why not both? @camerondow35 easy fix?

camerondow35 commented 4 years ago

@biancaglez Yes, easy fix either way. I got it

biancaglez commented 4 years ago

THANK YOU @camerondow35 -- I believe you are an angel sent from heaven.

teixeirak commented 4 years ago

Thank you both!

What we want is two steps: 1- sum all months within each year 2- average across years

camerondow35 commented 4 years ago

Pushed the fix to this - edited your script some @biancaglez , seemed like it was slightly out of order and most of the code in the second half wasn't used? Please double check the CSVs to make sure I didn't leave anything out.

teixeirak commented 4 years ago

Thanks so much, @camerondow35 ! Looks good.

teixeirak commented 4 years ago

Oh wait, sorry... spoke a bit too fast. The pre_mean column is approximately correct, but I think it may be wrong. pre_sum should be deleted. For other variables, it looks like all records ending with _sum are incorrect. We never* want to sum across years. The variables that I indicated should be summed need to be summed across months (first) and then averaged across years (second).

*The current pre_mean = sum of pre_mean for individual months. However, sum of mean ≠ mean of sums. We first need to sum Jan-Dec precip for each year to get annual precip, then average all the years to get mean annual precip.

Does this make sense?

camerondow35 commented 4 years ago

Maybe... For the _sum columns, what you're looking for is an average monthly number across the 1950-2019 period? So add up each monthly measurement and average over 840 months?

For pre_mean, I thought that's what I did... I have yearly averages for each site, then I average over the 1950-2019 period.

I re-ran a different way and got the same values for pre_mean.
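One possible explanation for the matching values, offered as an assumption about the data rather than a reading of the script: when every year has all 12 months, summing the monthly means and averaging the annual sums give the same number, because the mean is linear. The order of operations still matters for min, max, SD, and percentiles, and for the mean when months are missing. A small Python check with made-up numbers:

```python
from statistics import mean

# Made-up monthly precipitation: two complete years of 12 months each.
monthly = {
    1950: [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0, 110.0, 120.0],
    1951: [20.0] * 12,
}

# Route A: sum the months within each year, then average across years.
annual_totals = [sum(values) for values in monthly.values()]
mean_of_sums = mean(annual_totals)

# Route B: average each calendar month across years, then sum the 12 means.
monthly_means = [mean([year[m] for year in monthly.values()]) for m in range(12)]
sum_of_means = sum(monthly_means)

# The two routes agree for the mean but not for other statistics:
# the minimum annual total is not the sum of the monthly minima.
min_of_sums = min(annual_totals)
sum_of_mins = sum(min(year[m] for year in monthly.values()) for m in range(12))
```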

teixeirak commented 4 years ago

Let's start with the monthly stats. There, for each month, you'd just average across the 1950-2019 period. So,

pre_mean for January = mean (pre_jan1950, pre_jan1951, pre_jan1952, etc).

It's the same math for all variables. Do we have that part right?

teixeirak commented 4 years ago

generalized,

mean = sum (x_1950, x_1951, x_1952, .... x_2019)/ 70,

min = min (x_1950, x_1951, x_1952, .... x_2019),

SD = SD (x_1950, x_1951, x_1952, .... x_2019),

etc.

Here, x is the climate variable of interest, and 70 is the number of years included in the calculation.

This applies to both monthly and annual stats.
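The generalized statistics can be sketched as one function applied to the per-year values of x. This is a Python illustration with made-up numbers; `summarize` is a hypothetical name, the SD is taken to be the sample standard deviation, and the percentiles use linear interpolation between the observed minimum and maximum. Both choices are assumptions, since the thread does not specify them.

```python
from statistics import mean, median, quantiles, stdev


def summarize(yearly_values):
    """Summary statistics across years for one site and variable."""
    # 19 cut points at 5%, 10%, ..., 95%; "inclusive" stays within min..max.
    cuts = quantiles(yearly_values, n=20, method="inclusive")
    return {
        "mean": mean(yearly_values),
        "min": min(yearly_values),
        "max": max(yearly_values),
        "median": median(yearly_values),
        "sd": stdev(yearly_values),  # sample standard deviation (n - 1)
        "p05": cuts[0],
        "p95": cuts[-1],
    }


# Made-up yearly values for illustration only.
stats = summarize([1.0, 2.0, 3.0, 4.0, 5.0])
```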

teixeirak commented 4 years ago

For monthly stats, x is simply the raw data values.

For annual stats, we first have to compute x.

For variables that need to be summed (e.g., pre), x_annual = sum (x_Jan, x_Feb, ... x_Dec)

For variables that need to be averaged (e.g., temperature), x_annual = sum(x_Jan, x_Feb, ... x_Dec)/12
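Starting from flat monthly records, the two steps (compute x_annual per year, then average across years) might look like the sketch below. It is Python with made-up data; `SUMMED`, `mean_annual`, and the choice to skip years with fewer than 12 months are assumptions, not something stated in this thread.

```python
from collections import defaultdict
from statistics import mean

SUMMED = {"pre", "wet", "frs"}  # all other variables are averaged over months


def mean_annual(variable, records):
    """Return (mean across years, {year: x_annual}) for one site.

    `records` is an iterable of (year, month, value).
    Years without all 12 months are skipped (an assumption).
    """
    by_year = defaultdict(dict)
    for year, month, value in records:
        by_year[year][month] = value

    annual = {}
    for year, months in by_year.items():
        if len(months) != 12:
            continue  # incomplete year
        values = list(months.values())
        # Step 1: one value per year, summed or averaged over the 12 months.
        annual[year] = sum(values) if variable in SUMMED else mean(values)
    # Step 2: average the annual values across years.
    return mean(list(annual.values())), annual


# Made-up records: two complete years and one incomplete year.
records = (
    [(1950, m, 100.0) for m in range(1, 13)]
    + [(1951, m, 110.0) for m in range(1, 13)]
    + [(1952, m, 50.0) for m in range(1, 7)]
)
pre_overall, pre_annual = mean_annual("pre", records)
tmp_overall, tmp_annual = mean_annual("tmp", records)
```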

camerondow35 commented 4 years ago

[Image]

Just for clarity:

For summed variables: first, calculate annual totals, then average those totals together over the entire period.

For averaged variables: first, calculate annual averages, then average those averages over the entire period.

Edit: so here cld_sum_mean is incorrect; it should be cld_mean_mean. Fixed in the latest version.

teixeirak commented 4 years ago

This is all correct.

teixeirak commented 4 years ago

Okay, and the files produced also look good. I will delete the old files.

teixeirak commented 4 years ago

I believe this can be closed (hopefully permanently this time!). Many thanks, @camerondow35!

teixeirak commented 4 years ago

@camerondow35 , it looks like you forgot to push your fixes to the script. Could you please do so?

camerondow35 commented 4 years ago

Ah, sorry. Pushed now. @teixeirak