Open teixeirak opened 3 years ago
Ummmm I've never seen anything like that in the SRDB data 🤔 but 🤷‍♂️
I just checked and it was not there after we added SRDB. So, it presumably comes from GROA.
Phew :)
There appears to be some bad data from Taylor_2017_tari: very low precip for at least 4 sites in India (e.g., Kodayar I, IV...). Data came from here: https://data.nceas.ucsb.edu/view/knb.1274.1. @beckybanbury , do you have an intermediary data file with this climate info?
I believe that strange line of data comes mostly or entirely from Xu_2015_proa, imported via GROA. I don't see an explanation in the paper as to climate data source, but study system includes 164 plots spanning elevation gradient, so presumably climate is extrapolated as a function of elevation. I'm not sure if this is the most accurate possible for that location, but at least we have an explanation. I haven't verified against original data yet, but it seems unlikely that this is an error in GROA or ForC.
Here's MAP vs MAT for Xu_2015_proa sites:
@teixeirak yes, that's an error - the intermediate data sheet is in this folder - it's the litterfall data file, and by the looks of things I just accidentally copied across the values from the adjacent column.
I've corrected in ForC_sites
Are there any others that look out?
Many thanks, @beckybanbury , and sorry to ping you on a weekend. No need to respond right away. (I'm pushing to solve some problems for a deadline this week, but we can just avoid questionable records at this point.)
It's hard to say if everything looks right now. Taylor 2017 has a number of records with very high precip, and so I started trying to check some. I verified that one was correct (Swer) but found one error (Wooroonooran National Park Bellenden Ker). Then, I got caught up trying to understand what's going on with the La Fortuna Forest Reserve, which has 5 sites with identical coordinates but different climate entered, but only seems to have one site when you go to the original pub. That will need to be solved, but I have to drop it for now.
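One way to surface cases like La Fortuna across the whole table is to group sites by coordinates and list the groups whose climate values disagree. This is only a minimal sketch: it assumes rows are plain dicts and that the column names (`sites.sitename`, `lat`, `lon`, `mat`, `map`) match the SITES table, and the example rows are made up rather than real ForC values.

```python
from collections import defaultdict


def conflicting_climate_groups(sites, climate_keys=("mat", "map")):
    """Return {(lat, lon): [site names]} for coordinates shared by
    several sites whose climate values are not all identical."""
    by_coord = defaultdict(list)
    for site in sites:
        by_coord[(site["lat"], site["lon"])].append(site)

    conflicts = {}
    for coord, group in by_coord.items():
        # Distinct combinations of climate values entered at this coordinate.
        distinct = {tuple(s[k] for k in climate_keys) for s in group}
        if len(group) > 1 and len(distinct) > 1:
            conflicts[coord] = [s["sites.sitename"] for s in group]
    return conflicts


# Made-up rows for illustration only (hypothetical names and values).
example_sites = [
    {"sites.sitename": "Site A1", "lat": 8.7, "lon": -82.2, "mat": 20.0, "map": 4000},
    {"sites.sitename": "Site A2", "lat": 8.7, "lon": -82.2, "mat": 17.0, "map": 5500},
    {"sites.sitename": "Site B", "lat": 10.0, "lon": -84.0, "mat": 25.0, "map": 3000},
]
print(conflicting_climate_groups(example_sites))
```

A group showing up here is not necessarily an error (plots along an elevation gradient may legitimately share rounded coordinates), so the output is a list of things to check against the original pub.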
I have reviewed the most egregious outliers. However, there are almost certainly some errors. It would be good to check the ForC climate data against a global database to identify values that are way off (e.g., units error during data entry).
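A rough sketch of what that comparison could look like, assuming the reference values for each site have already been extracted from a global gridded product. The function names, the `mat`/`map` keys, the candidate scale factors, and the tolerance are all assumptions for illustration, not anything that exists in ForC.

```python
def climate_deviations(site, reference, keys=("mat", "map")):
    """Return {variable: site value - reference value}, skipping
    variables that are missing on either side."""
    deviations = {}
    for key in keys:
        if site.get(key) is None or reference.get(key) is None:
            continue
        deviations[key] = site[key] - reference[key]
    return deviations


def likely_scale_error(site_value, ref_value, factors=(10.0, 100.0), rel_tol=0.25):
    """Return the scale factor if site_value is roughly ref_value times
    (or divided by) one of `factors`, e.g. cm entered as mm; else None."""
    if not site_value or not ref_value:
        return None  # missing or zero values cannot be compared as a ratio
    ratio = abs(site_value / ref_value)
    for factor in factors:
        for candidate in (factor, 1.0 / factor):
            if abs(ratio - candidate) <= rel_tol * candidate:
                return candidate
    return None


# Made-up values: precipitation entered about ten times too low.
site = {"mat": 25.0, "map": 210.0}
reference = {"mat": 24.0, "map": 2000.0}
print(climate_deviations(site, reference))
print(likely_scale_error(site["map"], reference["map"]))
```

The ratio test is only a hint that a units mix-up is worth checking; a large absolute deviation with no clean factor could just as well be a copy error or a real local effect.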
@teixeirak happy to help with this if you'd like, particularly the data from Taylor 2017 that I entered - I remember reviewing some of the C flux values that looked off at the time, but didn't check climate data so closely. Happy to spend some time reviewing if you'd like - just let me know how you want to approach this!
Thanks, @beckybanbury. I sent an email about this. More narrowly, figuring out what's going on with La Fortuna (see here) would be helpful.
@beckybanbury , thanks for working on this!
Based on the plots, let's flag sites "climate.data.suspect" if any of the following are true:
You could just flag with a "1", or better yet list the variable(s) that is/are off.
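For the "list the variable(s)" option, a flag string can be built from per-variable deviations. A minimal sketch follows; the thresholds below are placeholders chosen for illustration and are not the criteria agreed on in this thread, and the `mat`/`map` keys and the `"; "` separator are likewise assumptions.

```python
# Placeholder thresholds (hypothetical): degrees C for mat, mm for map.
THRESHOLDS = {"mat": 5.0, "map": 1000.0}


def suspect_flag(deviations, thresholds=THRESHOLDS, sep="; "):
    """Return the names of variables whose absolute deviation from the
    reference exceeds its threshold, joined by `sep` ('' if none)."""
    bad = [
        var
        for var in thresholds
        if var in deviations and abs(deviations[var]) > thresholds[var]
    ]
    return sep.join(bad)


print(suspect_flag({"mat": 1.0, "map": -1790.0}))
print(suspect_flag({"mat": -7.5, "map": 1200.0}))
```

Storing the variable names rather than a bare "1" means whoever reviews the site later can see which value to look up in the original pub.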
@beckybanbury , if you're able to complete the step above this week while @Troger4 is still with us, she could check the climate values that are way off.
@teixeirak sorry - somehow I missed your previous comment! I've flagged with the name of the variable that is suspect. It doesn't look like there's too many.
@Troger4 , please use the climate.data.suspect field in this file to identify the sites with suspicious climate data. It is coded to indicate which value is bad. When one value is bad, please double-check the others as well. If the original pub does not report climate data, please replace the bad value with "NI".
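The review step described above could be sketched roughly as follows. This assumes the flag is a `"; "`-separated list of column names and that a hypothetical `climate.reviewed` column records the review; neither detail is confirmed by the actual file, so treat the names as placeholders.

```python
def apply_review(site, published, missing_code="NI", sep="; "):
    """Return a copy of `site` where each flagged variable is replaced by
    the value found in the original pub, or by `missing_code` if the pub
    does not report it. Marks the row as reviewed."""
    flagged = [v for v in (site.get("climate.data.suspect") or "").split(sep) if v]
    updated = dict(site)
    for var in flagged:
        updated[var] = published.get(var, missing_code)
    updated["climate.reviewed"] = "yes"  # hypothetical column name
    return updated


# Made-up row: both values flagged, the pub only reports precipitation.
row = {"sites.sitename": "Plot 1", "mat": 2.0, "map": 210.0,
       "climate.data.suspect": "mat; map"}
print(apply_review(row, published={"map": 2100.0}))
```

The function returns a new dict instead of editing in place, so the unreviewed row stays available for comparison.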
Okay, I see there are 284 climate.data.suspect entries with MAP, MAT, min temp, and max temp. What do MAP and MAT represent in columns R and O? Thank you
Not sure what file you are working with exactly but it must be Mean Annual Precipitation and Mean Annual Temperature.
Metadata for the SITES table is here: https://github.com/forc-db/ForC/blob/master/metadata/sites_metadata.csv
Hi Valentine, I'm looking in ForC_sites_climate_data within extracted_sites_data, mean annual precip and mean annual temp makes sense. Thanks very much!
Those correspond to columns in ForC_sites, and indicate which have large deviations from the value pulled from the global database (WorldClim). Be sure to put fixes in ForC_sites (the master), not in extracted_sites_data.
Also note: this file and the master ForC_sites DO NOT MATCH because sites missing coordinates are not included in the former.
Also, please create a new column in this file to note when you've reviewed the climate data.
Which file is the "this file" you referred to? And which should I be looking in to find climate.data.suspect records? Thank you!
Sorry, I guess the links were confusing. Here it is with the file names:
Also note: [this file](https://github.com/forc-db/ForC/blob/master/data/extracted_site_data/ForC_sites_climate_data.csv) and the master ForC_sites DO NOT MATCH because sites missing coordinates are not included in the former.
Also, please create a new column in [this file](https://github.com/forc-db/ForC/blob/master/data/extracted_site_data/ForC_sites_climate_data.csv) to note when you've reviewed the climate data.
@mawilliams99 , this is an issue that you can get started on as an intro to the ForC data work.
There's a lot of discussion above, but summarizing here--
Master sites file: ForC_sites.csv (https://github.com/forc-db/ForC/blob/master/data/ForC_sites.csv). The metadata file explaining each field in the SITES table is here: https://github.com/forc-db/ForC/blob/master/metadata/sites_metadata.csv.
Use the climate.data.suspect field in this file (https://github.com/forc-db/ForC/blob/master/data/extracted_site_data/ForC_sites_climate_data.csv) to identify the sites with suspicious climate data. It is coded to indicate which value is bad. When one value is bad, please double-check the others as well. If the original pub does not report climate data, please replace the bad value with "NI" (missing value code for "no information").
I'll message you separately to make sure this makes sense.
@mawilliams99 or @ValentineHerr , this should be a quick task. Could one of you please merge the climate.data.suspect field in this file (https://github.com/forc-db/ForC/blob/master/data/extracted_site_data/ForC_sites_climate_data.csv) into the corresponding field in the master sites file? (The difference between the two is that the master includes a few sites with no coordinates.) The motivation for this is that reviewing the climate data doesn't have to happen with high priority, but we want to be sure to review any suspect data on sites we may send to EFDB.
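For reference, the merge could be sketched like this: copy the flag into the master by site name and report any master sites that have no counterpart in the extracted file. Keying on `sites.sitename` is an assumption, and the rows are made up; reading and writing the CSVs (e.g., with `csv.DictReader` and `csv.DictWriter`) is left out.

```python
def merge_flag(master_rows, extracted_rows,
               key="sites.sitename", field="climate.data.suspect"):
    """Copy `field` from extracted_rows into copies of master_rows,
    matching on `key`. Returns (merged rows, unmatched master keys)."""
    flags = {row[key]: row.get(field, "") for row in extracted_rows}
    merged, unmatched = [], []
    for row in master_rows:
        new_row = dict(row)
        if row[key] in flags:
            new_row[field] = flags[row[key]]
        else:
            unmatched.append(row[key])  # e.g., sites without coordinates
        merged.append(new_row)
    return merged, unmatched


# Made-up rows for illustration only.
master = [
    {"sites.sitename": "Plot 1", "climate.data.suspect": ""},
    {"sites.sitename": "Plot 2", "climate.data.suspect": ""},
    {"sites.sitename": "Plot 3", "climate.data.suspect": ""},
]
extracted = [
    {"sites.sitename": "Plot 1", "climate.data.suspect": "map"},
    {"sites.sitename": "Plot 2", "climate.data.suspect": ""},
]
merged, unmatched = merge_flag(master, extracted)
print(merged)
print(unmatched)
```

Unmatched sites keep whatever the master already had, so the merge never silently clears a flag.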
I'll work on this now
Looks like Yenisei 2lu and Yenisei 26lh/lw are missing from the file (besides sites without latitudes).
I'll go ahead and merge anyways as I believe that you (@teixeirak) have been working with this site recently.
Thanks! This is very helpful.
Those two sites would be very similar to the other Yenisei sites, which don't have suspect data, so this is fine.
@mawilliams99 , please be sure to check the climate.data.suspect field in sites for all the studies that you review. We should avoid sending any of those values to EFDB. It's better to replace them with NA unless there's some really good reason to believe that the current data are correct (e.g., steep topographic gradients that would make the site quite different from most of the surrounding areas).
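As an illustration of that rule, an export step could blank out flagged values unless the site has been explicitly confirmed. This is a sketch only: the `"; "`-separated flag format, the column names, and the `verified_ok` list of confirmed sites are assumptions, and `None` stands in for NA.

```python
def mask_suspect_for_export(site, verified_ok=(), sep="; "):
    """Return a copy of `site` with flagged climate values set to None,
    unless the site name is listed in `verified_ok`."""
    out = dict(site)
    if site.get("sites.sitename") in verified_ok:
        return out  # reviewer confirmed the current values are correct
    flagged = [v for v in (site.get("climate.data.suspect") or "").split(sep) if v]
    for var in flagged:
        if var in out:
            out[var] = None
    return out


# Made-up row with a flagged precipitation value.
flagged_site = {"sites.sitename": "Plot 1", "mat": 2.0, "map": 210.0,
                "climate.data.suspect": "map"}
print(mask_suspect_for_export(flagged_site))
print(mask_suspect_for_export(flagged_site, verified_ok=("Plot 1",)))
```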
Our latest plot of the climate data has some obviously wrong values, including a strange line of data and a handful that are out of range. I assume these came in with GROA or maybe SRDB (I think it's been a long time since I've looked at this figure).