genophenoenvo / terraref-datasets

Repository for code and small datasets derived from the TERRA REF program
MIT License

fix tall trait data date format and populate NA weather data dates #106

Closed rbartelme closed 3 years ago

rbartelme commented 4 years ago

Problem Scope:

After running the weather data through the bnlearn ingestion script and generating the combined trait+weather TSV file, I noticed two things:

  1. Date values in tall trait datasets (YYYY Month_abbreviation dd) are a different format than the dates in the weather data (yyyy-mm-dd)

  2. Date ranges are slightly off between the trait and weather data (or at least that's what's happening on my end with tidyverse+lubridate)
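
For the first point, a minimal sketch of one way to normalize the trait dates in base R. The sample values and the space separator in the "YYYY Mon dd" strings are assumptions for illustration; the real trait files may differ. It also assumes an English locale, since `%b` matches locale-dependent month abbreviations.

```r
# Hypothetical sample of trait dates in the "YYYY Month_abbreviation dd" format;
# the exact separator used in the real files is an assumption.
trait_dates <- c("2018 Apr 19", "2018 Apr 22", "2014 Oct 16")

# Parse with an explicit format string. %b matches abbreviated month names
# and depends on the session locale (English assumed here).
parsed <- as.Date(trait_dates, format = "%Y %b %d")

# Any value that failed to parse becomes NA, so check before overwriting.
if (anyNA(parsed)) {
  warning("Some trait dates could not be parsed")
}

# Re-emit as yyyy-mm-dd strings to match the weather data.
iso_dates <- format(parsed, "%Y-%m-%d")
print(iso_dates)
```

Parsing with an explicit format rather than guessing makes unexpected strings surface as NA instead of being silently misread.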


Examples from my data

date location
2018-04-19 mac
2018-04-22 mac
2018-04-23 mac
2014-10-16 clemson

This looks like mac_season_6 and when I look for the minimum date in the data I have, I get the following:

> min(raw_weather_data$mac_season_6_weather$date)
[1] "2018-04-25"

When I find the maximum date on the Clemson dataset, I get the following:

> max(raw_weather_data$clemson_weather$date)
[1] "2014-10-15"
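
A rough sketch of how the mismatch could be flagged programmatically, using only the dates quoted in this comment. The data frame layout and column names (`min_date`, `max_date`, `out_of_range`) are hypothetical, and treating every `mac` row as Season 6 is a simplification. The other end of each weather range is not given here, so it is left as NA rather than guessed.

```r
# Trait rows taken from the example above.
trait <- data.frame(
  date = as.Date(c("2018-04-19", "2018-04-22", "2018-04-23", "2014-10-16")),
  location = c("mac", "mac", "mac", "clemson")
)

# Weather bounds reported above; unknown bounds stay NA.
weather_range <- data.frame(
  location = c("mac", "clemson"),
  min_date = as.Date(c("2018-04-25", NA)),
  max_date = as.Date(c(NA, "2014-10-15"))
)

# Attach the known bounds to each trait row by location.
checked <- merge(trait, weather_range, by = "location")

# Flag trait dates that fall outside a known bound; NA bounds are skipped.
too_early <- !is.na(checked$min_date) & checked$date < checked$min_date
too_late <- !is.na(checked$max_date) & checked$date > checked$max_date
checked$out_of_range <- too_early | too_late

print(checked[checked$out_of_range, c("location", "date")])
```

With the values quoted above, all four trait rows fall outside the known weather bounds, which matches the mismatch described in point 2.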


Task Abstraction:

Make sure we can interoperate the datasets using the same date format.


Trait data

Upload updated trait datasets to CyVerse


Weather Data

MagicMilly commented 4 years ago
rbartelme commented 4 years ago

@MagicMilly Thanks for looking into this; this is indeed what I had in mind. It does seem like a database error, maybe something that needs to be addressed in BetyDB? (@dlebauer?)

dlebauer commented 4 years ago

What are the actual values of canopy cover prior to planting? If they are ~0, then that is sensible.

MagicMilly commented 4 years ago

@dlebauer The canopy height values in cm are in the column mean, so they are shown in my screenshot above as being 9, 10, 11, 12, 13 cm spread across 84 observations before the planting date.

Chris-Schnaufer commented 3 years ago

@MagicMilly What's the status of this? Can it be closed?

MagicMilly commented 3 years ago

I'm not sure about the status - @dlebauer discovered some errors in the database, so those will probably need to be resolved as a follow-up ticket. I don't believe @rbartelme will be working on this issue further.

dlebauer commented 3 years ago

This looks like an artifact of the algorithm used to measure canopy height. I have marked suspect data as incorrect in the database; please remove all Season 6 canopy height measurements collected before May 15.

I've created a new issue for this: https://github.com/genophenoenvo/terraref-datasets/issues/123
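
One possible way to apply that filter in base R, as a sketch only. The table below is made-up sample data. Apart from `mean`, the column names and the season and trait labels are assumptions, and the cutoff year of 2018 is inferred from the Season 6 dates quoted earlier in this thread rather than stated in the comment.

```r
# Hypothetical tall trait table; values are illustrative, not real measurements.
traits <- data.frame(
  season = c("season_6", "season_6", "season_6", "season_4"),
  trait = c("canopy_height", "canopy_height", "canopy_cover", "canopy_height"),
  date = as.Date(c("2018-04-22", "2018-05-20", "2018-04-22", "2017-05-01")),
  mean = c(10, 35, 0.5, 12)
)

# Cutoff of May 15; the year 2018 is an assumption based on the Season 6 dates.
cutoff <- as.Date("2018-05-15")

# Mark only Season 6 canopy height rows collected before the cutoff.
suspect <- traits$season == "season_6" &
  traits$trait == "canopy_height" &
  traits$date < cutoff

# Keep everything else, including other traits and other seasons.
cleaned <- traits[!suspect, ]
print(cleaned)
```

Building the logical mask first makes it easy to count or inspect the rows being dropped before removing them.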

Chris-Schnaufer commented 3 years ago

@dlebauer Would you like the work per your last comment to be a separate issue, or continue on this issue? https://github.com/genophenoenvo/terraref-datasets/issues/106#issuecomment-724244443

dlebauer commented 3 years ago

closing; follow up in https://github.com/genophenoenvo/terraref-datasets/issues/123