Closed rbartelme closed 3 years ago
`mean` as the measurement in cm), so that seems like an error in the database.
@MagicMilly Thanks for looking into this; this is indeed what I had in mind. It does seem like a database error, maybe something that needs to be addressed in BetyDB? (@dlebauer?)
What are the actual values of canopy cover prior to planting? If they are ~0, then that is sensible.
@dlebauer The canopy height values in cm are in the column `mean`, so they are shown in my screenshot above as being 9, 10, 11, 12, 13 cm spread across 84 observations before the planting date.
@MagicMilly What's the status of this? Can it be closed?
I'm not sure about the status - @dlebauer discovered some errors in the database, so those will probably need to be resolved as a follow-up ticket. I don't believe @rbartelme will be working on this issue further.
This looks like an artifact of the algorithm used to measure canopy height. I have marked suspect data as incorrect in the database; please remove all Season 6 canopy height measurements collected before May 15.
I've created a new issue for this: https://github.com/genophenoenvo/terraref-datasets/issues/123
@dlebauer Would you like the work per your last comment to be a separate issue, or continue on this issue? https://github.com/genophenoenvo/terraref-datasets/issues/106#issuecomment-724244443
closing; follow up in https://github.com/genophenoenvo/terraref-datasets/issues/123
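For anyone applying the request above (drop Season 6 canopy height measurements collected before May 15) to a local copy of the data, a minimal sketch in base R follows. The data frame, the `date` and `trait` column names, and the sample values are hypothetical. The year 2018 is an assumption based on the Season 6 weather dates in this thread. Only the `mean` column name comes from the discussion.

```r
# Hypothetical tall trait table; `date` and `trait` are assumed column names.
season6 <- data.frame(
  date  = as.Date(c("2018-05-01", "2018-05-15", "2018-06-01", "2018-05-10")),
  trait = c("canopy_height", "canopy_height", "canopy_height", "leaf_length"),
  mean  = c(10, 12, 35, 4)
)

# Cutoff date; the year is an assumption, the thread only says "May 15".
cutoff <- as.Date("2018-05-15")

# Flag only canopy height rows dated strictly before the cutoff.
drop <- season6$trait == "canopy_height" & season6$date < cutoff

# Keep everything else, including other traits measured before the cutoff.
season6_clean <- season6[!drop, ]
```

Rows dated exactly May 15 are kept here because the request says "before"; change `<` to `<=` if the cutoff day itself should also go.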
Problem Scope:
After running the weather data through the bnlearn ingestion script and generating the combined trait+weather TSV file here, I noticed two things:
- Date values in tall trait datasets (`YYYY Month_abbreviation dd`) are in a different format than the dates in the weather data (`yyyy-mm-dd`)
- Date ranges are slightly off between trait and weather data (or at least that's what's happening on my end with `tidyverse` + `lubridate`)
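One way to reconcile the two date formats is to parse both into R `Date` objects before joining. This is a sketch in base R, not the ingestion script's actual code. The sample values are made up, and `%b` assumes an English locale for the month abbreviations.

```r
# Made-up sample values in the two formats described above.
trait_dates   <- c("2018 Apr 25", "2018 May 15")  # YYYY Month_abbreviation dd
weather_dates <- c("2018-04-25", "2018-05-15")    # yyyy-mm-dd

# %b matches the abbreviated month name in the current locale (English assumed).
trait_parsed <- as.Date(trait_dates, format = "%Y %b %d")

# ISO yyyy-mm-dd is the default format for as.Date() on character input.
weather_parsed <- as.Date(weather_dates)

# Both vectors are now class Date and can be compared or joined directly.
same_day <- trait_parsed == weather_parsed

# Write back out as ISO strings if the TSV should store character dates.
trait_iso <- format(trait_parsed, "%Y-%m-%d")
```

Storing the parsed dates as `yyyy-mm-dd` in the trait files would let both datasets share one format without a conversion step at join time.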
Examples from my data
This looks like `mac_season_6`, and when I look for the minimum date in the data I have, I get the following:
> min(raw_weather_data$mac_season_6_weather$date)
[1] "2018-04-25"
When I find the maximum date on the Clemson dataset, I get the following:
> max(raw_weather_data$clemson_weather$date)
[1] "2014-10-15"
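To see where the trait and weather date ranges disagree, the `min`/`max` checks above can be extended into a coverage comparison. This is a rough sketch with hypothetical data frames and made-up dates. It assumes both tables have a `date` column already parsed as `Date`.

```r
# Hypothetical stand-ins for one site's trait and weather tables.
trait   <- data.frame(date = as.Date(c("2018-04-20", "2018-05-01", "2018-08-01")))
weather <- data.frame(date = as.Date(c("2018-04-25", "2018-05-01", "2018-08-01")))

# First and last date covered by each table.
trait_range   <- range(trait$date)
weather_range <- range(weather$date)

# Trait dates that fall outside the span covered by the weather data.
outside <- trait$date < weather_range[1] | trait$date > weather_range[2]
uncovered <- trait$date[outside]
```

Any dates left in `uncovered` are trait observations with no matching weather record, which is the kind of gap described for `mac_season_6` and Clemson.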
Task Abstraction:
Make sure we can interoperate the datasets using the same date format.
Trait data
Upload updated trait datasets to CyVerse
[ ] mac_season_4_tall
[ ] mac_season_6_tall
[ ] ksu_tall
[ ] clemson_tall
[ ] Please post updated tall trait dataset CyVerse urls as a comment here
Weather Data
`mac_season_6` and `clemson` for the dates listed in Examples from my data