Closed rbartelme closed 3 years ago
Moving to Sprint 42
Final input datasets are being wrapped up this week, so I'm moving this to Sprint 43.
Need to touch base with @MagicMilly regarding the weather data location and some trait names in the datasets.
The weather data are on the Google Drive and CyVerse. What is the issue with the trait names?
@MagicMilly I haven't been able to find the weather data on CyVerse.
As far as the trait data in the `tall_format_traits` folder on CyVerse:
- Clemson has `plant_height` instead of `canopy_height`
- MAC_season_6 does not have `gdd_to_flowering`, but I noticed this was an older filename whereas the other datasets had newer names... not sure if that's an issue?
@MagicMilly The `wget` statements with the URLs you provided worked really well.
I'm just debugging my code before I combine the datasets into one tabular dataset and rerun the network learning algorithm with some added features.
/iplant/home/shared/genophenoenvo/data/weather_data
`plant_height` and `canopy_height` came from betydb, and I assume they're the same, though @dlebauer can probably confirm.
gdd_to_ [...] so without days to flowering, I did not update that season. They were all downloaded around the same time, though.
[...] experiments. The Season 4 blocking patterns can be found here, and David provided me with a CSV for Season 6.
@rbartelme For the correct GDD, are you alright with weather datasets that are sliced for season dates only? Or do you need two datasets for each season: one for the whole year, and one for season dates (planting to harvest)?
@MagicMilly
Good to know about no flowering data for Season 6; I'll just drop `gdd_to_flowering` as a feature.
Thanks for looking into the weather data! Whenever you get a chance, could you please drop the URLs for those datasets here or on Slack?
For the correct GDD, sliced for season dates only is perfect.
@dlebauer would know how to calculate VPD for KSU; if you could add that to the KSU dataset, that would be outstanding. It's a really big factor for plant physiology and related to transpiration rates.
Thank you for the information regarding the blocking heights. I may ignore the blocking heights entirely, since I want to combine data from all the sites into a single input table for the network algorithm.
Given RH and T, you can compute VPD thus: https://github.com/PecanProject/pecan/blob/8b42cba20ed2c0b6469dd9be4b7ba3e2318ee99a/modules/data.atmosphere/R/metutils.R#L56
es <-(6.11 * exp((2500000/461) * (1/273 - 1/(273 + temp))))
vpd <- (((100 - rh)/100) * es)
For mean VPD, use hourly data and then compute the mean.
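A minimal Python sketch of the same calculation, in case it helps to check numbers outside R. It transcribes the two R expressions above and then averages the hourly values. It assumes temperature is in degrees Celsius, RH is in percent, and that `es` comes out in millibars (my reading of the PEcAn helper, worth confirming); the division by 10 to get kPa and all function names are assumptions for illustration, not part of PEcAn.

```python
import math


def saturation_vapor_pressure_mb(temp_c):
    """Saturation vapor pressure, same expression as `es` in the R lines above.

    Assumes temp_c is in degrees Celsius and the result is in millibars.
    """
    return 6.11 * math.exp((2500000 / 461) * (1 / 273 - 1 / (273 + temp_c)))


def vpd_mb(rh_percent, temp_c):
    """Vapor pressure deficit from relative humidity (percent) and temperature."""
    return ((100 - rh_percent) / 100) * saturation_vapor_pressure_mb(temp_c)


def mean_vpd_kpa(hourly):
    """Mean VPD in kPa from an iterable of hourly (rh_percent, temp_c) pairs.

    VPD is computed for each hour first and averaged afterwards, rather than
    computing VPD once from mean RH and mean temperature.
    """
    values = [vpd_mb(rh, temp) / 10 for rh, temp in hourly]  # assumed: 10 mb = 1 kPa
    if not values:
        raise ValueError("no hourly observations supplied")
    return sum(values) / len(values)
```

Averaging after the per-hour calculation matters because `es` is nonlinear in temperature, so VPD of the mean conditions is not the mean VPD.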
Seems reasonable to exclude blocking heights, though for the general case (i.e., other parameters available at a subset of heights) it might be worth considering whether it is possible to specify a hierarchical model with structure that varies by site. Or, in the simple case, assign a single unique integer per site where there is no blocking.
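For the simple case, a sketch of what "a single unique integer per site" could look like in Python. The function name and the choice to number sites from 1 in order of first appearance are assumptions for illustration only.

```python
def assign_site_codes(sites):
    """Map each distinct site name to a unique integer, in order of first appearance."""
    codes = {}
    for site in sites:
        if site not in codes:
            codes[site] = len(codes) + 1  # hypothetical convention: codes start at 1
    return codes
```

Each row from a site without blocking would then carry that site's code in place of a block identifier.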
Regarding "days to flowering": since both Seasons 4 and 6 have panicle count, it could be worthwhile to compute another index, `days_to_flowering_est`, which is days until panicle count = 1/2 stand count. Then plot days to flowering vs `days_to_flowering_est` and see how these line up. @rbartelme if you think it would be worthwhile to do this, then please open another issue.
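One way the proposed index could be computed, as a hedged Python sketch. Because panicle counts are only recorded on discrete dates, it takes the first observation where the count reaches at least half the stand count, which is an assumption on top of the "= 1/2 stand count" definition above. The argument names and data layout are hypothetical.

```python
from datetime import date


def days_to_flowering_est(planting_date, observations, stand_count):
    """Days from planting to the first observation with panicle count >= stand_count / 2.

    observations: iterable of (observation_date, panicle_count) pairs.
    Returns None if the threshold is never reached.
    """
    threshold = stand_count / 2
    for obs_date, panicle_count in sorted(observations):  # earliest date first
        if panicle_count >= threshold:
            return (obs_date - planting_date).days
    return None
```

For example, with a hypothetical planting date of `date(2017, 5, 1)` and a stand count of 10, the estimate is the day count to the first observation with 5 or more panicles.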
The difference between canopy height and plant height (I think) is that the canopy height does not include height to top of panicle. I propose that we treat them the same for now since
CyVerse download links for updated weather data
MAC Season 4: https://de.cyverse.org/dl/d/E11D3666-CD04-426F-B833-85DB6B39C574/mac_season_4_weather.csv
MAC Season 6: https://de.cyverse.org/dl/d/33B533EC-9EB0-4BB4-AAA2-650FAD4BD1D5/mac_season_6_weather.csv
KSU: https://de.cyverse.org/dl/d/F9FE37D0-BF57-4238-9F61-71C1D34B0B18/ksu_weather.csv
Clemson: https://de.cyverse.org/dl/d/08675B05-F02E-4AB1-A934-8EFAD8DD3296/clemson_weather.csv
All of these data should now have `vpd` in kPa and `gdd`, in addition to existing parameters and any extras that may have been included in the raw data for each site. If you spot any errors or need additional information, please let me know in a new issue.
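For anyone pulling these files from Python rather than with `wget`, a rough equivalent is sketched below. The URLs are the four listed above; the site keys, the function name, and the injectable `fetch` argument are assumptions for illustration. Nothing is downloaded until `download_weather` is called.

```python
import os
import urllib.request

# Download links as listed above; the dictionary keys are illustrative labels.
WEATHER_URLS = {
    "mac_season_4": "https://de.cyverse.org/dl/d/E11D3666-CD04-426F-B833-85DB6B39C574/mac_season_4_weather.csv",
    "mac_season_6": "https://de.cyverse.org/dl/d/33B533EC-9EB0-4BB4-AAA2-650FAD4BD1D5/mac_season_6_weather.csv",
    "ksu": "https://de.cyverse.org/dl/d/F9FE37D0-BF57-4238-9F61-71C1D34B0B18/ksu_weather.csv",
    "clemson": "https://de.cyverse.org/dl/d/08675B05-F02E-4AB1-A934-8EFAD8DD3296/clemson_weather.csv",
}


def download_weather(urls, dest_dir, fetch=urllib.request.urlretrieve):
    """Download each CSV into dest_dir and return a {site: local_path} mapping.

    `fetch` takes (url, local_path); it defaults to urllib's urlretrieve and can
    be swapped out, for example when testing without network access.
    """
    os.makedirs(dest_dir, exist_ok=True)
    paths = {}
    for site, url in urls.items():
        local_path = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        fetch(url, local_path)
        paths[site] = local_path
    return paths
```

Calling `download_weather(WEATHER_URLS, "weather_data")` would place the four CSVs in a local `weather_data` directory (a hypothetical name).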
Created new issue #38 in Docker for continuing this.
- [x] Add `wget` calls to the CyVerse URLs to pull the CSVs into the Docker container environment
- [x] Fix code to parse these pulled files as a list
- [x] Begin testing functionalized code vs. input file list
- [x] Add sites as a feature
~- [ ] Output combined tabular dataset with sites as feature~
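The checklist steps can be sketched roughly as follows in Python. This is not the code from the repository. It assumes each pulled file is a CSV with a header row, and the function names and the `site` column name are illustrative.

```python
import csv


def load_with_site(paths_by_site):
    """Read every CSV in the mapping and tag each row with the site it came from."""
    rows = []
    for site, path in paths_by_site.items():
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                row["site"] = site  # overwrites any existing "site" column
                rows.append(row)
    return rows


def write_combined(rows, out_path):
    """Write all rows to one CSV and return the column names that were used.

    Columns are the union across sites, in order of first appearance; values a
    site does not provide are written as empty strings.
    """
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames
```

Taking the union of columns keeps site-specific extras in the combined table instead of silently dropping them; whether the network algorithm tolerates the resulting empty cells is a separate question.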