genophenoenvo / docker

List of Docker images used in analyses
GNU General Public License v3.0

bnlearn data ingestion script data pulls #36

Closed rbartelme closed 3 years ago

rbartelme commented 4 years ago

~- [ ] Output combined tabular dataset with sites as feature~

rbartelme commented 4 years ago

Moving to Sprint 42

rbartelme commented 4 years ago

Final input datasets are being wrapped up this week, so I'm moving this to sprint 43.

rbartelme commented 4 years ago

Need to touch base with @MagicMilly regarding the weather data location and some trait names in the datasets.

MagicMilly commented 4 years ago

The weather data are on Google Drive and CyVerse. What is the issue with the trait names?

rbartelme commented 4 years ago

@MagicMilly I haven't been able to find the weather data on CyVerse.

As far as the trait data in the tall_format_traits folder on CyVerse:

Other questions:

rbartelme commented 4 years ago

@MagicMilly The wget statements with the URLs you provided worked really well.

I'm just debugging my code before I combine the datasets into one tabular dataset and rerun the network learning algorithm with some added features.
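For reference, combining per-site files into one table with the site as a feature could look roughly like the sketch below. The helper name `combine_sites` and the idea of keeping only the columns shared by every site are assumptions for illustration, not the script used in this repo. The site column is stored as a factor because bnlearn expects numeric or factor columns.

```r
# Hypothetical helper: stack per-site CSVs into one table with a `site` column.
# `paths` is a named character vector; the names become the site labels.
combine_sites <- function(paths) {
  stopifnot(!is.null(names(paths)))
  tables <- lapply(names(paths), function(site) {
    df <- read.csv(paths[[site]], stringsAsFactors = FALSE)
    df$site <- site
    df
  })
  # Assumption: keep only the columns present at every site so rbind() succeeds.
  shared <- Reduce(intersect, lapply(tables, names))
  combined <- do.call(rbind, lapply(tables, function(df) df[, shared, drop = FALSE]))
  # Store site as a factor so it can be used as a discrete feature.
  combined$site <- factor(combined$site)
  combined
}
```

`paths` can hold local files fetched with wget, for example `combine_sites(c(ksu = "ksu_weather.csv", clemson = "clemson_weather.csv"))`. Columns that exist at only some sites are dropped here, so check `names()` on the result before learning the network.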

MagicMilly commented 4 years ago

@rbartelme For the correct GDD, are you alright with weather datasets that are sliced for season dates only? Or do you need two datasets for each season - one for the whole year, and one for season dates (planting to harvest)?

rbartelme commented 4 years ago

@MagicMilly

dlebauer commented 4 years ago

Given RH and T, you can compute VPD thus: https://github.com/PecanProject/pecan/blob/8b42cba20ed2c0b6469dd9be4b7ba3e2318ee99a/modules/data.atmosphere/R/metutils.R#L56

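```r
# Saturation vapor pressure; temp in degrees C (6.11 hPa is the value at 0 degrees C)
es <- 6.11 * exp((2500000 / 461) * (1 / 273 - 1 / (273 + temp)))
# Vapor pressure deficit in the same units as es; rh in percent
vpd <- ((100 - rh) / 100) * es
```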

For mean VPD, compute VPD from the hourly data first and then take the mean.
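A minimal sketch of that order of operations, wrapping the expression quoted above in functions. The column names `date`, `temp`, and `rh` are assumptions about the hourly table, and the function names are made up for this example.

```r
# Saturation vapor pressure from air temperature in degrees C
# (same expression as quoted above; 6.11 hPa is the value at 0 degrees C).
get_es <- function(temp) {
  6.11 * exp((2500000 / 461) * (1 / 273 - 1 / (273 + temp)))
}

# VPD in the same units as es, from relative humidity in percent.
get_vpd <- function(rh, temp) {
  ((100 - rh) / 100) * get_es(temp)
}

# Hypothetical hourly table with columns `date`, `temp`, `rh`.
# VPD is computed per hour, then averaged within each date.
daily_mean_vpd <- function(hourly) {
  hourly$vpd <- get_vpd(hourly$rh, hourly$temp)
  aggregate(vpd ~ date, data = hourly, FUN = mean)
}
```

Because the saturation curve is nonlinear in temperature, averaging hourly VPD generally gives a different number than computing VPD from daily mean T and RH. If the result is in hPa, as the 6.11 constant suggests, dividing by 10 converts it to kPa.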

It seems reasonable to exclude blocking heights. For the general case (e.g., other parameters available at only a subset of heights), it might be worth considering whether we can specify a hierarchical model with structure that varies by site. In the simple case, we could assign a single unique integer per site where there is no blocking.

dlebauer commented 4 years ago

Regarding "days to flowering": since both seasons 4 and 6 have panicle count, it could be worthwhile to compute another index, days_to_flowering_est, defined as the number of days until panicle count = 1/2 stand count. Then plot days to flowering vs. days_to_flowering_est and see how they line up. @rbartelme, if you think this would be worthwhile, please open another issue.
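A rough sketch of how days_to_flowering_est might be computed for one plot. The function and argument names are hypothetical. Since panicle counts are taken on discrete dates, this assumes "panicle count = 1/2 stand count" can be read as the first observation where the count reaches at least half the stand count.

```r
# Hypothetical inputs for a single plot:
#   days_after_planting: numeric vector of observation days
#   panicle_count:       panicle counts on those days
#   stand_count:         number of plants in the plot (scalar)
estimate_days_to_flowering <- function(days_after_planting, panicle_count, stand_count) {
  # Sort observations by day so the first qualifying day is the earliest one.
  ord <- order(days_after_planting)
  days <- days_after_planting[ord]
  reached <- panicle_count[ord] >= stand_count / 2
  # No observation reached the threshold: leave the estimate missing.
  if (!any(reached, na.rm = TRUE)) {
    return(NA_real_)
  }
  days[which(reached)[1]]
}
```

For example, with observations on days 50, 60, and 70, panicle counts of 2, 6, and 12, and a stand count of 10, the estimate is day 60. Interpolating between the two observations that bracket the threshold would be a reasonable refinement.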

dlebauer commented 4 years ago

The difference between canopy height and plant height (I think) is that canopy height does not include the height to the top of the panicle. I propose that we treat them the same for now since

  1. these are measured at different sites and
  2. it's not clear whether canopy_height measured by the machine excludes panicles or not. I will ask.

MagicMilly commented 3 years ago

CyVerse download links for updated weather data

MAC Season 4: https://de.cyverse.org/dl/d/E11D3666-CD04-426F-B833-85DB6B39C574/mac_season_4_weather.csv

MAC Season 6: https://de.cyverse.org/dl/d/33B533EC-9EB0-4BB4-AAA2-650FAD4BD1D5/mac_season_6_weather.csv

KSU: https://de.cyverse.org/dl/d/F9FE37D0-BF57-4238-9F61-71C1D34B0B18/ksu_weather.csv

Clemson: https://de.cyverse.org/dl/d/08675B05-F02E-4AB1-A934-8EFAD8DD3296/clemson_weather.csv

All of these datasets should now include VPD in kPa and GDD, in addition to the existing parameters and any extras that may have been included in the raw data for each site. If you spot any errors or need additional information, please let me know in a new issue.

rbartelme commented 3 years ago

Created new issue #38 in Docker for continuing this.