genophenoenvo / terraref-datasets

Repository for code and small datasets derived from the TERRA REF program
MIT License

Check KSU days to flowering dataset #129

Open dlebauer opened 3 years ago

dlebauer commented 3 years ago

Also check the other KSU data.

#download.file('https://de.cyverse.org/dl/d/88020A6C-6430-4B1E-BFDC-69C42E7E335C/ksu_days_gdd_to_flowering.csv', destfile = 'ksu_days_gdd_to_flowering.csv')
# download.file('https://github.com/genophenoenvo/terraref-datasets/files/5823480/ksu_days_gdd_to_flowering_v2.csv.zip', 'ksu_days_gdd_to_flowering_v2.csv.zip')
# unzip("ksu_days_gdd_to_flowering_v2.csv.zip")

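library(tidyverse)
# Original (v1) file: uses the cultivar, sitename and days_to_flowering columns
x <- read_csv('ksu_days_gdd_to_flowering.csv')
# ANOVA of days to flowering by cultivar, with error strata for cultivar
# and for sitename nested within cultivar
aov.out <- aov(days_to_flowering ~ cultivar + Error(cultivar/sitename), data = x)
summary(aov.out)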

# v2 file: keep the same columns, renaming `mean` to days_to_flowering to match v1
x2 <- read_csv('ksu_days_gdd_to_flowering_v2.csv') %>%
  select(sitename, cultivar, days_to_flowering = mean)

x %>% count(cultivar) %>% count(n)
# n is the number of plots (rows) per cultivar; nn is the number of cultivars with n plots.
# e.g. the first line says there are 70 cultivars w/ only 1 plot each
#      n    nn
#      1    70
#      2    41
#      3     4

x2 %>% count(cultivar) %>% count(n)
#       n    nn
#      1    41
#      2   124
#      3     3
#      4    10
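
The count-of-counts step above can be hard to read because both columns end up named n and nn. A minimal sketch of the same idea with explicit column names follows. The toy data frame and the replicate_summary helper are hypothetical and only illustrate the technique; they are not the real KSU data.

```r
library(dplyr)

# Hypothetical toy data: one row per plot.
# Cultivar "A" has 2 plots; "B" and "C" have 1 plot each.
toy <- data.frame(
  sitename = c("plot1", "plot2", "plot3", "plot4"),
  cultivar = c("A", "A", "B", "C")
)

# Hypothetical helper: count plots per cultivar, then count how many
# cultivars fall at each replication level.
replicate_summary <- function(df) {
  df %>%
    count(cultivar, name = "n_plots") %>%
    count(n_plots, name = "n_cultivars")
}

replicate_summary(toy)
```

Calling replicate_summary(x) and replicate_summary(x2) would give the two tables above with self-describing headers, assuming both data frames have a cultivar column.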

MagicMilly commented 3 years ago

@dlebauer As far as I can tell, the discrepancies come from using different data sources. I used data downloaded from BETYdb with the traits package, V1. Some of the sitenames and cultivars in the data you just queried did not have flowering_time values in the data I used.
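
One way to locate such mismatches, as a minimal sketch: an anti-join on sitename and cultivar lists the rows present in one table but absent from the other. The v1 and v2 data frames below are hypothetical stand-ins, not the real KSU files, and the sketch assumes both tables share sitename and cultivar columns.

```r
library(dplyr)

# Hypothetical stand-ins for the two versions of the dataset.
v1 <- data.frame(
  sitename = c("s1", "s2"),
  cultivar = c("A", "B"),
  days_to_flowering = c(60, 65)
)
v2 <- data.frame(
  sitename = c("s1", "s2", "s3"),
  cultivar = c("A", "B", "B"),
  days_to_flowering = c(60, 65, 70)
)

# Rows in v2 whose sitename/cultivar pair has no match in v1, and vice versa.
only_in_v2 <- anti_join(v2, v1, by = c("sitename", "cultivar"))
only_in_v1 <- anti_join(v1, v2, by = c("sitename", "cultivar"))

only_in_v2
only_in_v1
```

Applied to the real tables, any rows returned in only_in_v2 would be the sitename and cultivar combinations that lacked flowering values in the earlier download.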

The path to the KSU data I used on CyVerse is /iplant/home/shared/genophenoenvo/data/raw/ksu_data_2020-06-11.csv.

The path to the KSU data cleaning notebook is /iplant/home/shared/genophenoenvo/updated_notebooks/ksu_data_cleaning.ipynb.

I hope this answers your question. If not, please let me know.