genophenoenvo / terraref-datasets

Repository for code and small datasets derived from the TERRA REF program
MIT License

Season 4 drought treatment data #87

Closed rbartelme closed 4 years ago

rbartelme commented 4 years ago

The Jupyter cleanup notebook output from 05-20-2020, produced by the Season 4 cleanup script, contains the following labels in the treatment column:

# A tibble: 3 x 2
  treatment                                     n
  <fct>                                     <int>
1 NA                                       326890
2 BAP 2017, water-deficit stress Aug 1-14   53146
3 BAP 2017, water-deficit stress Aug 15-30  17843

The treatments should be applied as a block, i.e. every cultivar that undergoes water stress should be labeled. According to @dlebauer, 50% of the field received no water from August 1-14 and was then irrigated normally from Aug 15 through the end of the season. The other half had irrigation shut off from Aug 15-30 and was irrigated normally until the end of the season. The treatment should therefore be a fixed value determined by GPS coordinates.
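Because the treatment is fixed per location, one way to remove the NA labels would be to propagate each site's known treatment to all of its rows. The following is only a minimal sketch of that idea: the field names `sitename` and `treatment`, the plot names, and the sample rows are hypothetical and not taken from the cleanup notebook.

```python
# Sketch only: "sitename", "treatment", and the sample rows are hypothetical.
def fill_treatment_by_site(rows):
    """Copy each site's known treatment label onto that site's unlabeled rows."""
    site_treatment = {}
    for row in rows:
        label = row["treatment"]
        if label is None:
            continue
        # Remember the first label seen for a site; a different label later is an error,
        # since a block treatment should be constant per location.
        known = site_treatment.setdefault(row["sitename"], label)
        if known != label:
            raise ValueError(f"site {row['sitename']!r} has conflicting treatments")
    # Sites with no labeled row at all keep their original (None) value.
    return [
        {**row, "treatment": site_treatment.get(row["sitename"], row["treatment"])}
        for row in rows
    ]


rows = [
    {"sitename": "plot_A", "treatment": "BAP 2017, water-deficit stress Aug 1-14"},
    {"sitename": "plot_A", "treatment": None},
    {"sitename": "plot_B", "treatment": None},
]
filled = fill_treatment_by_site(rows)
```

Rows from a site with no labeled record stay unlabeled, so this would only help if every site has at least one labeled measurement.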


Check Data Provenance

dlebauer commented 4 years ago

All records from this experiment have one of the two treatments in the database. It is not clear at this point why there are so many more measurements for one treatment compared to the other. This warrants further investigation...

select treatments.name, count(*) as n
from treatments join traits on treatments.id = traits.treatment_id
where
    extract(year from date) = 2017
    and extract(month from date) between 4 and 10
    and checked > -1
group by treatments.name;
name                                                   n
MAC Season 4: BAP water-deficit stress Aug 15-30  152403
MAC Season 4: BAP water-deficit stress Aug 1-14   222747

~It is not clear at this point why there are so many more measurements for one treatment compared to the other. This warrants further investigation...~ I've had to update the site / treatment relationships in the database... https://gist.github.com/dlebauer/4c87fd99cda07ed8c28efb26ce6e287c

MagicMilly commented 4 years ago

After speaking with @dlebauer, rather than updating all of the Season 4 datasets, I added two boolean columns to the weather data, one for the first and one for the second water-deficit treatment. I have not yet committed the updated Season 4 cleaning notebook, but the Season 4 weather dataset is available on GitHub as well as Google Drive. The weather data can then be joined to other derived data as needed for analysis.
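For readers who want to reproduce the flags, a minimal sketch of the date-window approach might look like the following. It assumes the two windows are Aug 1-14 and Aug 15-30, 2017 (inclusive), as in the treatment labels above; the column names `first_water_deficit_treatment` and `second_water_deficit_treatment` and the `date` field are hypothetical and may not match the published weather dataset.

```python
from datetime import date

# Windows taken from the treatment labels in this thread; both ends treated as inclusive.
FIRST_DEFICIT = (date(2017, 8, 1), date(2017, 8, 14))
SECOND_DEFICIT = (date(2017, 8, 15), date(2017, 8, 30))


def add_deficit_flags(weather_rows):
    """Return copies of the weather rows with one boolean column per deficit window."""
    flagged = []
    for row in weather_rows:
        day = row["date"]  # hypothetical field holding a datetime.date
        flagged.append({
            **row,
            # Hypothetical column names, chosen for illustration only.
            "first_water_deficit_treatment": FIRST_DEFICIT[0] <= day <= FIRST_DEFICIT[1],
            "second_water_deficit_treatment": SECOND_DEFICIT[0] <= day <= SECOND_DEFICIT[1],
        })
    return flagged


weather = [
    {"date": date(2017, 7, 31)},
    {"date": date(2017, 8, 14)},
    {"date": date(2017, 8, 15)},
]
flagged = add_deficit_flags(weather)
```

Since the flags depend only on the date, they say when a deficit window was active, not which half of the field a given plot belonged to; that information still has to come from the site / treatment relationship when joining to other derived data.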