Closed rbartelme closed 4 years ago
All records from this experiment have one of the two treatments in the database. It is not clear at this point why there are so many more measurements for one treatment compared to the other. This warrants further investigation...
select treatments.name, count(*) as n
from treatments join traits on treatments.id = traits.treatment_id
where
  extract(year from date) = 2017
  and extract(month from date) between 4 and 10
  and checked > -1
group by treatments.name;
| name | n |
|---|---|
| MAC Season 4: BAP water-deficit stress Aug 15-30 | 152403 |
| MAC Season 4: BAP water-deficit stress Aug 1-14 | 222747 |
~It is not clear at this point why there are so many more measurements for one treatment compared to the other. This warrants further investigation...~ I've had to update the site / treatment relationships in the database... https://gist.github.com/dlebauer/4c87fd99cda07ed8c28efb26ce6e287c
After speaking with @dlebauer, rather than updating all of the season 4 datasets, I added two boolean columns to the weather data for the first and second water-deficit treatments. I have not yet committed the updated season 4 cleaning notebook, but the season 4 weather dataset is available on GitHub as well as Google Drive. The weather data can then be joined to other derived data as needed for analysis.
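For anyone following along, here is a rough sketch of how two boolean treatment columns could be derived from a date, assuming the windows implied by the treatment names (Aug 1-14 and Aug 15-30, 2017). This is not the code from the cleaning notebook; the row layout, the `date` key, and both column names are hypothetical.

```python
from datetime import date

# Treatment windows taken from the treatment names in this thread (2017 season).
FIRST_DEFICIT = (date(2017, 8, 1), date(2017, 8, 14))
SECOND_DEFICIT = (date(2017, 8, 15), date(2017, 8, 30))


def in_window(day, window):
    """Return True if `day` falls inside the inclusive (start, end) window."""
    start, end = window
    return start <= day <= end


def add_deficit_flags(weather_rows):
    """Return copies of the weather rows with two boolean treatment columns.

    Each row is a dict with a `date` key holding a datetime.date; that key and
    the two new column names are hypothetical, not the real dataset schema.
    """
    flagged = []
    for row in weather_rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row["first_water_deficit_treatment"] = in_window(row["date"], FIRST_DEFICIT)
        new_row["second_water_deficit_treatment"] = in_window(row["date"], SECOND_DEFICIT)
        flagged.append(new_row)
    return flagged


rows = add_deficit_flags([
    {"date": date(2017, 8, 10)},
    {"date": date(2017, 8, 20)},
    {"date": date(2017, 9, 5)},
])
```

Note that flags like these only say whether a date falls inside a deficit window. Which half of the field a plot belongs to still has to come from the join to the derived data.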
The Jupyter cleanup notebook output from 05-20-2020, produced by the Season 4 cleanup script, contains the following labels in the treatments column:
The treatments should be assigned as a block, i.e. all cultivars that undergo water stress should be labeled. According to @dlebauer, this means that 50% of the field did not receive water from August 1-14 and was then irrigated normally from Aug 15 to the end of the season. The other half had irrigation shut off from Aug 15-30 and was irrigated normally until the end of the season. The treatment should therefore be a fixed value determined by GPS coordinates.
Check Data Provenance
TRUE or FALSE