genophenoenvo / terraref-datasets

Repository for code and small datasets derived from the TERRA REF program
MIT License

Create tall format derived datasets from all seasons #100

Closed. MagicMilly closed this issue 4 years ago

MagicMilly commented 4 years ago

As requested by Ryan, retain the tall format of the traits data queried from BETYdb, but apply the same cleaning functions used for the wide derived datasets. This will be easier for him to work with in R for machine learning.

rbartelme commented 4 years ago

@MagicMilly could you please clarify what the wide derived datasets are? Are these the metrics calculated in your notebooks like days to flowering, flag leaf emergence, gdd, etc?

MagicMilly commented 4 years ago

Yes, any of the single-trait datasets like the ones you listed are wide, with one row per plot. Canopy height is a bit different. When I originally queried the data in R using the traits package, though, the result was in tall format with all traits.
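
To illustrate the tall versus wide distinction discussed above, here is a minimal Python sketch. The column names (`plot`, `trait`, `value`) and the sample values are made up for illustration and are not taken from the TERRA REF data or from the notebooks in this repository.

```python
from collections import defaultdict

# Hypothetical tall records: one row per (plot, trait) observation.
tall_rows = [
    {"plot": "plot_1", "trait": "canopy_height", "value": 120.0},
    {"plot": "plot_1", "trait": "days_to_flowering", "value": 65.0},
    {"plot": "plot_2", "trait": "canopy_height", "value": 98.0},
]


def tall_to_wide(rows):
    """Pivot tall rows into one dict per plot, with one key per trait."""
    wide = defaultdict(dict)
    for row in rows:
        # A repeated (plot, trait) pair overwrites the earlier value, so this
        # simple pivot only suits traits with one observation per plot.
        wide[row["plot"]][row["trait"]] = row["value"]
    return [{"plot": plot, **traits} for plot, traits in sorted(wide.items())]


wide_rows = tall_to_wide(tall_rows)
```

Each element of `wide_rows` holds one plot, and a plot simply lacks the key for any trait that was never measured on it. The tall form keeps every observation as its own row, so it needs no such per-trait columns.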

rbartelme commented 4 years ago

@MagicMilly Thanks for the clarification!!

dlebauer commented 4 years ago

Here is the code to take the tall-format trait_data.zip from Dryad and combine it into a single tall file:

https://github.com/terraref/data-publication/blob/5bef8f4b8b834c00cb24a0d96f7562976a9535bd/content/phenotypes.Rmd#L24
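
For readers working in Python rather than R, the general idea of combining the files in a zip archive into one tall file might be sketched as follows. This is not a port of the linked R code. It assumes the archive holds UTF-8 CSV files that all share the same columns, and the function name `combine_zip_csvs` is hypothetical.

```python
import csv
import io
import zipfile


def combine_zip_csvs(zip_source, out_file):
    """Append the rows of every CSV in a zip archive to one tall CSV.

    zip_source: a path or binary file-like object for the archive.
    out_file: a text file-like object opened for writing.
    Returns the number of data rows written.
    """
    writer = None
    n_rows = 0
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # skip README files and other non-CSV members
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.DictReader(text)
                for row in reader:
                    if writer is None:
                        # The header comes from the first CSV that has rows.
                        # Assumption: every later file uses the same columns;
                        # an unexpected column raises ValueError on write.
                        writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
                        writer.writeheader()
                    writer.writerow(row)
                    n_rows += 1
    return n_rows
```

A call would pass the downloaded archive and an output file opened with `newline=""`, for example `combine_zip_csvs("trait_data.zip", out)`.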

MagicMilly commented 4 years ago

Tall format tables will be uploaded to this Google Drive folder for feedback until they're ready to be shared on CyVerse. Please let me know if you cannot access.

MagicMilly commented 4 years ago

@rbartelme Just an update: all of the raw, tall datasets for four seasons (including MAC 4 and 6 with the additional info like lat and lon) are now in the same folder that I shared above.

MagicMilly commented 4 years ago

@rbartelme The raw tall formats in the tall_format_data folder have been updated with

I matched the new data with the original where I could, but I did not modify any of the date values. Hopefully that can be more easily done in R, though it may not be needed for this particular trait.

The code I used can be found on GitHub in the create_tall_formats notebook. Let me know if you notice any errors or have any other feedback.