Closed: MagicMilly closed this issue 3 years ago
@MagicMilly this may be blocked by my related issue (which is in the DIAG org repo at the moment, but I'm not sure if it's actually a blocker for you yet). I don't know if I shared the notes from meeting with Dr. Stanish RE: the metagenomes with you. Let me know if you need to schedule a meeting next week to discuss some of this in more detail.
Do you think the prototype table for one trait would be blocked, or just the narrowing down bit? If the former, then yes I would like to discuss.
@MagicMilly I think that testing for a single trait would be fine. Narrowing the temporal scale will be dependent on whenever I can get the microbial data onto the new cluster.
Moving to the next Sprint; this may need to be broken up into smaller tickets for all the tasks / steps. One site file for a specific trait can be cleaned up to look like this, but additional data/metadata are needed to understand the table (e.g. the meaning of the endDate column, which only contains eight unique values).
Slice of the dataset linked above
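One way to inspect a column like endDate is to count how often each distinct value occurs. The following is only a minimal sketch using the Python standard library: the sample rows, the siteID and percentCover column names, and the value_counts helper are all hypothetical stand-ins, not taken from the actual site file.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for one site file; the real file has more columns,
# and these rows and values are made up for illustration.
SAMPLE_CSV = """siteID,endDate,percentCover
SITE,2019-06-04,12
SITE,2019-06-04,3
SITE,2019-06-05,40
SITE,2019-08-13,7
"""


def value_counts(csv_file, column):
    """Count how many rows carry each distinct value of `column`."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


counts = value_counts(io.StringIO(SAMPLE_CSV), "endDate")
for value, n in sorted(counts.items()):
    print(value, n)
```

With a real file, passing an open file handle instead of the io.StringIO object should work the same way. A small number of distinct dates might suggest that endDate records a sampling bout rather than an individual observation, but that would need to be confirmed against the data product's metadata.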
Still working on Step One after receiving feedback on initial table, in addition to possibly breaking up ticket into smaller tasks, so am bumping to the next Sprint.
This R package might simplify the workflow: https://cran.r-project.org/web/packages/neonstore/index.html
Here is an example: https://github.com/eco4cast/neon4cast-neon-download/blob/master/download.R
Thank you very much - this looks extremely helpful! I'll start with it and ask Kristina for any R-related help.
Met with @KristinaRiemer today and have a much better understanding of the data in the file we want (the 1 square meter data with the percent cover and height observations). My initial prototype table was incorrect, so I'll be recreating it now as a script in PyCharm, working with one local file to start. Also working on converting notebooks to scripts as described in #109
@KristinaRiemer I created one output table in this repo, using one input CSV in the data folder. I've included all location, date, and plant data. I kept the heightPlantOver300cm column because I've seen data in that column from other sites. No columns were renamed since we don't know when we'll take that step while combining all input data. I can add a script today, and we can chat about next steps for the next Sprint.
@MagicMilly the output table isn't actually in the repo, right? And it was generated by the .ipynb in the code folder? Though you just started working with scripts, it would be beneficial to eventually do all of your cleaning work in scripts from start to finish, I think. Let me know when the script is ready!
@MagicMilly I was able to run the script without any errors and the resulting data file was the same!
I was thinking for next steps that it might be useful to collaborators to have some summary stats about this data file? Maybe number of unique sites (from lat/lon), a genus/species list w/ numbers (the scientificName column is a mess but the TaxonID one might be useful), etc.
The heightPlantOver300cm column might not be that useful for plant height because that threshold is nearly 10 feet (300 cm), which is pretty tall for most plants, and none of the plants in this dataset exceed that.
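The summary stats suggested above could be sketched roughly as follows, using only the Python standard library. This is an illustration under assumptions: the decimalLatitude and decimalLongitude column names, the sample rows, and the summarize helper are hypothetical, and only TaxonID and heightPlantOver300cm are column names mentioned in this thread.

```python
import csv
import io
from collections import Counter

# Hypothetical rows for illustration. The latitude/longitude column names
# are assumptions, and all values are made up.
SAMPLE_CSV = """decimalLatitude,decimalLongitude,TaxonID,heightPlantOver300cm
32.59,-106.84,AAA,
32.59,-106.84,BBB,
32.60,-106.85,AAA,
32.60,-106.85,,
"""


def summarize(csv_file):
    """Return basic summary stats for one cleaned table."""
    rows = list(csv.DictReader(csv_file))
    # Treat each distinct (lat, lon) pair as one site; the coordinates are
    # compared as strings, exactly as they appear in the file.
    sites = {(r["decimalLatitude"], r["decimalLongitude"]) for r in rows}
    # Count observations per taxon, skipping rows with no taxon ID.
    taxa = Counter(r["TaxonID"] for r in rows if r["TaxonID"])
    # Count rows where the over-300-cm column holds any value at all.
    over_300 = sum(1 for r in rows if r["heightPlantOver300cm"].strip())
    return {
        "n_rows": len(rows),
        "n_sites": len(sites),
        "taxon_counts": taxa,
        "n_over_300cm": over_300,
    }


summary = summarize(io.StringIO(SAMPLE_CSV))
print(summary["n_sites"], "sites;", dict(summary["taxon_counts"]))
```

If n_over_300cm is zero for a file, that would support dropping heightPlantOver300cm for that site, though other sites may still carry data in that column.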
Yay! That is a great idea for next steps, thank you. I'll write up a new ticket for you to review.
To clarify, the goal is to have a data table that is as close to the ones our collaborators have been using as possible
Also, please use the neonstore R package to download the data, something like https://github.com/eco4cast/neon4cast-neon-download/blob/master/download.R#L18
@MagicMilly do you have a followup issue for @dlebauer's previous comment?
I don't think I do yet, but I'll create one today and tag you
Closing this issue; follow-up ticket #117 incorporates feedback on this ticket
Referring to this data product (plant presence download): https://data.neonscience.org/data-products/DP1.10058.001