Closed atkinsjeff closed 3 years ago
Should we embed measurement units in all column names? Note from Jeff, I want consistency, but could be unwieldy.
Definitely not; ugly, cumbersome, and often ambiguous. Metadata define the units for all columns.
@atkinsjeff Sorry just catching up with this. Please keep in mind that many of us get LOTS of GitHub email...it's not hard to miss something, like e.g. being tagged just once. I will work through items above as I can in next few days.
no worries @bpbond hope i didn't come across as pushy!
* fd_leaf_spectrometry: Reviewer comments: "Was this dataset exported as only the head of the data? Only 8 rows of data import using the R package when 7,155 are expected. There is an additional column “tree_id” that is not in the format of USDA PLANTS species codes and not defined under Table S6 in the SI." @lisahaber @atkinsjeff
"Tree_id" column in this data set is not supposed to be a species code (that is a separate column called "species"). The information currently in this column is also not actual tree ID for these individuals, however. That would come from a lookup table with the IDs which were appended in 2019.
@lisahaber I fixed the issue with the loss of rows, but how should we proceed to address the ID issue?
I can send you the correct IDs, 2018 leaf IDs mapped to 2019 (i.e. the permanent) tree IDs.
A related issue is this: For the subcanopy data (not currently in the fd_leaf_spectrometry() data set, but eventually will be included) there is no leaf ID, because I only sampled once per tree. For these canopy trees, though, I think we need another column ("leaf_id") because I measured 3 leaves per tree.
Ok @atkinsjeff here's the look-up table for correcting the tree_id column in the fd_leaf_spectrometry() data. Note that what I'm listing as "leaf_id" in this table is what we currently have as "tree_id" in fd_leaf_spectrometry(). You need BOTH the subplot and the leaf_id columns to match the trees to their correct tree_id.
Whoops...here it is. Let me know if you have questions or trouble. Canopy_tree_lookup_table.xlsx
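For anyone following along, the two-key match described above could be sketched roughly like this in base R. The column names follow the thread; every value below is invented for illustration, and the real lookup table lives in the attached spreadsheet.

```r
# Toy stand-ins: column names follow the thread, values are invented.
leaf_spec <- data.frame(
  subplot = c("A01E", "A01E", "A01W"),
  tree_id = c(1, 2, 1),               # actually a 2018 leaf ID, per the note above
  index_value = c(0.81, 0.79, 0.84),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  subplot = c("A01E", "A01E", "A01W"),
  leaf_id = c(1, 2, 1),
  tree_id = c(1001, 1001, 2040),      # hypothetical permanent 2019 tree IDs
  stringsAsFactors = FALSE
)

# Rename the mislabeled column, then match on BOTH keys.
names(leaf_spec)[names(leaf_spec) == "tree_id"] <- "leaf_id"
fixed <- merge(leaf_spec, lookup, by = c("subplot", "leaf_id"), all.x = TRUE)

# An NA here would mean a leaf had no match in the lookup table.
stopifnot(!anyNA(fixed$tree_id))
```

Matching on leaf_id alone would be ambiguous, since the same leaf ID can recur in different subplots; that is why both keys go into `by`.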
fd_metadata(table = "fd_inventory") returns the whole metadata tibble.
This should be fixed in #60. There's also a bit more cleanup in there.
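As a rough illustration of the expected behavior (not the actual change in #60), a table filter might look like the sketch below. The helper name `filter_metadata`, the metadata column names, and the toy rows are all assumptions.

```r
# Toy metadata; the real tibble's columns may differ (assumption).
meta <- data.frame(
  table = c("fd_inventory", "fd_inventory", "fd_litter"),
  field = c("dbh_cm", "species", "bagmass_g"),   # invented field names
  stringsAsFactors = FALSE
)

# Hypothetical helper, not the package's implementation.
filter_metadata <- function(meta, table = NULL) {
  # No table requested: return everything.
  if (is.null(table)) return(meta)
  # Fail loudly on a misspelled table name instead of returning zero rows.
  unknown <- setdiff(table, meta$table)
  if (length(unknown) > 0) {
    stop("Unknown table: ", paste(unknown, collapse = ", "))
  }
  meta[meta$table %in% table, , drop = FALSE]
}

inventory_meta <- filter_metadata(meta, table = "fd_inventory")
```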
@lisahaber can we just change "tree_id" to "leaf_id" then? i am confused
@atkinsjeff unfortunately, no, because there are three leaves per tree. I am sorry this is so confusing. Did not think through naming carefully enough in 2018.
@atkinsjeff, I'd favor a re-ordering of the vignettes, starting with "fortedata: Proposal Narrative", then "fortedata: Experimental Design and Treatment", with dataset vignettes following.
Moving this thread to Git for others to see, regarding R2's data visualization comment. @atkinsjeff, @bpbond, I'm wondering whether a common template (or 2) for data/dataset visualization could/should be coded into the package for dataset-focused vignettes. Here's what I posted to Slack:
Does anyone have thoughts with respect to a useful template or format for standardized data visualization across datasets? Would it make sense to generate two figures for each dataset vignette? E.g., one describing data availability, see: https://data.neonscience.org/data-products/explore. This would seem to address R2’s criticism. And, maybe another displaying the treatment means (no stats) of a response parameter within the dataset – a teaser of sorts. This would serve more as a useful illustration for orienting outside users.
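A minimal sketch of the data behind those two proposed figures, in base R. The data frame, its column names, and the treatment labels are invented here; a real template would start from whatever the dataset function returns.

```r
# Invented toy data; column names and treatment labels are assumptions.
obs <- data.frame(
  date = as.Date(c("2019-06-01", "2019-06-01", "2019-07-01", "2019-07-01")),
  treatment = c("control", "girdled", "control", "girdled"),
  value = c(2.0, 3.0, 4.0, 5.0),
  stringsAsFactors = FALSE
)

# Figure 1 input: number of observations per sampling date (data availability).
availability <- aggregate(value ~ date, data = obs, FUN = length)
names(availability)[2] <- "n_obs"

# Figure 2 input: treatment means of one response variable, no statistics.
treatment_means <- aggregate(value ~ treatment, data = obs, FUN = mean)
```

Either summary table could then be handed to whichever plotting code the vignettes standardize on.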
@atkinsjeff, litter data vignette: Doesn't appear to conform to the same structure as other vignettes and is without methods. Remote sensing vignette methodological details differ among subsections and, unlike some of the other dataset vignettes, provide less detail on sampling distribution.
Does anyone have thoughts with respect to a useful template or format for standardized data visualization across datasets?
First, I think having a general template is an excellent idea, as it makes life easier for both vignette creators and package users. (A few, like the proposal vignette, won't conform to this of course.)
I like Chris's starting thoughts above about what the template actually should be. Building on that a bit:
fortedata
that deal with it...or something like that?
Note: forest inventory vignette includes a static number ("There are 3165 observations in the dataset") that should be calculated from the data.
That seems reasonable. I put the static number in there because I didn't figure we were adding more trees :)
I will make it dynamic.
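One way to do that, sketched with a stand-in data frame: compute the count from the data and build the sentence from it. In the vignette itself the same idea would usually be an inline R Markdown expression such as `r nrow(fd_inventory())`; the toy table below is only there so the snippet runs without the package.

```r
# Stand-in for the inventory table; in the vignette this would be the
# data frame returned by fd_inventory().
inventory <- data.frame(tag = 1:5)

# Count rows at build time instead of hard-coding the number.
n_obs <- nrow(inventory)
sentence <- sprintf("There are %d observations in the dataset.", n_obs)
```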
I did some work on the remote sensing vignette this morning: https://fortexperiment.github.io/fortedata/articles/fd_remote_sensing_vignette.html
There are some issues on the observations waffle plot after pkgdown, but outside of that everything works fine. Still needs some work but @cmgough is this closer to what you envisage?
File size appears to be an issue. Our largest files are far and away vignette files. Can these be compressed further or left out for CRAN? @bpbond @stephpenn1 @atkinsjeff
Well, why are they large? I assume because of graphics files. You should be able to experiment with reducing file sizes and/or resolutions to make things smaller.
add code to change FAGRE to FAGR
Why would we do this in code? Why not just change the data files?
Agreed. It will be a balancing act, I think, between lots of documentation and illustration and file size.
Ultimately I did change the errors in data.
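For the record, a one-off correction of that kind, applied to the data before the file is re-saved, might look like the sketch below. The toy data frame and the `species` column name are assumptions; only the FAGRE to FAGR mapping comes from this thread, and the SWD question is left out because it was not confirmed.

```r
# Toy litter table; the species column name and the ACRU row are illustrative.
litter <- data.frame(
  species = c("FAGRE", "ACRU", "FAGRE"),
  stringsAsFactors = FALSE
)

# Named vector of corrections: names are the wrong codes, values the right ones.
corrections <- c(FAGRE = "FAGR")

# Replace only the codes that appear in the corrections table.
hit <- litter$species %in% names(corrections)
litter$species[hit] <- unname(corrections[litter$species[hit]])
```

Running this once and writing the corrected file back keeps the fix in the data rather than in package code, which matches the outcome described above.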
I did some work on the remote sensing vignette this morning: https://fortexperiment.github.io/fortedata/articles/fd_remote_sensing_vignette.html
There are some issues on the observations waffle plot after pkgdown, but outside of that everything works fine. Still needs some work but @cmgough is this closer to what you envisage?
Following Ben’s suggestion, which I like, the vignette lacks: 1) an intro/broad subject of the vignette needed to orient the user and, 2) importantly, linked references.
Some of the methods lack essential details on sampling intensity and location.
I’d also clearly label the “Methods description” section with a heading.
@atkinsjeff, are you building the figure templates (1—data availability & 2—treatment means) or is that assigned to someone else?
the figures are not currently showing up via pkgdown, but are in the in-package vignettes. I was going with simple boxplots for now. I am adding the references now. I was just trying to draft up what I could then. I will expand the intro and touch base with others about expanding theirs as well.
Hey folks, catching up here -- I'm still happy to fix up the Ecophysiology vignette and include standard figures Chris mentioned. Probably best to wait for a working example to emulate, though, so I'll stay tuned for that. The methods and data description in this vignette seem thorough to me but happy to expand/modify if anyone thinks we should.
Working on this. Things are :/ in general so far.
For the life of me I cannot figure out why some vignettes build plots no issue and others simply do not.
Are the figures just not rendering or is there an error message associated with it?
I get nothing: https://fortexperiment.github.io/fortedata/articles/fd_remote_sensing_vignette.html
But using basically the same code: https://fortexperiment.github.io/fortedata/articles/fd_litter_vignette.html
The litter vignette does fine. I guess I can cut the images and see if that does anything?
however the more i look at this, it seems i may not be doing the correct build procedure for pkgdown in order to update
I am closing this issue. the two remaining issues here are long term.
Necessary changes/updates needed for ESSD review. I have @'ed you to get input or check and see if you can help with this task:
[x] fd_leaf_spectrometry only returns 8 entries. Seems to be an issue with line 25: leaf_spec <- leaf_spec[grepl('([A-Za-z])', leaf_spec$index_value), ] @atkinsjeff
[x] fd_inventory() lacks adequate description @atkinsjeff
[ ] File size appears to be an issue. Our largest files are far and away vignette files. Can these be compressed further or left out for CRAN? @bpbond @stephpenn1 @atkinsjeff
[x] fd_canopy_structure_summary() lacks details and description @atkinsjeff
[x] forte_colors() needs to be made into a palette function similar to wesanderson maybe? @bpbond @atkinsjeff @stephpenn1 @kdorheim
[x] plot_metadata() needs to be functionalized. I (jeff) have some code I have been using for this on my end, I can post for review to @bpbond @stephpenn1 @kdorheim
[x] Belowground vignette has been assigned initially to @kaylamathes
[x] Leaf Phys vignette has been assigned initially to @lisahaber
[x] Thorough vignette check needed for all vignettes to identify needs @bpbond @kaylamathes @kdorheim @atkinsjeff @ashiklom @cmgough et al.
[x] update data availability plot and code @atkinsjeff
[x] DBH inventory has mismatched codes and errors that need to be corrected Review comments: "fd_inventory: Explain/note data with missing date information, species codes marked ???? (unidentifiable species?) for DBH inventory. What is the column “tag” in the fd_inventory data set? It lacks a description in the SI. This appears to maybe be an index column but there are a few errors in the numbering. “tag” 2236-2244 have an erroneous 9 in front of them, it appears. Should be DP II instead of PD II caliper, presumably." @atkinsjeff
[x] fd_soil_respiration() Reviewer comments: "2,791 observations are in the dataset when loaded through the R package. Again there are missing timestamp values (1,622 or over half the data, which even if the date is available is notable)." @kaylamathes @kdorheim @atkinsjeff @stephpenn1
[x] fd_leaf_spectrometry: Reviewer comments: "Was this dataset exported as only the head of the data? Only 8 rows of data import using the R package when 7,155 are expected. There is an additional column “tree_id” that is not in the format of USDA PLANTS species codes and not defined under Table S6 in the SI." @lisahaber @atkinsjeff
[x] fd_litter: Reviewer comments: "Is “MISC” code equivalent to “MIX” as defined in L184 or is it actually Mikania scandens? What are the codes “SWD” and “FAGRE”, these don’t appear to be USDA PLANTS codes? Is there a reason to not just use a column of the actual litter mass rather than the intermediate columns for bag mass and bag+litter mass?"
Note from Jeff: The last part of this I can address in text so that part can go to me (@atkinsjeff ) but if someone wants to add code to change FAGRE to FAGR, etc. that would be cool. The SWD thing should be CWD for coarse woody i think. @stephpenn1
[x] fd_hemi_camera: Again a mismatch between reported observations and the number @atkinsjeff
[x] fd_canopy_structure: Reviewer comment: "Again a mismatch between reported observations and the number in the R package dataset. In the associated Table S10 variables are separated by periods instead of underscores as in the actual data and the other SI tables. There are additional undescribed variables such as the skew and kurtosis intensity missing from Table S10" @atkinsjeff
[x] fd_plot_metadata(): Reviewer comment: "is not how the function appears in the R package. The help file in the R package does not describe how to use it to get the metadata properly." @bpbond @kdorheim @stephpenn1
[x] fd_metadata(table = "fd_inventory") returns the whole metadata tibble. @bpbond @kdorheim @stephpenn1
COMMENTS FROM REVIEWER ONE
[x] Should we embed measurement units in all column names? Note from Jeff, I want consistency, but could be unwieldy. @bpbond
[ ] Need (?) machine readable meta data
If you see any more issues, please add. --jeff
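On the fd_leaf_spectrometry item above: the line 25 filter quoted in the checklist keeps only rows whose index_value contains a letter, so purely numeric values are dropped. Whether that is the full cause of the 8-row result is not shown in this thread; the sketch below only demonstrates what the pattern does, using invented values.

```r
# Invented values: three numeric strings and one non-numeric entry.
leaf_spec <- data.frame(
  index_value = c("0.81", "0.79", "error", "0.84"),
  stringsAsFactors = FALSE
)

# The filter as quoted: keeps rows that CONTAIN a letter.
kept <- leaf_spec[grepl('([A-Za-z])', leaf_spec$index_value), , drop = FALSE]

# One possible alternative (an assumption, not the actual fix):
# keep rows that parse as numbers.
is_num <- !is.na(suppressWarnings(as.numeric(leaf_spec$index_value)))
numeric_rows <- leaf_spec[is_num, , drop = FALSE]
```

With these toy values, `kept` holds only the "error" row while `numeric_rows` holds the three measurements, which is the opposite of what a data-loading function would normally want.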