claraqin / neonMicrobe

Processing NEON soil microbe marker gene sequence data into ASV tables.
GNU Lesser General Public License v3.0

Ensure that code involving NEON soil data products still works #39

Closed: claraqin closed this issue 3 years ago

claraqin commented 3 years ago

The downloadRawSoilData function and the code that depends on it were developed for an outdated NEON soil data structure, in which DP1.10086.001 contained physical properties and DP1.10078.001 contained chemical properties. DP1.10086.001 now contains both the chemical and physical properties. We should check the add_environmental_variables and downloadRawSoilData vignettes to ensure that they still work properly.
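
One way to catch this kind of upstream restructuring early is a small smoke test that checks whether the tables the code expects are actually present after a download. The sketch below is Python for illustration only (neonMicrobe itself is an R package). It assumes the downloaded tables are held in a dict keyed by table name; `EXPECTED_TABLES` and `missing_tables` are hypothetical names, and the table list is taken from the documentation text suggested later in this thread.

```python
# Hypothetical smoke test, not neonMicrobe code. The table names come from the
# thread; representing the download as a dict keyed by table name is an assumption.
EXPECTED_TABLES = {
    "DP1.10086.001": {
        "sls_soilCoreCollection",
        "sls_soilMoisture",
        "sls_soilpH",
        "sls_soilChemistry",
    },
}


def missing_tables(product_id, downloaded):
    """Return the expected table names absent from `downloaded`, sorted."""
    expected = EXPECTED_TABLES.get(product_id, set())
    return sorted(expected - set(downloaded))


# Under the old layout, chemistry lived in DP1.10078.001, so it would be missing here.
downloaded = {"sls_soilCoreCollection": [], "sls_soilMoisture": [], "sls_soilpH": []}
print(missing_tables("DP1.10086.001", downloaded))  # ['sls_soilChemistry']
```

A non-empty result signals that the product's structure no longer matches what downstream code assumes.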

claraqin commented 3 years ago

Update: I modified downloadRawSoilData so that it accounts for the new NEON soil data structure and prints more helpful messages. However, the vignette does not yet seem to handle the new soil data structure correctly, so it still needs to be updated.

claraqin commented 3 years ago

Copied from an email with suggestions from @lstanish:

  1. In the function's documentation, update the text to read: "DP1.10086.001: 'Soil physical and chemical properties, periodic', tables sls_soilCoreCollection, sls_soilMoisture, sls_soilpH, and sls_soilChemistry." That way, users will know before running the function that it does not download the entire suite of soil metadata.
  2. The function aborted at the end because I forgot to specify a directory. Can you add a dir.exists() check at the beginning of the script to prevent this behavior?
  3. Starting in 2020, NEON began recording sampling events that did not actually occur; these are flagged in the samplingImpractical field of sls_soilCoreCollection. Records with a samplingImpractical value other than OK are not associated with any samples or data. It's worth removing these records before outputting the data.
  4. The script also outputs records where boutType = 'fieldOnly'. These are bouts to collect N-transformation incubation tubes; they are only useful for calculating N-transformation rates and have no associated microbial samples. I suggest removing them before outputting the data.
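
Suggestions 2 to 4 can be sketched as a fail-fast directory check plus a row filter. This is a Python illustration of the logic only, not the package's R implementation (where the directory check would use `dir.exists()`). It assumes each row of sls_soilCoreCollection is a dict keyed by NEON field name; `check_output_dir`, `keep_core_record`, and `filter_core_collection` are hypothetical names, and the example field values are made up. It also assumes that a missing or blank samplingImpractical value (for example, in pre-2020 records) means the event did occur, so only an explicit non-OK value is dropped.

```python
import os


def check_output_dir(path):
    """Fail fast, before any download starts, if the output directory is missing."""
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Output directory does not exist: {path!r}")


def keep_core_record(row):
    """Keep a sls_soilCoreCollection row only if it can be linked to microbial samples."""
    impractical = row.get("samplingImpractical")
    # Assumption: missing/blank means the event occurred (e.g. pre-2020 records);
    # only an explicit value other than "OK" marks an event that did not happen.
    if impractical not in (None, "", "OK"):
        return False
    # fieldOnly bouts collect N-transformation incubation tubes, not microbe samples.
    return row.get("boutType") != "fieldOnly"


def filter_core_collection(rows):
    """Return only the rows that pass keep_core_record."""
    return [row for row in rows if keep_core_record(row)]


# Illustrative rows; the field values are made up.
rows = [
    {"sampleID": "A", "samplingImpractical": "OK", "boutType": "microbesBiomass"},
    {"sampleID": "B", "samplingImpractical": "site flooded", "boutType": "microbesBiomass"},
    {"sampleID": "C", "samplingImpractical": "OK", "boutType": "fieldOnly"},
    {"sampleID": "D", "boutType": "microbesBiomass"},
]
print([row["sampleID"] for row in filter_core_collection(rows)])  # ['A', 'D']
```

Checking the directory first means a missing path is reported before any time is spent downloading, rather than at the end as described in suggestion 2.
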

claraqin commented 3 years ago

The issues listed in the previous comment have been addressed as of the most recent commit.

In addition, %N and %organicC records, which were previously stored in separate rows, have been collapsed into the same rows. However, the C/N ratio has not yet been calculated from them, and we need to check whether that is necessary.
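
Collapsing the two analytes into one row per sample is a long-to-wide pivot, after which a C/N ratio, if it turns out to be needed, is a per-row division of %organicC by %N. The Python sketch below illustrates this under assumptions: the field names `sampleID`, `analyte`, and `value` and the analyte labels are hypothetical, not the actual sls_soilChemistry columns, and `collapse_by_sample` and `cn_ratio` are illustrative helpers.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per (sample, analyte).
long_rows = [
    {"sampleID": "S1", "analyte": "nitrogenPercent", "value": 0.25},
    {"sampleID": "S1", "analyte": "organicCPercent", "value": 3.0},
    {"sampleID": "S2", "analyte": "nitrogenPercent", "value": 0.10},
]


def collapse_by_sample(rows):
    """Pivot long rows into one dict per sample, keyed by analyte.

    If a sample has duplicate rows for the same analyte, the later row wins.
    """
    wide = defaultdict(dict)
    for row in rows:
        wide[row["sampleID"]][row["analyte"]] = row["value"]
    return dict(wide)


def cn_ratio(sample):
    """Organic C to N mass ratio (%C / %N); None if a value is missing or N is 0."""
    c = sample.get("organicCPercent")
    n = sample.get("nitrogenPercent")
    if c is None or not n:
        return None
    return c / n


wide = collapse_by_sample(long_rows)
for sample_id, values in sorted(wide.items()):
    print(sample_id, cn_ratio(values))
# S1 12.0
# S2 None
```

Returning None rather than raising keeps samples with only one analyte in the output, so the ratio can be added as an optional column without dropping rows.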

The vignettes still need to be updated to use the new downloadRawSoilData function.

claraqin commented 3 years ago

Addressed with updates to downloadSoilData (previously downloadRawSoilData) from the merge of batch_structure into master.