dankelley / oce

R package for oceanographic processing
http://dankelley.github.io/oce/
GNU General Public License v3.0
142 stars 42 forks source link

How to add discrete nutrient data to CTD object in same pressure space? #1784

Closed janstrauss1 closed 3 years ago

janstrauss1 commented 3 years ago

Hi @dankelley and @richardsc,

Short summary of problem

I've successfully created a CTD object using as.ctd(). Now I would like to add nutrient data that has been measured at distinct water depths and I've tried to achieve this using oceSetData().

Yet, I don't get the data dimensions to match that of the CTD measurements and I'm wondering what the best way would be to achieve this without too much manual tweaking?

In other words, I would like to coerce my nutrient data into a CTD object and S, T, p space. Something similar must have been done here by @jcmcnch to plot discrete chemical and biological data alongside CTD profiles but I'm not clear how this was done.

Details (optional)

Here is a reproducible example that showing my situation:

### Load example ctd data set
data(ctd)

### Generate random nutrient data frame
set.seed(123)
Depth <- runif(8,1,40)
Ammonium <- runif(8,0.2,0.7)
Nitrite <- runif(8,0.04,0.6)
Phosphate <- runif(8,0.3,2.5)
Silicate <- runif(8,1.5,32)
Nitrate <- runif(8,0,32)
random.nutrients <- data.frame(Depth,Ammonium,Nitrite,Phosphate,Silicate,Nitrate)

### compute pressure from depth and add to nutrient table
random.nutrients$dbar <- swPressure(depth = random.nutrients$Depth)

### Set nutrient data slots to example CTD object
ctd <- oceSetData(ctd, name = "Ammonium", value = random.nutrients$Ammonium,
                  unit=list(unit=expression(mu*M), scale=""))
ctd <- oceSetData(ctd, name = "Nitrite", value = random.nutrients$Nitrite,
                  unit=list(unit=expression(mu*M), scale=""))
ctd <- oceSetData(ctd, name = "Phosphate", value = random.nutrients$Phosphate,
                  unit=list(unit=expression(mu*M), scale=""))
ctd <- oceSetData(ctd, name = "Silicate", value = random.nutrients$Silicate,
                  unit=list(unit=expression(mu*M), scale=""))
ctd <- oceSetData(ctd, name = "Nitrate", value = random.nutrients$Nitrate,
                  unit=list(unit=expression(mu*M), scale=""))
summary(ctd)

### plot some nutrient profile
plot(ctd, which = "Nitrate")

However, the nutrient profile doesn't look as I expected and is not spread over the full pressure space?!

What would be the best way to match the dimensions of the added data with that of the CTD data without too much manual tweaking? Eventually, I would like to plot sections for different parameters...

Many thanks in advance for your help!

jcmcnch commented 3 years ago

Hi Jan,

You can see the syntax I used to generate those plots here:

https://github.com/jcmcnch/GA03WaterColumnCharacterization/blob/main/scripts/08-make-plots.R

It may not be the simplest or most elegant way to do this, but it seemed to work!

Hope it helps, Jesse

janstrauss1 commented 3 years ago

Hi @jcmcnch,

many thanks for your feedback and link to your plotting script!

As far as I understand your script, you basically create multiple CTD objects for plotting - one for your real CTD data set and additional ones for your chemical and biological data. In the end, you plot profiles next to each other calling plot for your different CTD objects.

While this might work, I was rather looking for efficient ways to add additional data (e.g. chemical and biological data sets that have a different dimension) directly to a CTD object.

Then I discovered the concatenate function of oce and thought it might work to split my ctd object into two based on depths that have been sampled for nutrients measurements. My idea was to then add my nutrient data and eventually recombine the split ctd objects again.

Yet, this doesn't seem to work as you can see in the example below:

### Load example ctd data set
data(ctd)

### Generate random nutrient data frame that relates to the ctd data set
set.seed(100)
## take a random sample of size 3 from a CTD depth to represent samples for which nutrients were measured
## sample without replacement
DepthWithNutrientMeas <- sample(ctd[["depth"]], 3, replace=FALSE)
## make up some random Nitrate nutrient data
Nitrate <- rnorm(3, mean=16, sd=1)
## build nutrient table
random.nutrient.table <- data.frame(DepthWithNutrientMeas, Nitrate)

## Split ctd object
## Not ideal. How can ctd object be subset by specific depths (e.g. by DepthWithNutrientMeas vector)?
ctd1 <- subset(ctd, depth == 25.340 |
         depth == 27.846 |
         depth == 37.017)
ctd2 <- subset(ctd, depth != 25.340 &
                 depth != 27.846 &
                 depth != 37.017)

## add Nitrate nutrient data to ctd1 
ctd1 <- oceSetData(
  ctd1, name = "Nitrate", value = random.nutrient.table$Nitrate,
  unit=list(unit=expression(mu*M), scale=""),
  note = "random Nitrate conc.")

## inspect ctd1
summary(ctd1)

## show nutrient table and plot nutrient profile for comparison
random.nutrient.table
plot(ctd1, which = "Nitrate")

## recombine modified ctd1 and ctd2
CTD <- concatenate(ctd1, ctd2)

But concate(ctd1, ctd2) doesn't seem to work throwing the following error:

Error in concatenate(ctd1, ctd2) : 
  data name mismatch between argument 1 (depth flag Nitrate pressure salinity scan temperature timeS) and argument 1(depth flag pressure salinity scan temperature timeS)

I'm still not sure how to best tackle the problem and would be very interested to hear also what @dankelley and @richardsc think about this? Also, how to best subset a ctd object based on specific depths (as commented in my code above)? Anything would help.

Many thanks in advance!

dankelley commented 3 years ago

Sorry to come in this so late in the game. I've never wanted to do this sort of thing. I prefer to keep separate data to their own objects. But, if you need to, there's a way.

As has been pointed out, ctd objects have just one set of pressures. So, you'll need to cast your data onto those pressures before you can insert them in the object.

How you cast them is up to you. The simplest way, piecewise-linear interpolation, is illustrated below. I do not recommend smoothing, even though that might make a prettier picture, because I think the goal of a plot is to show data.

Note that the plot produced by the below is not showing the unit for this newly-created column. That's a bug I ought to fix, actually.

I may be missing the whole point here, though.

library(oce)
#> Loading required package: gsw
#> Loading required package: testthat
#> Loading required package: sf
#> Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1
data(ctd)
newMeasurement <- c(1,3,5,1,0)
newPressure <- c(10,20,21,30,50)
newData <- approx(newPressure, newMeasurement, ctd[["pressure"]])$y
ctdNew <- oceSetData(ctd, "new", newData, unit=list(unit=expression(mol/kg), scale=""))
summary(ctdNew)
#> CTD Summary
#> -----------
#> 
#> * Instrument:          SBE 25 
#> * Temp. serial no.:    1140
#> * Cond. serial no.:    832
#> * File:                "/Users/kelley/git/oce/create_data/ctd/ctd.cnv"
#> * Original file:       c:\seasoft3\basin\bed0302.hex
#> * Start time:          2003-10-15 15:38:38
#> * Sample interval:     1 s
#> * Cruise:              Halifax Harbour
#> * Vessel:              Divcom3
#> * Station:             Stn 2
#> * Location:            44.684N 63.644W
#> * Data Overview
#> 
#>                                  Min.   Mean   Max. Dim. NAs OriginalName
#>     scan                          130    220    310  181   0         scan
#>     timeS [s]                     129    219    309  181   0        timeS
#>     pressure [dbar]              1.48 22.885 44.141  181   0           pr
#>     depth [m]                   1.468 22.698 43.778  181   0         depS
#>     temperature [°C, IPTS-68]   2.919 7.5063 14.237  181   0         t068
#>     salinity [PSS-78]          29.916 31.219 31.498  181   0        sal00
#>     flag                            0      0      0  181   0         flag
#>     new [mol/kg]              0.29295 1.7277 4.9484  181  36            -
#> 
#> * Processing Log
#> 
#>     - 2018-11-14 20:03:47 UTC: `create 'ctd' object`
#>     - 2018-11-14 20:03:47 UTC: `read.ctd.sbe(file = file, debug = 10, processingLog = processingLog)`
#>     - 2018-11-14 20:03:47 UTC: `oce.edit(x = ctd, item = "startTime", value = as.POSIXct(gsub("1903",     "2003", format(ctd[["startTime"]])), tz = "UTC") + 4 * 3600,     reason = "file had year=1903, instead of 2003", person = "Dan Kelley")`
#>     - 2021-02-16 20:43:28 UTC: `oceSetData(object = ctd, name = "new", value = newData, unit = list(unit = expression(mol/kg),     scale = ""))`
plotProfile(ctdNew, "new")

Created on 2021-02-16 by the reprex package (v1.0.0)

richardsc commented 3 years ago

Hi @janstrauss1 -- also sorry for coming late to the game (though happy that @jcmcnch was the first to give an answer -- it must mean that oce is really starting to "make it"!).

Dan's method is essentially what I would have done to accomplish such a "merging" of continuous CTD with bottle data, though like him this isn't normally the approach that I would have taken and would more likely have just kept them in separate objects (but obviously that decision is based on what kind of analysis you plot to do). In the past I have done something like the opposite -- that is, I approx() the CTD data to the bottle locations, so that I can easily do a comparison between bottles and something that was measured on the rosette (e.g. salts, O2, etc).

As for your other question, about how best to subset based on depth, you have to be careful when you use == and != on floating point numbers, because even if you know two things should be the same, they won't necessarily evaluate that way due to the binary representation of numbers on a computer. E.g.

> (0.1 + 0.2) == 0.3
[1] FALSE

A good summary is given here.

So the approach with subset would more likely be to subset based on a range, e.g.

ctd_shallow <- subset(ctd, depth < 100)
ctd_middle <- subset(ctd, depth >= 100 & depth < 1000) 
ctd_deep <- subset(ctd, depth >= 1000)

Also, I had forgotten (or didn't know about in the first place) the concatenate() function. I'm not sure if I'm happy to know about it again, or kind of horrified that it is probably not something that I would recommend anyone use 😄 . It has to have the sparesest documentation of all functions in oce:

Concatenate oce objects

Description:

     Concatenate oce objects

Usage:

     concatenate(object, ...)

Arguments:

  object: an oce object.

     ...: optional additional oce objects.

Value:

     An object of class corresponding to that of ‘object’.
dankelley commented 3 years ago

I also didn't recall that concatenate existed. I see that I created it as a "generic" function, and only specialized it for one type: the adp object. You can learn more about it by typing ?'concatenate,adp-method'.

I think the word adp will make Clark see why I made this. It's because we were working with some ADP data that were chopped up into chunks by a logging computer.

Using concatenate on CTD data is highly discouraged.

janstrauss1 commented 3 years ago

@richardsc and @dankelley,

many thanks for your feedback and food for thought! That's super helpful.

Yet, I'm still in two minds about whether in my case it would be better to merge my continuous CTD and discrete CTD Niskin bottles data or keep them separate in different objects. I guess I was initially also thinking to generate some kind of single object for data sharing purposes until @richardsc specifically pointed out here that as.ctd() isn't really meant for this. I'm still unsure, however, how to best share such separate data sets. Yet, this is not an oce issue but rather a general issue of data sharing in oceanography...

Coming back to my specific issue, your suggestion to use approx to interpolate between distinct data points works nicely. Yet, I was initially just looking for an even simpler way to cast, assign or map (you name it) my data onto the depths (or pressures) without even interpolating and then merge them with my ctd object. Note that I only have depth information for my nutrient measurements.

I now managed to achieve this with the code below. Yet, I came across some odd behaviour of oce.

As always any comments and feedback are highly appreciated!

## install package from github
# remotes::install_github("dankelley/oce", ref="develop")

library(oce)
### Load example ctd data set
data(ctd)

### Generate random nutrient data frame that relates to the ctd data set
## make up some nutrient data
nutrientMeasurement <- c(0.01,0.3,0.7,7,15,22,24,30)

set.seed(100) # for reproducibility
## take a random sample of size 8 from CTD depth to represent samples for which nutrients were measured
## sample without replacement
DepthWithNutrientMeas <- sample(ctd[["depth"]], 8, replace=FALSE)
### compute pressure from depth
PressureWithNutrientsMeas <- swPressure(depth = DepthWithNutrientMeas)

## make example nutrient table
## Note that depth and pressure are sorted as initial depth was randomly sampled from ctd depth
nutrient.table <- data.frame(sort(DepthWithNutrientMeas),sort(PressureWithNutrientsMeas), nutrientMeasurement)

newData <- merge(cbind(ctd[["depth"]]),
                 cbind(nutrient.table$sort.DepthWithNutrientMeas., nutrient.table$nutrientMeasurement),
                 by = 1, all.x = TRUE)$V2

ctdNew <- oceSetData(ctd, "nutrient", newData, unit=list(unit=expression(mu*M), scale=""))
summary(ctdNew)

# plotProfile(ctdNew, "nutrient") # does not work
plotProfile(ctdNew, xtype = newData) # also does not work?

grafik

Yet, plot(ctdNew, which = "nutrient") seems to work?!

grafik

janstrauss1 commented 3 years ago

Using concatenate on CTD data is highly discouraged.

I will certainly forget about concatenate then. Actually, I felt a bit dodgy already when using it.

dankelley commented 3 years ago

@janstrauss1, below is a reprex that I think demonstrates what might be your most recent problem: plotProfile() by default uses lines to connect data, and your new dataset has isolated data separated by NA sequences. R, in general, does not join up points that are separated by NA sequences. You need to set the type to "p" to see points.

Note that I am using a reprex. That means you can copy/paste the code into R and you should get the same result. Please consider using them in future, because they are so much easier to work with than code intermixed with text.

As to the general matter of merging together different data streams, I think it's just a bad idea. It is very, very, common to work with different data streams, and the normal way to do it is simply to use different data files at the system level and different R objects at the R level. If you want to bind related things together, you can make a list, something like

li <- vector("list", 2) # for 2 objects
li[[1]] <- CTDobject
li[[2]] <- relatedADPobject

or something. Still, I don't see any particular merit in doing that, really.

The case of using bottle data to calibrate a CTD has been discussed by @richardsc already, but that does not benefit from trying to put two things into a single object. Really, I struggle to think of a reason to try to merge disparate things into a single object. If the idea is to produce a product that can be shared to others, then I think the best plan is to have separate data files, and code that deals with them. That lets others walk down the path you intend, but walk down others, if they choose to do so.

I think this is straying quite a lot from the original title. In the oce project, we try to restrict discussion to fall under the umbrella of the original title. So, you might want to consider closing this issue. I would have opened a new issue on the matter of NA values and type="l", but that has been discussed before.

reprex output ``` r library(oce) #> Loading required package: gsw #> Loading required package: testthat #> Loading required package: sf #> Linking to GEOS 3.8.1, GDAL 3.1.4, PROJ 6.3.1 ### Load example ctd data set data(ctd) ### Generate random nutrient data frame that relates to the ctd data set ## make up some nutrient data nutrientMeasurement <- c(0.01,0.3,0.7,7,15,22,24,30) set.seed(100) # for reproducibility ## take a random sample of size 8 from CTD depth to represent samples for which nutrients were measured ## sample without replacement DepthWithNutrientMeas <- sample(ctd[["depth"]], 8, replace=FALSE) ### compute pressure from depth PressureWithNutrientsMeas <- swPressure(depth = DepthWithNutrientMeas) ## make example nutrient table ## Note that depth and pressure are sorted as initial depth was randomly sampled from ctd depth nutrient.table <- data.frame(sort(DepthWithNutrientMeas),sort(PressureWithNutrientsMeas), nutrientMeasurement) newData <- merge(cbind(ctd[["depth"]]), cbind(nutrient.table$sort.DepthWithNutrientMeas., nutrient.table$nutrientMeasurement), by = 1, all.x = TRUE)$V2 ctdNew <- oceSetData(ctd, "nutrient", newData, unit=list(unit=expression(mu*M), scale="")) summary(ctdNew) #> CTD Summary #> ----------- #> #> * Instrument: SBE 25 #> * Temp. serial no.: 1140 #> * Cond. serial no.: 832 #> * File: "/Users/kelley/git/oce/create_data/ctd/ctd.cnv" #> * Original file: c:\seasoft3\basin\bed0302.hex #> * Start time: 2003-10-15 15:38:38 #> * Sample interval: 1 s #> * Cruise: Halifax Harbour #> * Vessel: Divcom3 #> * Station: Stn 2 #> * Location: 44.684N 63.644W #> * Data Overview #> #> Min. Mean Max. Dim. NAs OriginalName #> scan 130 220 310 181 0 scan #> timeS [s] 129 219 309 181 0 timeS #> pressure [dbar] 1.48 22.885 44.141 181 0 pr #> depth [m] 1.468 22.698 43.778 181 0 depS #> temperature [°C, IPTS-68] 2.919 7.5063 14.237 181 0 t068 #> salinity [PSS-78] 29.916 31.219 31.498 181 0 sal00 #> flag 0 0 0 181 0 flag #> nutrient [\u03bcM] 0.01 12.376 30 181 173 - #> #> * Processing Log #> #> - 2018-11-14 20:03:47 UTC: `create 'ctd' object` #> - 2018-11-14 20:03:47 UTC: `read.ctd.sbe(file = file, debug = 10, processingLog = processingLog)` #> - 2018-11-14 20:03:47 UTC: `oce.edit(x = ctd, item = "startTime", value = as.POSIXct(gsub("1903", "2003", format(ctd[["startTime"]])), tz = "UTC") + 4 * 3600, reason = "file had year=1903, instead of 2003", person = "Dan Kelley")` #> - 2021-02-18 11:10:03 UTC: `oceSetData(object = ctd, name = "nutrient", value = newData, unit = list(unit = expression(mu * M), scale = ""))` par(mfrow=c(2,2)) plotProfile(ctdNew, "nutrient") plotProfile(ctdNew, xtype = "nutrient") plotProfile(ctdNew, "nutrient", type="p") plotProfile(ctdNew, xtype = "nutrient", type="p") ``` ![](https://i.imgur.com/PXoZTwK.png) Created on 2021-02-18 by the [reprex package](https://reprex.tidyverse.org) (v1.0.0)
janstrauss1 commented 3 years ago

@dankelley, many thanks for your feedback and sorry for straying from the original issue!

Happy to close this issue.

dankelley commented 3 years ago

No worries on the straying. I do that a lot, myself! We try to stick to titles in oce, because that will make it easier for other users to find topics.

Please feel free to raise new issues, as needed. It really helps. For example, I had never noticed before that no units were appearing on profile plots, for newly-added data columns.

richardsc commented 3 years ago

Just wanted to echo Dan to say that opening issues, for bugs, feature requests, or even just advice, is helpful for everyone (including future us!). I'm always happy to discuss, and really pleased that you're finding oce useful!

And on your point about data sharing ... we know. We have discussed this quite a bit in the past, and have ended up shying away from providing any "real" export format, for fear of simply muddying the field with YADF (yet another data format) and decided to let users decide what/how to export for their own specific cases. I often think of this XKCD comic in such situations:

image