dankelley / oceglider

R package for processing ocean glider data
https://dankelley.github.io/oceglider/

slocum vignette should use downloadable data file #121

Closed dankelley closed 2 months ago

dankelley commented 2 months ago

The vignette currently uses a file I got from somewhere, some years ago, and its code blocks are marked with `eval=FALSE` to prevent failures in remote tests.

I plan to look for a slocum file and have the vignette download that. I'll look in CPROOF since that seems like a good spot for finding data that are using today's standards for variable names etc. Also, the vignette uses a special function for slocum netcdf files, but I am hoping that the general code we have for netcdfs might work, or be made to work.

At the moment, I think the package is good for sea-explorer, in 3 types

On CPROOF I also see some files that are

dankelley commented 2 months ago

I looked at 3 files, and the smallest (by an order of magnitude) that I've encountered so far is bumblebee, so I think I'll go with that. Code is below. I'm not saving to a tempfile yet; I'll switch to that only at the very end, to save time while testing. (I think 11 seconds on my home machine might be a minute or more on my work machine.)

```R
library(oce)
url <- paste0(
    "https://cproof.uvic.ca/gliderdata/deployments/",
    "dfo-bumblebee998/dfo-bumblebee998-20221207/",
    "L0-timeseries/",
    "dfo-bumblebee998-20221207_delayed.nc"
)
file <- gsub(".*/", "", url)
# 45.8 MB file: download takes 0.4s user, 0.9s system but 11s elapsed
if (!file.exists(file)) {
    system.time(curl::curl_download(url = url, destfile = file, mode = "wb"))
}
```

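
For the vignette itself, where the file should not linger in the working directory and a failed download should not break remote checks, one option is sketched below. `fetchCached()` is a hypothetical helper, not an oceglider function, and it uses base R's `download.file()` instead of `curl::curl_download()` only to avoid an extra dependency.

```R
# Hypothetical helper (not part of oceglider): download a file once into a
# cache directory (the session's temporary directory by default) and return
# its path, or NULL if the download fails for any reason.
fetchCached <- function(url, dir = tempdir()) {
    destfile <- file.path(dir, basename(url))
    if (!file.exists(destfile)) {
        ok <- tryCatch(
            {
                utils::download.file(url, destfile, mode = "wb", quiet = TRUE)
                TRUE
            },
            warning = function(w) FALSE,
            error = function(e) FALSE
        )
        if (!ok) {
            unlink(destfile) # remove any partial download
            return(NULL)
        }
    }
    destfile
}
```

A vignette chunk could then call `file <- fetchCached(url)` and skip the reading and plotting steps when `file` is `NULL`.
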
dankelley commented 2 months ago

Holy foghorn!

Here's the timing to read the file. NetCDF is so great. Now, I have to see if it read stuff correctly, though... that will take a while and I may end up recoding. But note that I didn't test an existing (old) function that was specialized to read slocum netcdf. I don't want to have 2 different functions to maintain, one for sea-explorer nc and the other for slocum nc. Twice the functions is twice the line count and therefore at least twice the maintenance cost. (I think maintenance cost is nonlinear in line count.)

```
> system.time(g <- read.glider.netcdf(file))
   user  system elapsed 
  0.072   0.048   0.121 
```

dankelley commented 2 months ago

Good news. The summary gives as below, so it's catching names and units quite well. (NOTE: the units for angles are stated as radians because that's what is in the files. But our friends on the other coast will be fixing that up at some point ... for now, anybody working with the data would catch on that it is in degrees after a quick check, so no big deal.)


```
Glider Summary
--------------

* Input file:
    dfo-bumblebee998-20221207_delayed.nc
* Type:    slocum
* Subtype: ?
* Time:    2022-12-07 16:15:54 to 2022-12-11 06:53:54  (mean increment 1.366 s)
* Data Overview:
                                       Min.       Mean      Max.   Dim.          OriginalName
    latitude [°N]                    48.962     49.087    49.159 228325              latitude
    longitude [°E]                  -127.35    -126.82    -125.9 228325             longitude
    heading [rad]                 0.0081383     3.5527    6.2779 228325               heading
    pitch [rad]                    -0.91705 -0.0039137   0.96655 228325                 pitch
    roll [rad]                     -0.41012 -0.0049735   0.23825 228325                  roll
    waypointLatitude [°N]                 0     48.033    48.852 228325     waypoint_latitude
    waypointLongitude [°E]           -130.2    -128.77         0 228325    waypoint_longitude
    conductivity [S/m]                2e-05     3.3763    3.5003 228325          conductivity
    temperature [°C]                 3.4766     7.1911    14.208 228325           temperature
    pressure [dbar]                 -0.2657     234.62    1039.6 228325              pressure
    chlorophyll [mg/m³]              0.0219    0.24789    12.096 228325           chlorophyll
    cdom [ppb]                      -3.8178    0.91799    97.899 228325                  cdom
    backscatter700               0.00010584 0.00033849 0.0077169 228325       backscatter_700
    oxygenConcentration [μmol/l]      8.644     148.74    293.94 228325  oxygen_concentration
    u [m/s]                        -0.51893   -0.12134   0.12245 228325                     u
    v [m/s]                       -0.075335   0.070058   0.31982 228325                     v
    depth [m]                      -0.26345     232.28    1028.2 228325                 depth
    distanceOverGround [km]               0     85.221    155.63 228325  distance_over_ground
    salinity [1e-3]              4.3609e-06     33.197    34.369 228325              salinity
    potentialDensity [kg/m³]         999.61       1026    1027.4 228325     potential_density
    density [kg/m³]                  999.67     1026.9      1032 228325               density
    potentialTemperature [°C]        3.4022     7.1654     10.95 228325 potential_temperature
    profileIndex                          0     147.51       218 228325         profile_index
    profileDirection                     -1   0.014065         1 228325     profile_direction

* Processing Log

    - 2024-09-05 12:16:35.438 UTC: `create 'glider' object`
```

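
The "quick check" on angle units could be as simple as looking at the range of the values. The sketch below uses a hypothetical helper, `guessAngleUnit()`, which is not part of oceglider or oce. It assumes that angles confined to [-2π, 2π] are radians and that anything larger is degrees, a heuristic that can misfire for a glider that never turns far from north.

```R
# Hypothetical helper, not part of oceglider or oce.
# Assumption: angles within [-2*pi, 2*pi] are radians; larger ones are degrees.
guessAngleUnit <- function(x) {
    m <- max(abs(x), na.rm = TRUE)
    if (m <= 2 * pi) "rad" else "deg"
}

# Min., mean and max. of heading, copied from the summary above
guessAngleUnit(c(0.0081383, 3.5527, 6.2779))
# Illustrative values (not from the file) for a heading stored in degrees
guessAngleUnit(c(2, 180, 359))
```

With a glider object, the same call could be made on `g[["heading"]]`. For the heading range printed in the summary, this heuristic returns `"rad"`, so it seems worth running the check on the data before relabelling anything.
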
dankelley commented 2 months ago

Those waypoints are kind of strange, with zero values. But, again, that's not an issue for this package.
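
Anyone who wants waypoint statistics without the zeros could mask them on the user side. This is only a sketch: it assumes that an exact zero is a placeholder and not a real position, which the file itself does not confirm, and `zeroToNA()` is a hypothetical helper, not an oceglider function. The numbers are illustrative, not taken from the file.

```R
# Illustrative values only (not read from the file).
waypointLatitude <- c(0, 48.2, 48.5, 0, 48.852)

# Hypothetical helper: replace exact zeros with NA.
# Assumption: zero marks "no waypoint set", not a true latitude or longitude.
zeroToNA <- function(x) {
    x[!is.na(x) & x == 0] <- NA
    x
}

cleaned <- zeroToNA(waypointLatitude)
range(cleaned, na.rm = TRUE) # range of the remaining, nonzero waypoints
```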

dankelley commented 2 months ago

The plots are made with the code that follows them. I'm not sure why time is extending so far past the last data in some of the plots, and I'll look into that because it's likely general. However, things are looking good for using read.glider.netcdf(), so there may be no need to keep the old slocum-specific function for this filetype.

I'm putting all this in "Details" because I don't like scrolling in webpages. But, I hope @richardsc and @clayton33 will click to see the details. My plan is to rewrite the slocum vignette today. I won't say much about the signals -- somebody else might be publishing about those things -- and will likely just show the code and plots like below.

```R
library(oceglider)
url <- paste0(
    "https://cproof.uvic.ca/gliderdata/deployments/",
    "dfo-bumblebee998/dfo-bumblebee998-20221207/",
    "L0-timeseries/",
    "dfo-bumblebee998-20221207_delayed.nc"
)
file <- gsub(".*/", "", url)
# 45.8 MB file: download takes 0.4s user, 0.9s system but 11s elapsed
if (!file.exists(file)) {
    system.time(curl::curl_download(url = url, destfile = file, mode = "wb"))
}
g <- read.glider.netcdf(file) # under 0.2s
summary(g)
png("oceglider-slocum-%02d.png")
plot(g, which = "map")
plot(g, which = "p")
plot(g, which = "p", type = "p", colorby = "temperature")
plot(g,
    which = "p", type = "p", colorby = "temperature",
    colorbylim = quantile(g[["temperature"]], c(0.01, 0.99), na.rm = TRUE)
)
plot(g, which = "p", type = "p", colorby = "salinity")
plot(g,
    which = "p", type = "p", colorby = "salinity",
    colorbylim = quantile(g[["salinity"]], c(0.01, 0.99), na.rm = TRUE)
)
dev.off()
```

**01.png**

![oceglider-slocum-01](https://github.com/user-attachments/assets/2fcc134f-2033-423a-a1ec-ffc4a995fcb3)

**02.png**

![oceglider-slocum-02](https://github.com/user-attachments/assets/3794f5a7-1115-4c6c-afd5-cf92f308a92b)

**03.png**

![oceglider-slocum-03](https://github.com/user-attachments/assets/00db2798-727e-47d8-b26d-4d73aeca029c)

**04.png**

![oceglider-slocum-04](https://github.com/user-attachments/assets/ffa0896d-1b6f-48cd-b481-3d9f5bb6fb65)

**05.png**

![oceglider-slocum-05](https://github.com/user-attachments/assets/8946bf90-4fc9-42a2-b8d8-26b91102bafc)

**06.png**

![oceglider-slocum-06](https://github.com/user-attachments/assets/362eeb53-9aea-4752-b09b-6f31ef35b9af)

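
The `colorbylim` calls above trim the color scale to the 1st and 99th percentiles. Why that helps can be seen with synthetic numbers. The values below are made up, chosen only to mimic a single near-zero salinity among otherwise typical readings, as the summary minimum of 4.3609e-06 suggests.

```R
# Synthetic salinities (not from the file): 1000 typical values plus one
# near-zero outlier.
salinity <- c(seq(32, 34.4, length.out = 1000), 4.4e-06)

# The full range is dominated by the single outlier ...
fullRange <- range(salinity, na.rm = TRUE)

# ... whereas the 1% and 99% quantiles bracket the bulk of the data.
trimmed <- quantile(salinity, c(0.01, 0.99), na.rm = TRUE)

print(fullRange)
print(trimmed)
```

Passing `trimmed` as the color limits spends the palette on the interval where nearly all the data lie, at the cost of saturating the few values outside it.
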
dankelley commented 2 months ago

Done and pushed ("develop" branch).