antarctica / asli-pipeline

This repository contains a pipeline for operational execution of the Amundsen Sea Low Index calculations provided in the asli package. The functions in the asli package are described in detail in the package repository, amundsen-sea-low-index.

OPeNDAP/NetCDF access #3


thomaszwagerman commented 5 months ago

This initially came up when experimenting with programmatic access to PDC data. Data access without download is possible for most file formats (csv, tif, gpkg, etc.) but not for .nc.

# Packages loaded while experimenting (not all are used in the snippets below)
library(httr2)
library(ncdf4)
library(sf)
library(terra)
library(stars)

# Reading csv directly----
csv_url <- "https://ramadda.data.bas.ac.uk/repository/entry/get/Goudier_beach_debris.csv?entryid=synth%3A32a8c71b-e70f-4fdf-9767-57b237b50660%3AL0dvdWRpZXJfYmVhY2hfZGVicmlzLmNzdg%3D%3D"

# Can be read with a plain read.csv() call
goudier_df <- read.csv(
  csv_url
)

# Reading a netCDF file directly does not work
nc_url <- "https://ramadda.data.bas.ac.uk/repository/entry/get/BAS_SONA_ShipResearch_RRSJamesClarkRoss_M_ScotiaSea_2006-10-31T04Z_2006-10-31T18Z.nc?entryid=synth%3Ac831d5e4-8d03-4aea-a6c6-6db101f36d8d%3AL0pSMTYxXzAwMi9CQVNfU09OQV9TaGlwUmVzZWFyY2hfUlJTSmFtZXNDbGFya1Jvc3NfTV9TY290aWFTZWFfMjAwNi0xMC0zMVQwNFpfMjAwNi0xMC0zMVQxOFoubmM%3D"

# Unlike the other formats, opening this URL directly with ncdf4 fails
sst_ncdf4 <- ncdf4::nc_open(
  nc_url
)
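
For reference, the current workaround is to download the whole file first and open it locally, which is exactly the transfer OPeNDAP would avoid. A minimal sketch (sst_local is just an illustrative name):

# Workaround sketch: fetch the full file to a temporary location, then open it
tmp_nc <- tempfile(fileext = ".nc")
download.file(nc_url, tmp_nc, mode = "wb")
sst_local <- ncdf4::nc_open(tmp_nc)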

A bit of investigation showed this was also the case with most other data centres (EIDC, BODC, NGDC), bar CEDA, which has OPeNDAP in place and documents its use (OPeNDAP scripted interaction).

# The CEDA source is happy, since it is served via a THREDDS/OPeNDAP endpoint
# (note the /dodsC/ path)
ceda_cicero <- "https://dap.ceda.ac.uk/thredds/dodsC/badc/cru/data/cru_ts/cru_ts_3.24.01/data/tmp/cru_ts3.24.01.1901.2015.tmp.dat.nc"

sst_ceda <- nc_open(
  ceda_cicero
)
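
Once the connection is open, the file's metadata can be inspected without transferring the data itself. A small sketch of what that looks like, assuming the file exposes a dimension called time:

# Variables and dimensions are available without downloading the data
names(sst_ceda$var)     # variable names in the remote file
sst_ceda$dim$time$len   # number of time steps, assuming a "time" dimension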

So this is a NERC-wide issue.

After a conversation with PB, he said that RAMADDA does have an OPeNDAP plugin, so adding OPeNDAP support should be possible. It has been discussed in the past and is on the list of things to do, but it has not happened yet. He suggested that a concrete use case could bump it up the list of priorities. If there is any way I can help with its implementation under BOOST-EDS, I think that would be great.

I think the use case would be a cloud workflow where PDC data stored in .nc files has to be read in, but where we don't want to burden JASMIN storage capacity if we don't need to. Storing large files on JASMIN is perfectly possible, but it is not computationally efficient. If a netCDF file could be queried directly, a 10-15 minute data ingestion/manipulation step could become a 2 minute one. There is also a net zero angle: moving less data makes the computing greener.
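
To make the use case concrete, here is a minimal sketch of the kind of query OPeNDAP enables, using the CEDA endpoint above. Only the requested hyperslab is transferred, not the full file; the variable name "tmp" and the (lon, lat, time) dimension order are assumptions based on other CRU TS releases:

# Query a small subset directly from the remote file
nc <- nc_open(ceda_cicero)
tmp_subset <- ncvar_get(
  nc,
  varid = "tmp",          # assumed variable name
  start = c(1, 1, 1),     # first lon, lat, time index
  count = c(10, 10, 12)   # 10 x 10 grid cells, first 12 months
)
nc_close(nc)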

General Resources

- OPeNDAP Quick Start Guide
- How to set up an OPeNDAP server
- The issue with netCDF data types
- It's been done by others

RAMADDA and OPeNDAP

I can't seem to find an OPeNDAP plugin as suggested; however, I CAN find a THREDDS one (Thredds plugin download) and a RAMADDA + THREDDS installation tutorial.