CARSv2 / cars-v2

CARSv2 project repository - public
MIT License

Can the AODN help us fill in key data gaps in the Australian EEZ / region? #1

Open · Thomas-Moore-Creative opened this issue 1 year ago

Thomas-Moore-Creative commented 1 year ago

Work done by @ChrisC28 & @BecCowley has identified gaps in even the most common baseline ocean observations (temperature & salinity) over all time for large areas of the Australian EEZ / region in state-of-the-art global databases like the WOD.

For example, this preliminary plot from @ChrisC28 shows the lack of any observations in the WOD, over all time, for much of the Australian shelf & coast. Even regions with coverage may still suffer from seasonal aliasing.

[Screenshot from @ChrisC28: preliminary plot of WOD observation coverage around Australia, 2023-02-15]

But are we missing local Australian data holdings not in WOD?

Important considerations for using the AODN could include:

  1. How can we search across all observations and all platforms and over all time in these specific regions?
  2. Does AODN data have any flags or metadata on what is / is not in the WOD or other global databases?
  3. If we can identify observations not seen in the WOD, can we make that selection across all platforms and grab a single data object with rich metadata on its provenance?
  4. Double-checking for any duplication against WOD data already available.
mhidas commented 1 year ago

Thanks @Thomas-Moore-Creative - good questions, and a great use case for us to know about!

mhidas commented 1 year ago

As I mentioned in our meeting today, in the short term you might find it helpful to query our catalogue of all moorings data files (from both the National Moorings Network and the Deep Water Moorings facilities of IMOS). Here's some info I copied from a Wiki. Unfortunately it's on a private Wiki, and it's a bit outdated, but mostly still relevant. I'll try to update the essential info...

Introduction

Moorings facility operators often want to know details of the data files they have provided. A common question is "Which files have I already uploaded?". There is an easy way to find answers to these questions directly, by accessing AODN web services (more specifically a Web Feature Service).

What is a web service?

A web service is a system that accepts requests and returns results over the Web (via HTTP). A request can be typed directly into the address bar of a browser, or given as an argument to a command-line tool (like curl). Often requests are generated by other software that interacts with the service.

The AODN Portal

As an example, web services make it possible to find, preview and download data using the AODN Portal. Behind the scenes, the portal combines three services that it talks to via the web:

These services can also be accessed directly, at

Web Feature Service (WFS)

In this context a "feature" is a spatial entity (e.g. a point or line) with a set of attributes attached to it (the data). Think of it as a row in a table, where each column is one of the attributes, and one of the columns is the "geometry" specifying the spatial extent of the feature (in the horizontal plane only).

Information about published moorings files

We have set up a WFS called imos:moorings_all_map which allows you to obtain metadata about all currently public data files from the IMOS moorings facilities (National Mooring Network, and Deep Water Moorings). Each feature/row refers to a single file and provides the following details:

These are boolean properties to allow easier filtering on the presence of certain types of parameter in the file:

An additional column (FID, always first) is added by the server and can simply be ignored. Refer to the IMOS NetCDF Conventions for the meaning of the global attributes harvested.

How to query the moorings_all_map WFS?

You can download the entire table in comma-separated-values format (CSV - can be opened in e.g. Excel) by pasting the following request into your browser's address bar. (I have broken it up so it's a bit easier to see what the request is made up of, but you have to put it all on one line, with no spaces):

    http://geoserver-123.aodn.org.au/geoserver/imos/ows?
        service=WFS&
        version=1.0.0&
        request=GetFeature&
        typeName=imos:moorings_all_map&
        outputFormat=csv

To save you copy/pasting, here is a direct link to the same request. However, this will tell you everything about all ~40,000 files (download size about 22 MB), which is probably a lot more than you're interested in. Instead, you can apply filters to the table.
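Whether or not you filter, you don't have to use a browser at all: for example, pandas can read the CSV response straight from the request URL. A minimal sketch (assuming pandas is installed; the URL is just the request above on one line):

    import pandas as pd

    # the GetFeature request above, collapsed onto one line
    url = ('http://geoserver-123.aodn.org.au/geoserver/imos/ows'
           '?service=WFS&version=1.0.0&request=GetFeature'
           '&typeName=imos:moorings_all_map&outputFormat=csv')
    all_files = pd.read_csv(url)  # one row per published file (~40,000 rows)
    print(all_files.columns.tolist())
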

For example, to get the list of files for the Palm Passage mooring (GBRPPS) uploaded since the start of 2018, add a cql_filter like this (only the last line is new):

    http://geoserver-123.aodn.org.au/geoserver/imos/ows?
        service=WFS&
        version=1.0.0&
        request=GetFeature&
        typeName=imos:moorings_all_map&
        outputFormat=csv&
        cql_filter=date_published AFTER 2018-01-01T00:00:00 AND site_code='GBRPPS'

Again, combine the request into one line, with no spaces between the arguments. Since the filter itself needs to contain spaces, they can be replaced with the code '%20'. Or just click (or copy & edit) this link. Now you'll only get the lines you're interested in.
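Alternatively, an HTTP library can take care of the encoding for you. A minimal sketch with requests and pandas (both assumed installed), building the same filtered request and loading the result into a DataFrame:

    import io
    import requests
    import pandas as pd

    params = {
        'service': 'WFS',
        'version': '1.0.0',
        'request': 'GetFeature',
        'typeName': 'imos:moorings_all_map',
        'outputFormat': 'csv',
        # spaces and quotes in the filter are URL-encoded automatically
        'cql_filter': "date_published AFTER 2018-01-01T00:00:00 AND site_code='GBRPPS'",
    }
    r = requests.get('http://geoserver-123.aodn.org.au/geoserver/imos/ows', params=params)
    r.raise_for_status()
    gbrpps_files = pd.read_csv(io.StringIO(r.text))
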

You can also select which columns (properties) you're interested in by adding a propertyName argument, and the downloaded file will only include those columns. E.g. if you only want the file path, deployment code and instrument details for all delayed-mode files:

    http://geoserver-123.aodn.org.au/geoserver/imos/ows?
        service=WFS&
        version=1.0.0&
        request=GetFeature&
        typeName=imos:moorings_all_map&
        outputFormat=csv&
        cql_filter=realtime = FALSE&
        propertyName=url,deployment_code,instrument,instrument_serial_number,instrument_nominal_depth

Direct link

How to make it easier

Of course, after a while it would become quite tedious typing these long requests into your browser, so it's better to get a program to do it. Here are a couple of examples of WFS access from a Python script:
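For instance, a small helper along these lines, wrapping the cql_filter and propertyName arguments shown above (a sketch using requests and pandas; the function name is just illustrative):

    import io
    import requests
    import pandas as pd

    WFS_URL = 'http://geoserver-123.aodn.org.au/geoserver/imos/ows'

    def query_moorings_files(cql_filter=None, properties=None):
        """Return the imos:moorings_all_map table as a DataFrame,
        optionally filtered (CQL) and restricted to selected columns."""
        params = {
            'service': 'WFS',
            'version': '1.0.0',
            'request': 'GetFeature',
            'typeName': 'imos:moorings_all_map',
            'outputFormat': 'csv',
        }
        if cql_filter:
            params['cql_filter'] = cql_filter
        if properties:
            params['propertyName'] = ','.join(properties)
        r = requests.get(WFS_URL, params=params)
        r.raise_for_status()
        return pd.read_csv(io.StringIO(r.text))

    # e.g. delayed-mode files, keeping just a few columns
    dm_files = query_moorings_files(
        cql_filter='realtime = FALSE',
        properties=['url', 'deployment_code', 'instrument'])
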

More info

See the GeoServer WFS documentation for more details and advanced features.

mhidas commented 1 year ago

Here's a much more recent example where I'm using the same WFS (actually a subset of it) to query some metadata related to mooring configurations: https://github.com/aodn/aodn-public-notebooks/blob/d9ee9785221a7d75dbf58371a02db6aaa6ff2687/moorings/common.py#L16

croachutas commented 1 year ago

And similar issues no doubt apply elsewhere in the world (I know for a fact that NZ has a lot of data that hasn't made it into their public repositories; colleagues at MetOcean Solutions chased a lot of it up, but I'm unsure if they're able to make it available to us... MetOcean also has T profiles from the Mangōpare sensors developed & deployed during the Moana project, which they can't share publicly due to various arrangements with the fisheries industry).

In that light it might be worthwhile thinking about CARS v2.0 not just as a product in the form of a 'static' atlas but as a code base to allow other groups to create regional versions of the atlas with data that might not be generally available.

Thomas-Moore-Creative commented 1 year ago

@mhidas - ignorant question. Given https://data.aodn.org.au/imos-data is an S3 bucket, should we be able to access it via s3fs? For example, with NOAA S3 buckets I'm used to being able to do something like the following:

import s3fs
# Initialize s3 client with s3fs
fs = s3fs.S3FileSystem(anon=True)
# list contents of bucket
fs.ls('https://data.aodn.org.au/imos-data')

I'm no expert in the s3fs package or using S3 but this fails for me?

mhidas commented 1 year ago

@Thomas-Moore-Creative You're correct, it's a public S3 bucket, and its name is just imos-data (https://data.aodn.org.au/imos-data is a web front-end to it). s3fs only needs the bucket name (and optional prefix). So try this:

import s3fs
fs = s3fs.S3FileSystem(anon=True)  # same anonymous setup as above
fs.ls('imos-data')
fs.ls('imos-data/IMOS/ANMN')
# etc...

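Once you can list the files, the same filesystem object lets you open them, e.g. lazily with xarray. A minimal sketch (the prefix below is just an example, and it assumes the h5netcdf engine is installed and the file is NetCDF-4):

    import s3fs
    import xarray as xr

    fs = s3fs.S3FileSystem(anon=True)
    # find NetCDF files under an example prefix (any path seen via fs.ls() will do)
    nc_files = fs.glob('imos-data/IMOS/ANMN/NRS/NRSMAI/**/*.nc')
    # open one file lazily over S3 (requires the h5netcdf backend)
    ds = xr.open_dataset(fs.open(nc_files[0]), engine='h5netcdf')
    print(ds)
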
Thomas-Moore-Creative commented 1 year ago

@Thomas-Moore-Creative ... So try this:

Thanks for helping me with those basics, @mhidas!

mhidas commented 1 year ago

:+1: No worries.

Something worth noting is that there are often multiple data products in there based on the same original observations. I'm not so familiar with the other IMOS facilities, but for the moorings there are generally at least 4-5 levels of product:

You can learn more about the last three products here.

Thomas-Moore-Creative commented 5 months ago

2024 AODN hackathon

Where we'd like to be - are we already there?

  1. a data pipeline for ocean observations that works with Python tools on any laptop, supercomputer, or cloud instance.
  2. a searchable catalog of all data that can be filtered by variable, platform/instrument type, time, space, WOD inclusion, and other metadata.
  3. returning a lazy object with rich metadata - provenance, QC, in/out of WOD
  4. returning a lazy object in xarray? a pandas or Dask dataframe? Parquet?

my_data = get_aodn(variable='temperature', time=('2005-01', '2018-12'), latitude=(-40, -20), longitude=(140, 180), WOD=False)
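As a rough illustration only (not an existing API), such a call could be stitched together from the WFS catalogue and the public imos-data bucket discussed above. Since the catalogue columns for time/space filtering aren't listed here, this sketch just takes a raw CQL filter; the 'url' column assumption is flagged in a comment:

    import io
    import requests
    import pandas as pd
    import s3fs
    import xarray as xr

    WFS_URL = 'http://geoserver-123.aodn.org.au/geoserver/imos/ows'

    def get_aodn(cql_filter, max_files=3):
        """Illustrative sketch: query the moorings_all_map catalogue for file
        paths matching a CQL filter, then open them lazily from S3."""
        params = {
            'service': 'WFS', 'version': '1.0.0', 'request': 'GetFeature',
            'typeName': 'imos:moorings_all_map', 'outputFormat': 'csv',
            'cql_filter': cql_filter, 'propertyName': 'url',
        }
        r = requests.get(WFS_URL, params=params)
        r.raise_for_status()
        urls = pd.read_csv(io.StringIO(r.text))['url'].head(max_files)
        fs = s3fs.S3FileSystem(anon=True)
        # assumption: the 'url' column holds object keys relative to the imos-data bucket
        return [xr.open_dataset(fs.open('imos-data/' + u), engine='h5netcdf')
                for u in urls]

    # e.g. a few delayed-mode files from the Palm Passage mooring
    datasets = get_aodn("site_code='GBRPPS' AND realtime = FALSE")
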

Goals: