CARSv2 / cars-v2

CARSv2 project repository - public
MIT License

Can the AODN help us fill in key data gaps in the Australian EEZ / region? #1

Open · Thomas-Moore-Creative opened this issue 1 year ago

Thomas-Moore-Creative commented 1 year ago

Work done by @ChrisC28 & @BecCowley has identified gaps in even the most common baseline ocean observations (temperature & salinity) over all time for large areas of the Australian EEZ / region in state-of-the-art global databases like the WOD.

For example, this preliminary plot from @ChrisC28 shows the lack of any observations in the WOD, over all time, for much of the Australian shelf & coast. Even regions with coverage may still suffer from seasonal aliasing.

[Screenshot from @ChrisC28: preliminary plot of WOD observation coverage around Australia, 2023-02-15]

But are we missing local Australian data holdings not in WOD?

Important considerations for using the AODN could include:

  1. How can we search across all observations and all platforms and over all time in these specific regions?
  2. Does AODN data have any flags or metadata on what is / is not in the WOD or other global databases?
  3. If we can identify observations not seen in the WOD, can we make that selection across all platforms and grab a single data object with rich metadata on its provenance?
  4. Double-checking for any duplication against WOD data already available.
mhidas commented 1 year ago

Thanks @Thomas-Moore-Creative - good questions, and a great use case for us to know about!

mhidas commented 1 year ago

As I mentioned in our meeting today, in the short term you might find it helpful to query our catalogue of all moorings data files (from both the National Moorings Network and the Deep Water Moorings facilities of IMOS). Here's some info I copied from a Wiki. Unfortunately it's on a private Wiki, and it's a bit outdated, but mostly still relevant. I'll try to update the essential info...

Introduction

Moorings facility operators often want to know details of the data files they have provided. A common question is "Which files have I already uploaded?". There is an easy way to find answers to these questions directly, by accessing AODN web services (more specifically a Web Feature Service).

What is a web service?

A web service is a system that accepts requests and returns results over the Web (via HTTP). A request can be typed directly into the address bar of a browser, or given as an argument to a command-line tool (like curl). Often requests are generated by other software that interacts with the service.

The AODN Portal

As an example, web services make it possible to find, preview and download data using the AODN Portal. Behind the scenes, the portal combines three services that it talks to via the web:

These services can also be accessed directly, at

Web Feature Service (WFS)

In this context a "feature" is a spatial entity (e.g. a point or line) with a set of attributes attached to it (the data). Think of it as a row in a table, where each column is one of the attributes, and one of the columns is the "geometry" specifying the spatial extent of the feature (in the horizontal plane only).

Information about published moorings files

We have set up a WFS called imos:moorings_all_map which allows you to obtain metadata about all currently public data files from the IMOS moorings facilities (National Mooring Network, and Deep Water Moorings). Each feature/row refers to a single file and provides the following details:

These are boolean properties to allow easier filtering on the presence of certain types of parameter in the file:

An additional column (FID, always first) is added by the server and can simply be ignored. Refer to the IMOS NetCDF Conventions for the meaning of the global attributes harvested.

How to query the moorings_all_map WFS?

You can download the entire table in comma-separated-values format (CSV - can be opened in e.g. Excel) by pasting the following request into your browser's address bar. (I have broken it up so it's a bit easier to see what the request is made up of, but you have to put it all on one line, with no spaces):

    http://geoserver-123.aodn.org.au/geoserver/imos/ows?
        service=WFS&
        version=1.0.0&
        request=GetFeature&
        typeName=imos:moorings_all_map&
        outputFormat=csv

To save you copy/pasting, here is a direct link to the same request. However, this will tell you everything about all ~40,000 files (download size about 22 MB), which is probably a lot more than you're interested in. Instead, you can apply filters to the table.
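Whether or not you filter, you don't have to use a browser at all: for example, pandas can read the CSV response straight from the request URL. A minimal sketch (assuming pandas is installed; the URL is just the request above on one line):

    import pandas as pd

    # the GetFeature request above, collapsed onto one line
    url = ('http://geoserver-123.aodn.org.au/geoserver/imos/ows'
           '?service=WFS&version=1.0.0&request=GetFeature'
           '&typeName=imos:moorings_all_map&outputFormat=csv')
    all_files = pd.read_csv(url)  # one row per published file (~40,000 rows)
    print(all_files.columns.tolist())
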

For example, to get the list of files for the Palm Passage mooring (GBRPPS) uploaded since the start of 2018, add a cql_filter like this (only the last line is new):

    http://geoserver-123.aodn.org.au/geoserver/imos/ows?
        service=WFS&
        version=1.0.0&
        request=GetFeature&
        typeName=imos:moorings_all_map&
        outputFormat=csv&
        cql_filter=date_published AFTER 2018-01-01T00:00:00 AND site_code='GBRPPS'

Again, combine the request into one line, with no spaces between the arguments. Since the filter itself needs to contain spaces, they can be replaced with the code '%20'. Or just click (or copy & edit) this link. Now you'll only get the lines you're interested in.
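Alternatively, an HTTP library can take care of the encoding for you. A minimal sketch with requests and pandas (both assumed installed), building the same filtered request and loading the result into a DataFrame:

    import io
    import requests
    import pandas as pd

    params = {
        'service': 'WFS',
        'version': '1.0.0',
        'request': 'GetFeature',
        'typeName': 'imos:moorings_all_map',
        'outputFormat': 'csv',
        # spaces and quotes in the filter are URL-encoded automatically
        'cql_filter': "date_published AFTER 2018-01-01T00:00:00 AND site_code='GBRPPS'",
    }
    r = requests.get('http://geoserver-123.aodn.org.au/geoserver/imos/ows', params=params)
    r.raise_for_status()
    gbrpps_files = pd.read_csv(io.StringIO(r.text))
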

You can also select which columns (properties) you're interested in by adding a propertyName argument, and the downloaded file will only include those columns. E.g. if you only want the file path, deployment code and instrument details for all delayed-mode files:

    http://geoserver-123.aodn.org.au/geoserver/imos/ows?
        service=WFS&
        version=1.0.0&
        request=GetFeature&
        typeName=imos:moorings_all_map&
        outputFormat=csv&
        cql_filter=realtime = FALSE&
        propertyName=url,deployment_code,instrument,instrument_serial_number,instrument_nominal_depth

Direct link

How to make it easier

Of course, after a while it would become quite tedious typing these long requests into your browser, so it's better to get a program to do it. Here are a couple of examples of WFS access from a Python script:
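For instance, a small helper along these lines, wrapping the cql_filter and propertyName arguments shown above (a sketch using requests and pandas; the function name is just illustrative):

    import io
    import requests
    import pandas as pd

    WFS_URL = 'http://geoserver-123.aodn.org.au/geoserver/imos/ows'

    def query_moorings_files(cql_filter=None, properties=None):
        """Return the imos:moorings_all_map table as a DataFrame,
        optionally filtered (CQL) and restricted to selected columns."""
        params = {
            'service': 'WFS',
            'version': '1.0.0',
            'request': 'GetFeature',
            'typeName': 'imos:moorings_all_map',
            'outputFormat': 'csv',
        }
        if cql_filter:
            params['cql_filter'] = cql_filter
        if properties:
            params['propertyName'] = ','.join(properties)
        r = requests.get(WFS_URL, params=params)
        r.raise_for_status()
        return pd.read_csv(io.StringIO(r.text))

    # e.g. delayed-mode files, keeping just a few columns
    dm_files = query_moorings_files(
        cql_filter='realtime = FALSE',
        properties=['url', 'deployment_code', 'instrument'])
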

More info

See the GeoServer WFS documentation for more details and advanced features.

mhidas commented 1 year ago

Here's a much more recent example where I'm using the same WFS (actually a subset of it) to query some metadata related to mooring configurations: https://github.com/aodn/aodn-public-notebooks/blob/d9ee9785221a7d75dbf58371a02db6aaa6ff2687/moorings/common.py#L16

croachutas commented 1 year ago

And similar issues no doubt apply elsewhere in the world (I know for a fact that NZ has a lot of data that hasn't made it into their public repositories; colleagues at MetOcean Solutions chased a lot of it up, but I'm unsure if they're able to make it available to us... MetOcean also has T profiles from the Mangōpare sensors developed & deployed during the Moana project, which they can't share publicly due to various arrangements with the fisheries industry).

In that light it might be worthwhile thinking about CARS v2.0 not just as a product in the form of a 'static' atlas but as a code base to allow other groups to create regional versions of the atlas with data that might not be generally available.

Thomas-Moore-Creative commented 1 year ago

@mhidas - ignorant question. Given https://data.aodn.org.au/imos-data is an S3 bucket, should we be able to access it via s3fs? For example, with NOAA S3 buckets I'm used to being able to do something like the following:

import s3fs
# Initialize s3 client with s3fs
fs = s3fs.S3FileSystem(anon=True)
# list contents of bucket
fs.ls('https://data.aodn.org.au/imos-data')

I'm no expert in the s3fs package or using S3 but this fails for me?

mhidas commented 1 year ago

@Thomas-Moore-Creative You're correct, it's a public S3 bucket, and its name is just imos-data (https://data.aodn.org.au/imos-data is a web front-end to it). s3fs only needs the bucket name (and optional prefix). So try this:

import s3fs
fs = s3fs.S3FileSystem(anon=True)  # same anonymous setup as above
fs.ls('imos-data')
fs.ls('imos-data/IMOS/ANMN')
# etc...

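Once you can list the files, the same filesystem object lets you open them, e.g. lazily with xarray. A minimal sketch (the prefix below is just an example, and it assumes the h5netcdf engine is installed and the file is NetCDF-4):

    import s3fs
    import xarray as xr

    fs = s3fs.S3FileSystem(anon=True)
    # find NetCDF files under an example prefix (any path seen via fs.ls() will do)
    nc_files = fs.glob('imos-data/IMOS/ANMN/NRS/NRSMAI/**/*.nc')
    # open one file lazily over S3 (requires the h5netcdf backend)
    ds = xr.open_dataset(fs.open(nc_files[0]), engine='h5netcdf')
    print(ds)
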
Thomas-Moore-Creative commented 1 year ago

@Thomas-Moore-Creative ... So try this:

Thanks for helping me with those basics, @mhidas!

mhidas commented 1 year ago

:+1: No worries.

Something worth noting is that there are often multiple data products in there based on the same original observations. I'm not so familiar with the other IMOS facilities, but for the moorings there are generally at least 4-5 levels of product:

You can learn more about the last three products here.

Thomas-Moore-Creative commented 5 months ago

2024 AODN hackathon

Where we'd like to be - are we already there?

  1. a data pipeline for ocean observations that works with Python tools on any laptop, supercomputer, or cloud instance.
  2. a searchable catalog of all data that can be filtered by variable, platform/instrument type, time, space, WOD inclusion, and other metadata.
  3. returning a lazy object with rich metadata - provenance, QC, in/out of WOD
  4. returning a lazy object in xarray? a pandas or Dask dataframe? Parquet?

my_data = get_aodn(variable='temperature', time=('2005-01', '2018-12'), latitude=(-40, -20), longitude=(140, 180), WOD=False)
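As a rough illustration only (not an existing API), such a call could be stitched together from the WFS catalogue and the public imos-data bucket discussed above. Since the catalogue columns for time/space filtering aren't listed here, this sketch just takes a raw CQL filter; the 'url' column assumption is flagged in a comment:

    import io
    import requests
    import pandas as pd
    import s3fs
    import xarray as xr

    WFS_URL = 'http://geoserver-123.aodn.org.au/geoserver/imos/ows'

    def get_aodn(cql_filter, max_files=3):
        """Illustrative sketch: query the moorings_all_map catalogue for file
        paths matching a CQL filter, then open them lazily from S3."""
        params = {
            'service': 'WFS', 'version': '1.0.0', 'request': 'GetFeature',
            'typeName': 'imos:moorings_all_map', 'outputFormat': 'csv',
            'cql_filter': cql_filter, 'propertyName': 'url',
        }
        r = requests.get(WFS_URL, params=params)
        r.raise_for_status()
        urls = pd.read_csv(io.StringIO(r.text))['url'].head(max_files)
        fs = s3fs.S3FileSystem(anon=True)
        # assumption: the 'url' column holds object keys relative to the imos-data bucket
        return [xr.open_dataset(fs.open('imos-data/' + u), engine='h5netcdf')
                for u in urls]

    # e.g. a few delayed-mode files from the Palm Passage mooring
    datasets = get_aodn("site_code='GBRPPS' AND realtime = FALSE")
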

Goals: