Thomas-Moore-Creative opened this issue 1 year ago
Thanks @Thomas-Moore-Creative - good questions, and a great use case for us to know about!
As I mentioned in our meeting today, in the short term you might find it helpful to query our catalogue of all moorings data files (from both the National Mooring Network and the Deep Water Moorings facilities of IMOS). Here's some info I copied from a wiki. Unfortunately it's a private wiki, and it's a bit outdated, but mostly still relevant. I'll try to update the essential info...
Moorings facility operators often want to know details of the data files they have provided. A common question is "Which files have I already uploaded?". There is an easy way to find answers to these questions directly, by accessing AODN web services (more specifically a Web Feature Service).
A web service is a system that accepts requests and returns results over the Web (via HTTP). A request can be typed directly into the address bar of a browser, or given as an argument to a command-line tool (like `curl`). Often, requests are generated by other software that interacts with the service.
As an example, web services make it possible to find, preview and download data using the AODN Portal. Behind the scenes, the portal combines three services that it talks to via the web, and these services can also be accessed directly.
In this context a "feature" is a spatial entity (e.g. a point or line) with a set of attributes attached to it (the data). Think of it as a row in a table, where each column is one of the attributes, and one of the columns is the "geometry" specifying the spatial extent of the feature (in the horizontal plane only).
We have set up a WFS layer called `imos:moorings_all_map`, which allows you to obtain metadata about all currently public data files from the IMOS moorings facilities (National Mooring Network and Deep Water Moorings). Each feature/row refers to a single file and provides the following details:
- the URL of the file, which you can use to download it directly (e.g. using `wget` or `curl`)
- the latitude of the site (from the file's `LATITUDE` variable)
- the longitude of the site (from the file's `LONGITUDE` variable)
- the `standard_name` variable attribute names in the file (where applicable)
- the `long_name` variable attribute names in the file
- a set of boolean properties to allow easier filtering on the presence of certain types of parameter in the file
An additional column (`FID`, always first) is added by the server and can simply be ignored. Refer to the IMOS NetCDF Conventions for the meaning of the global attributes harvested.
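By the way, if you want to see the full list of columns the layer provides, the standard WFS `DescribeFeatureType` operation (a generic WFS request, not something specific to this layer) should return the layer's schema as XML:

```
http://geoserver-123.aodn.org.au/geoserver/imos/ows?service=WFS&version=1.0.0&request=DescribeFeatureType&typeName=imos:moorings_all_map
```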
You can download the entire table in comma-separated-values format (CSV, which can be opened in e.g. Excel) by pasting the following request into your browser's address bar. (I have broken it up so it's a bit easier to see what the request is made up of, but you have to put it all on one line, with no spaces):
```
http://geoserver-123.aodn.org.au/geoserver/imos/ows?
service=WFS&
version=1.0.0&
request=GetFeature&
typeName=imos:moorings_all_map&
outputFormat=csv
```
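For reference, here is the same request assembled into a single line, exactly as you'd paste it into the address bar:

```
http://geoserver-123.aodn.org.au/geoserver/imos/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=imos:moorings_all_map&outputFormat=csv
```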
To save you copy/pasting, here is a direct link to the same request. However, this will tell you everything about all ~40,000 files (download size about 22 MB), which is probably a lot more than you're interested in. Instead, you can apply filters to the table.
For example, to get the list of files for the Palm Passage mooring (GBRPPS) uploaded since the start of the year, add a `cql_filter` like this (only the last line is new):
```
http://geoserver-123.aodn.org.au/geoserver/imos/ows?
service=WFS&
version=1.0.0&
request=GetFeature&
typeName=imos:moorings_all_map&
outputFormat=csv&
cql_filter=date_published AFTER 2018-01-01T00:00:00 AND site_code='GBRPPS'
```
Again, combine the request into one line, with no spaces between the arguments. Since the filter itself needs to contain spaces, these can be URL-encoded as '%20'. Or just click (or copy & edit) this link. Now you'll only get the rows you're interested in.
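For example, the filter line above with its spaces encoded would look like this:

```
cql_filter=date_published%20AFTER%202018-01-01T00:00:00%20AND%20site_code='GBRPPS'
```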
You can also select which columns (properties) you're interested in by adding a `propertyName` argument, and the downloaded file will only include those columns. E.g. if you only want the file path, deployment code and instrument details for all delayed-mode files:
```
http://geoserver-123.aodn.org.au/geoserver/imos/ows?
service=WFS&
version=1.0.0&
request=GetFeature&
typeName=imos:moorings_all_map&
outputFormat=csv&
cql_filter=realtime = FALSE&
propertyName=url,deployment_code,instrument,instrument_serial_number,instrument_nominal_depth
```
Of course, after a while it would become quite tedious typing these long requests into your browser, so it's better to get a program to do it. Here are a couple of examples of WFS access from a Python script:
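As a rough sketch of what such a script might look like (this isn't one of the linked examples, and it assumes you have the `requests` and `pandas` packages installed):

```python
from io import StringIO

import pandas as pd
import requests

WFS_URL = "http://geoserver-123.aodn.org.au/geoserver/imos/ows"
params = {
    "service": "WFS",
    "version": "1.0.0",
    "request": "GetFeature",
    "typeName": "imos:moorings_all_map",
    "outputFormat": "csv",
    # requests takes care of URL-encoding the spaces in the filter
    "cql_filter": "date_published AFTER 2018-01-01T00:00:00 AND site_code='GBRPPS'",
}

response = requests.get(WFS_URL, params=params)
response.raise_for_status()

# read the returned CSV straight into a DataFrame
df = pd.read_csv(StringIO(response.text))
print(df[["url", "deployment_code"]].head())
```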
See the GeoServer WFS documentation for more details and advanced features.
Here's a much more recent example where I'm using the same WFS (actually a subset of it) to query some metadata related to mooring configurations: https://github.com/aodn/aodn-public-notebooks/blob/d9ee9785221a7d75dbf58371a02db6aaa6ff2687/moorings/common.py#L16
And similar issues no doubt apply elsewhere in the world. (I know for a fact that NZ has a lot of data that hasn't made it into their public repositories; colleagues at MetOcean Solutions chased a lot of it up, but I'm unsure if they're able to make it available to us. MetOcean also has temperature profiles from the Mangōpare sensors developed & deployed during the Moana project, which they can't share publicly due to various arrangements with the fisheries industry.)
In that light it might be worthwhile thinking about CARS v2.0 not just as a product in the form of a 'static' atlas but as a code base to allow other groups to create regional versions of the atlas with data that might not be generally available.
@mhidas - ignorant question. Given that https://data.aodn.org.au/imos-data is an S3 bucket, should we be able to access it via `s3fs`? For example, with NOAA S3 buckets I'm used to being able to do something like the following:
```python
import s3fs

# Initialize s3 client with s3fs
fs = s3fs.S3FileSystem(anon=True)

# list contents of bucket
fs.ls('https://data.aodn.org.au/imos-data')
```
I'm no expert in the `s3fs` package or using S3, but this fails for me?
@Thomas-Moore-Creative You're correct, it's a public S3 bucket, and its name is just `imos-data` (https://data.aodn.org.au/imos-data is a web front-end to it). `s3fs` only needs the bucket name (and an optional prefix). So try this:
```python
fs.ls('imos-data')
fs.ls('imos-data/IMOS/ANMN')
# etc...
```
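Taking that one step further, here's a rough sketch of opening one of those NetCDF files with `xarray` (the `NRSMAI` sub-path below is just an assumption about the bucket layout - browse with `fs.ls()` to find the files you actually want):

```python
import s3fs
import xarray as xr

# anonymous access works because imos-data is a public bucket
fs = s3fs.S3FileSystem(anon=True)

# find NetCDF files under a prefix (this prefix is only an example)
nc_files = fs.glob('imos-data/IMOS/ANMN/NRS/NRSMAI/**/*.nc')

# copy one file locally, then open it with xarray
fs.get(nc_files[0], 'local_copy.nc')
ds = xr.open_dataset('local_copy.nc')
print(ds)
```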
> @Thomas-Moore-Creative ... So try this:
Thanks for helping me with those basics, @mhidas!
:+1: No worries.
Something worth noting is that there are often multiple data products in there based on the same original observations. I'm not so familiar with the other IMOS facilities, but for the moorings there are generally at least 4-5 levels of product:
- real-time files (`file_version` set to 0)
- delayed-mode files (`file_version` set to 1); each file holds just one deployment's worth of data from one instrument, so there are usually hundreds of them per mooring site

You can learn more about the last three products here.
- in what form? `xarray`? `pandas`? dask dataframe? `parquet`?
- ideally something like: `my_data = get_aodn(variable='temperature', time=(2005-01,2018-12), latitude=(-40,-20), longitude=(140,180), WOD='False')`
- accessed directly from `s3`?
Work done by @ChrisC28 & @BecCowley has identified gaps in even the most common baseline ocean observations (temperature & salinity) over all time for large areas of the Australian EEZ / region in state-of-the-art global databases like the WOD.
For example, this preliminary plot from @ChrisC28 shows the lack of any observations in the WOD for much of the Australian shelf & coast over all time. And regions with coverage may still suffer from seasonal aliasing?
But are we missing local Australian data holdings not in WOD?
Important considerations for using the AODN could include: