INTERMAGNET / wg-www-gins-data-formats

Repository to track working group discussions for WWW/Gins/Data Formats

A Coverage JSON format for INTERMAGNET #7

Open SimonFlower opened 5 years ago

SimonFlower commented 5 years ago

I'd like to start discussing an implementation of Coverage JSON (https://covjson.org/) for geomagnetic data. The Coverage JSON "Point Series" domain type defines most of what we need, but I think we still need to define the metadata required for geomagnetic observatory data. I'd propose that, once agreed, we publish this as an INTERMAGNET data format. Does that make sense?

Attached is an example of the Coverage JSON we're producing from the Edinburgh GIN web service. I think the latitude and longitude are duplicated in this example (they're defined as part of the Point Series domain, then again in the "geomag:info" metadata). I wonder if we can take the latitude and longitude out of the "geomag:info" section? Are there any other thoughts or comments? Shall I start work on creating a data format description along these lines?

AAA20190717.json.txt
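For reference, here is a heavily trimmed sketch of the shape in question (values illustrative, referencing and most parameters omitted, and the exact placement of "geomag:info" may differ from the attachment). The duplication is between the "x"/"y" axes and the "geomag:info" block:

    {
      "type": "Coverage",
      "domain": {
        "type": "Domain",
        "domainType": "PointSeries",
        "axes": {
          "x": { "values": [76.92] },
          "y": { "values": [43.25] },
          "t": { "values": ["2019-07-17T00:00:00Z", "2019-07-17T00:01:00Z"] }
        }
      },
      "geomag:info": {
        "latitude": 43.25,
        "longitude": 76.92
      },
      "parameters": {
        "X": {
          "type": "Parameter",
          "unit": { "symbol": "nT" },
          "observedProperty": { "label": { "en": "North component" } }
        }
      },
      "ranges": {
        "X": {
          "type": "NdArray",
          "dataType": "float",
          "axisNames": ["t"],
          "shape": [2],
          "values": [18037.6, 18037.7]
        }
      }
    }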

CharlesBlais commented 4 years ago

Regarding the Coverage JSON: I've looked at your example and the timestamps are quite repetitive. My concern is that this would result in much larger files at high sampling rates. Personally, I think the geomagnetic community needs to start looking at low-bandwidth, high-throughput formats if it is to remain competitive with other formats. Canada, for its part, has taken the simple approach of storing just a start time. Note that the example below follows the FDSN naming convention, but only because we use that as our background infrastructure.

[{"data":[18037.650390625,18037.650390625,18037.650390625,18037.669921875,18037.689453125,18037.630859375,18037.630859375,18037.609375,18037.5703125,18037.5703125,18037.58984375],"header":{"calib":1.0,"channel":"LFX","delta":1.0,"location":"R0","network":"C2","sampling_rate":1.0,"starttime":"2020-07-06T00:00:00Z","station":"OTT"}},{"data":[-4288.47998046875,-4288.5,-4288.52001953125,-4288.509765625,-4288.43994140625,-4288.43017578125,-4288.43017578125,-4288.41015625,-4288.3798828125,-4288.35009765625,-4288.18017578125],"header":{"calib":1.0,"channel":"LFY","delta":1.0,"location":"R0","network":"C2","sampling_rate":1.0,"starttime":"2020-07-06T00:00:00Z","station":"OTT"}},{"data":[50725.48828125,50725.5,50725.48828125,50725.5,50725.51953125,50725.4609375,50725.44921875,50725.44921875,50725.44921875,50725.4296875,50725.4296875],"header":{"calib":1.0,"channel":"LFZ","delta":1.0,"location":"R0","network":"C2","sampling_rate":1.0,"starttime":"2020-07-06T00:00:00Z","station":"OTT"}}]

SimonFlower commented 4 years ago

I'm open to other JSON schemas, but I think we need something that conforms to standards (if such a standard exists for time-series data). GeoJSON is a great example of the sort of thing I think we need, as it's usable in all sorts of tools and products, but unfortunately I don't think it can represent time-series data.

In the EPOS project (similar to EarthCube in the US) we looked at what was available in JSON for time-series representation a couple of years ago, and CovJSON was the schema we decided to go with. CovJSON is aiming for W3C and OGC standardisation (https://www.w3.org/TR/covjson-overview/), though at present it is much less mature than standards like GeoJSON. There are already a few tools to work with it, in particular the playground, which shows well how using standards allows tools that know nothing about the data's knowledge domain to visualize it in ways that are still effective for people with domain expertise: https://covjson.org

The issue of file size could be addressed through compression; this type of data will compress very effectively.
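As a rough illustration, here is a Node.js sketch of that check (assuming the attached example has been saved locally as AAA20190717.json):

    const fs = require('fs');
    const zlib = require('zlib');

    // Repeated ISO timestamps and slowly-varying values are highly redundant,
    // so gzip should recover most of the size difference on the wire.
    const raw = fs.readFileSync('AAA20190717.json');
    const gz = zlib.gzipSync(raw);
    console.log(`raw: ${raw.length} bytes, gzip: ${gz.length} bytes`);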

CharlesBlais commented 4 years ago

Yes, we looked into CovJSON when we were collaborating with the BoM in Australia on weather maps (in this case space weather maps). We didn't quite see the advantage over GeoJSON, especially considering that certain databases implement GeoJSON queries natively (https://docs.mongodb.com/manual/geospatial-queries/). An example is sketched below.
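For instance (illustrative only, assuming a hypothetical "observatories" collection with a 2dsphere index on GeoJSON Point locations; coordinates approximate):

    // Find observatories within 100 km of Ottawa using a GeoJSON geometry.
    db.observatories.find({
      location: {
        $near: {
          $geometry: { type: "Point", coordinates: [-75.70, 45.42] },
          $maxDistance: 100000  // metres
        }
      }
    })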

We didn't use it for time series because in most languages the processing time to convert a string to a date/time object is a large overhead at high sampling rates, and a UNIX timestamp is a problem for data prior to 1970. In our case we were targeting web browsers (JavaScript), and it's faster to take a start time with a delta.
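A sketch of what that looks like in the browser, using the header fields (starttime and delta) from the earlier Canadian example, so only one string is ever parsed:

    // Reconstruct the timestamp of sample i from starttime + i * delta,
    // instead of parsing one ISO string per sample.
    const header = { starttime: "2020-07-06T00:00:00Z", delta: 1.0 };
    const t0 = Date.parse(header.starttime);              // single parse
    const timeOf = i => new Date(t0 + i * header.delta * 1000);
    console.log(timeOf(10).toISOString());                // 2020-07-06T00:00:10.000Z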

jmfee-usgs commented 4 years ago

I agree the timestamps should not be listed explicitly. I think it should be possible to use the start, stop, and num options to define the time range (instead of listing values) and require users to calculate the offsets: https://www.w3.org/TR/covjson-overview/#x2.2-encoding-the-domain

It appears the JavaScript library we use with Leaflet expects the axis values to be numeric, so we could use floating-point epoch time or a similar numeric representation for interpolation to work as expected. The spec doesn't appear to impose this data type restriction: https://github.com/Reading-eScience-Centre/covjson-reader/blob/master/src/Coverage.js#L847

SimonFlower commented 4 years ago

Thanks Jeremy, that's useful. We already have a CovJSON payload available from the Edinburgh GIN (currently using an array of dates). We could add an alternative CovJSON payload that uses start, stop and num as you suggest. Before doing this, it would be worth working up a piece of example data by hand and trying it out in the CovJSON playground. Is this a sensible way forward?

Having looked around, I don't see any alternatives to CovJSON if we want a standards-based solution (and even with CovJSON it's not certain that it will complete the standardisation process). Am I right, or have other people seen alternatives?

jmfee-usgs commented 4 years ago

Attached is a modified file that appears to display similarly in the playground.

      "t": {
        "start": 1563321600000,
        "stop": 1563407940000,
        "num": 1440
      }

AAA20190717_domain.json.txt

I used JavaScript to convert the dates:

    new Date("2019-07-17T00:00:00.000Z").getTime()  // 1563321600000
    new Date("2019-07-17T23:59:00.000Z").getTime()  // 1563407940000

This seems to work with the existing covjson-reader for time axes, since the values are converted back to dates using new Date(value). Other languages could also efficiently calculate sample times from these millisecond epoch timestamps, as long as the convention is documented.
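For instance, a sketch of that calculation, using the axis values from the modified file above:

    // Expand the regular "t" axis: num points, evenly spaced,
    // with step = (stop - start) / (num - 1).
    const axis = { start: 1563321600000, stop: 1563407940000, num: 1440 };
    const step = (axis.stop - axis.start) / (axis.num - 1);   // 60000 ms here
    const dates = Array.from({ length: axis.num },
      (_, i) => new Date(axis.start + i * step));
    console.log(dates[0].toISOString());      // 2019-07-17T00:00:00.000Z
    console.log(dates[1439].toISOString());   // 2019-07-17T23:59:00.000Z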

I'm a little concerned that only the first parameter (S) seems to display in the playground. I'd also like to consider whether the "geomag:info" section could be represented using CovJSON standards.

CharlesBlais commented 4 years ago

Any concerns about data from before 1970-01-01 (since it's a UNIX timestamp)? That isn't a problem for INTERMAGNET, since its data is from 1990 onwards.

jmfee-usgs commented 4 years ago

Negative numbers and 64-bit integers solve the epoch problems, as long as systems expect these values.
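For example, in JavaScript:

    // Negative millisecond epochs represent pre-1970 instants directly.
    new Date(-631152000000).toISOString()   // "1950-01-01T00:00:00.000Z"
    // JavaScript numbers are exact for integers up to 2^53, far beyond
    // any millisecond timestamp a geomagnetic archive needs.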

SimonFlower commented 4 years ago

Jeremy, on the issue of only a single component of data displaying in the playground: I think that's a bug in the playground. It's certainly not a problem to present multiple channels of data in CovJSON; we're doing this in the EPOS project across Europe.

I would prefer to use dates rather than epoch times. Is Leaflet the main barrier to this?

jmfee-usgs commented 4 years ago

Hi Simon,

Agreed, the ISO 8601 datetime format is much easier to understand.

It's mainly this project, which is used by leaflet-coverage and unfortunately appears to be unmaintained since 2016. The transformDomain function converts the domain axes definition to an array of values, but currently only works with numeric values: https://github.com/Reading-eScience-Centre/covjson-reader/blob/master/src/Coverage.js#L810
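One possible workaround, as a sketch only (toNumericTimeAxis is hypothetical, not part of covjson-reader's API): convert a string-valued "t" axis to millisecond epochs before handing the document to the reader, so the numeric-only code path works.

    // Convert ISO 8601 values on the "t" axis to millisecond epochs in place.
    function toNumericTimeAxis(covjson) {
      const t = covjson.domain && covjson.domain.axes && covjson.domain.axes.t;
      if (t && Array.isArray(t.values)) {
        t.values = t.values.map(v => (typeof v === 'string' ? Date.parse(v) : v));
      }
      return covjson;
    }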

CharlesBlais commented 3 years ago

As an item for this meeting, could people take a look at the sample service designed by BGS at https://imag-data-staging.bgs.ac.uk/GIN_V1/ and look at the CovJSON output? I guess the question arising from it is: do people agree that this format should be adopted as an INTERMAGNET standard for distribution? Luckily, BGS has already done the work.

I've just played with it myself. JSON makes it easy to read. I'm not a fan of the Python library https://github.com/Reading-eScience-Centre/pycovjson, but that doesn't matter much since it's JSON after all.

A few questions using one of the files grabbed from the website:

SimonFlower commented 3 years ago