INTERMAGNET / wg-www-gins-data-formats

Repository to track working group discussions for WWW/Gins/Data Formats

Implementation of Seedlink at Edinburgh GIN #13

Open SimonFlower opened 1 year ago

SimonFlower commented 1 year ago

This issue originated in issue #6

The Edinburgh GIN will implement a Seedlink client to allow it to receive real-time geomagnetic data from other GINs and Intermagnet observatories. The client will be written in C. In this phase of work, data will only be received through Seedlink. In the future it may be interesting to consider being able to forward data to other users (including outside Intermagnet).

The seedlink client at the Edinburgh GIN will:

To load data into the GIN's data store the following metadata must be retrieved from the Seedlink data packets:

I'm using this as a starting point for documentation: https://seedlink.readthedocs.io/en/latest/protocol.html

Questions:

Resources

Example command line for slarchive: For the server, it would be “rtserve.iris.washington.edu”. So running the container with the following extra command should work:

docker … -v [your SDS archive location]:/data [container image] -S "NT_*" rtserve.iris.washington.edu

bgeels-USGS commented 1 year ago

I created a couple diagrams to clarify the possible data flow layouts and to mirror the diagram that Simon posted in #12. We haven't discussed the alternate layout that I show below but I've included it in case there's any pushback from members on the proposed layout.

Proposed Data Flow

[Diagram: Intermagnet SeedLink Overview 1]

Each GIN hosts their data on ringservers (or other SeedLink servers) that BGS can connect to. BGS pulls from these servers via SeedLink.
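As a rough illustration of what the pulling side does in this layout, the sketch below builds the command sequence for a SeedLink v3 multi-station handshake and splits a received packet into its sequence number and miniSEED record. It is a minimal sketch, not the Edinburgh client: the function names are hypothetical, and the station, network and selector values are only examples taken from codes mentioned elsewhere in this thread. It assumes the v3 framing of an 8-byte header ("SL" plus six hex digits) followed by a 512-byte record.

```python
# Hypothetical helper names; a sketch of SeedLink v3 client-side framing only.
SL_HEADER_SIZE = 8      # "SL" + 6 hex digits of sequence number
SL_RECORD_SIZE = 512    # fixed miniSEED record length in SeedLink v3


def handshake_commands(network, station, selectors):
    """Return the byte strings a client sends, in order, for one station."""
    commands = [b"HELLO\r\n"]
    commands.append(f"STATION {station} {network}\r\n".encode("ascii"))
    for selector in selectors:
        # Selector pattern is location + channel, optionally ".D" for data.
        commands.append(f"SELECT {selector}\r\n".encode("ascii"))
    commands.append(b"DATA\r\n")   # stream from the next available packet
    commands.append(b"END\r\n")    # end of handshake, server starts streaming
    return commands


def split_packet(packet):
    """Split one data packet into (sequence number, miniSEED record bytes)."""
    if len(packet) != SL_HEADER_SIZE + SL_RECORD_SIZE:
        raise ValueError("unexpected packet length")
    if packet[:2] != b"SL":
        raise ValueError("missing SL signature")
    if packet[2:6] == b"INFO":
        raise ValueError("INFO packet, not a data packet")
    sequence = int(packet[2:8].decode("ascii"), 16)
    return sequence, packet[SL_HEADER_SIZE:]
```

A real client would send each command over a TCP socket, wait for the server's reply to each STATION, SELECT and DATA command, and then read 520-byte packets in a loop.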

Alternate Data Flow

[Diagram: Intermagnet SeedLink Overview 2]

BGS hosts a ringserver in a DMZ. Each GIN connects to this server and pushes their data to it via DataLink.

Note: Arrow directions show who reaches out to whom to establish the connection. Some IT groups may care about this more than the flow direction of data since receiving connection requests requires having your server in a DMZ or on a LAN with firewall exceptions.

CharlesBlais commented 1 year ago

Thanks @bgeels-USGS for the diagram. Just to add that DataLink is a TCP handshake protocol documented here: https://seiscode.iris.washington.edu/svn/orb2ringserver/tags/release-1.0/libdali/doc/DataLink.protocol. I am not sure whether there is a premade library or client for it. I could ask IRIS, but Canada does have its own code in Python (it's a simple handshake). To answer your questions, @SimonFlower:

Answer: Correct. With the sample I gave you there is no need; it's a pull.

Answer: There isn't one publicly available yet but if you test first with IRIS with NT, I could work towards making it public for BGS only. We don't want to advertise it on this open platform.

Answer: On the IRIS server, you can get the USGS NT network (https://fdsn.org/networks/detail/NT/) and for ours, once open to BGS, you could get the Canada C2 network (https://fdsn.org/networks/detail/C2/). Both are registered networks under FDSN.

Answer: Miniseed packets are identified with a SNCL, which is a Station Network Channel Location code. It's documented for geomag here: https://github.com/INTERMAGNET/miniseed-sncl. So for example, a miniseed packet identified with NT.BOU.R0.UFX would be network NT, IAGA code BOU, R0 = raw internet (or variation), U = minute, F = magnetometer, X = X component.
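To make the decomposition concrete, here is a minimal sketch that splits an identifier into its four codes and describes the channel letters. It assumes the codes are dot-separated in the order network, station, location, channel, as in the example above. The function and table names are hypothetical, and the lookup tables contain only the letters explained in this thread, so anything else is reported as unknown rather than guessed.

```python
from typing import NamedTuple


class Sncl(NamedTuple):
    network: str
    station: str
    location: str
    channel: str


# Only the codes explained in this thread; not a complete table.
BAND = {"U": "minute", "L": "second"}
INSTRUMENT = {"F": "magnetometer"}


def parse_sncl(identifier):
    """Split 'NET.STA.LOC.CHA' into its four codes."""
    parts = identifier.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated codes, got {len(parts)}")
    return Sncl(*parts)


def describe_channel(channel):
    """Describe a three-letter channel code: band, instrument, component."""
    if len(channel) != 3:
        raise ValueError("channel code must have three characters")
    band, instrument, component = channel
    return {
        "band": BAND.get(band, "unknown"),
        "instrument": INSTRUMENT.get(instrument, "unknown"),
        "component": component,
    }


sncl = parse_sncl("NT.BOU.R0.UFX")
print(sncl, describe_channel(sncl.channel))
```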

Answer: That is a hard question to answer for real-time. For existing FDSN networks, probably not. My concern is renaming real-time streams, but also the DOI some use for the network. Canada doesn't use it but USGS probably does (https://fdsn.org/networks/detail/NT/). The new miniseed format is expanding to allow longer network codes (beyond 2 characters), which would allow all networks to register and use their own code. If there is wider interest in registering on FDSN, I could follow up with the seismic community about the possibility of including our networks.

Answer: It might; the SNCL document was only a draft with input from two institutes. Miniseed offers different high compression types and I think the US and Canada use two different methods (float32 vs float64). In the end, it doesn't matter much since most tools can still read those. It's just size in the end.

Answer: Miniseed has a new version coming out eventually. I don't know when exactly, but it's meant to improve multiple aspects, including digital signatures and much more. At the moment, since geomag data is resolved down to pT, a float format is probably best, with records in nT. Float32 or float64 works fine.
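As a rough check on the float32 versus float64 trade-off, the sketch below round-trips a value in nT through each big-endian encoding and measures the float32 step size near a typical field strength. The helper names and the 50000 nT example value are illustrative assumptions. Whether the float32 step matters depends on the resolution the instrument really delivers.

```python
import struct


def float32_roundtrip(value_nt):
    """Encode as big-endian float32 and decode again."""
    return struct.unpack(">f", struct.pack(">f", value_nt))[0]


def float32_spacing(value_nt):
    """Gap between a positive float32 value and the next representable one."""
    bits = struct.unpack(">I", struct.pack(">f", value_nt))[0]
    next_value = struct.unpack(">f", struct.pack(">I", bits + 1))[0]
    return next_value - float32_roundtrip(value_nt)


field_nt = 50000.001   # illustrative value: 50000 nT plus 1 pT

print("bytes per sample:",
      len(struct.pack(">f", field_nt)), "vs", len(struct.pack(">d", field_nt)))
print("float32 round trip:", float32_roundtrip(field_nt))
print("float32 step near 50000 nT, in pT:", float32_spacing(50000.0) * 1000)
```

Near 50000 nT the float32 step is 2^-8 nT, a little under 4 pT, so the extra 1 pT in the example is lost in float32 and kept in float64, at twice the size per sample.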

bgeels-USGS commented 1 year ago

I am not sure whether there is a premade library or client for it. I could ask IRIS, but Canada does have its own code in Python (it's a simple handshake).

IRIS maintains libdali but the Canadian Python implementation seems more useful, and as Charles mentioned the DataLink protocol is simple.

  • What Seedlink server should I use to collect data from the Golden GIN? What Seedlink version does this server use?

Currently we stream our raw observatory data (channel codes BEU, BEV, BEW, LFF, LK1-LK5) to IRIS/FDSN via a ringserver hosted at edgecwb.usgs.gov:18001. We would likely use the same ringserver or another ringserver instance on the same server (w/ different port) to provide our realtime and QD/D feeds to BGS.

  • Should / can we define a Seedlink network name for Intermagnet? If we do, would this allow a receiving client to easily filter out all Intermagnet data from a Seedlink server?

Answer: That is a hard question to answer for real-time. For existing FDSN networks, probably not. My concern is renaming real-time streams, but also the DOI some use for the network. Canada doesn't use it but USGS probably does (https://fdsn.org/networks/detail/NT/). The new miniseed format is expanding to allow longer network codes (beyond 2 characters), which would allow all networks to register and use their own code. If there is wider interest in registering on FDSN, I could follow up with the seismic community about the possibility of including our networks.

It might be easier if different network codes are used by each institute so that the data can be quickly differentiated when using standard Miniseed libraries and tools. If Intermagnet decides to publish data on the FDSN network in the future, then perhaps a single FDSN-approved network code could be used for that purpose? Should we make a new issue for this in the miniseed-sncl repo?
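To show how per-institute network codes make filtering simple, here is a minimal sketch that applies a wildcard selector of the "NT_*" style (as in the slarchive example earlier in this issue) to a list of stream names. The function name and the stream list are made up for illustration, and the "NET_STA" naming is an assumption about how the streams would be labelled.

```python
from fnmatch import fnmatchcase


def select_streams(stream_names, selector):
    """Return the stream names matching a shell-style wildcard selector."""
    return [name for name in stream_names if fnmatchcase(name, selector)]


# Illustrative stream names in NET_STA form; not a real server listing.
streams = ["NT_BOU", "NT_FRD", "C2_OTT", "C2_VIC"]

print(select_streams(streams, "NT_*"))   # USGS streams only
print(select_streams(streams, "C2_*"))   # Canadian streams only
```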

  • Is it necessary to define further conventions for how Seedlink is implemented in Geomagnetism?

Answer: It might; the SNCL document was only a draft with input from two institutes. Miniseed offers different high compression types and I think the US and Canada use two different methods (float32 vs float64). In the end, it doesn't matter much since most tools can still read those. It's just size in the end.

  • Assuming the Seedlink data packets used by Geomag institutes are formatted as MiniSeed records, which MiniSeed version do they use?

Answer: Miniseed has a new version coming out eventually. I don't know when exactly, but it's meant to improve multiple aspects, including digital signatures and much more. At the moment, since geomag data is resolved down to pT, a float format is probably best, with records in nT. Float32 or float64 works fine.

It may be beneficial to provide some geomag-centric guidance on using the Miniseed header fields. Comments on usage of some of the fields (e.g.: activity, I/O, and quality flags) could be helpful even if the comment just says "it doesn't matter". I can paste a table of the assumptions and defaults we use for these fields if that would be helpful. I agree that the data portion of a Miniseed record is pretty straightforward and that most tools appear to handle variations automatically. Usage of the Seedlink protocol is also fairly straightforward.

CharlesBlais commented 1 year ago

Also, I forgot to add that IRIS (or EarthScope) also provides slink2dali:

https://github.com/iris-edu/slink2dali

This can be used to bridge two SeedLink servers. So @SimonFlower, it could also be used for real-time retransmission to clients, with BGS as the hub and main SeedLink server.

SimonFlower commented 1 year ago

Thanks both for the information - very useful. I've got a C implementation of a Seedlink client working and I've been able to retrieve USGS ("NT") packets from the IRIS server rtserve.iris.washington.edu:18000. I can decode the data OK. I'm not sure what to do next though. Brendan, if I understand correctly, you're providing:

  • 10Hz variometer data in UVW orientation
  • 1Hz total field data
  • 1Hz temperature data

Is that right? In Intermagnet we're interested in 1Hz and 1-minute data in a standard orientation (XYZ, HDZ or DIF) with at least a provisional baseline added. I imagine the conversion from the 10Hz data to 1-second data requires some knowledge of the setup at your observatories, so is not something we could do (and unless we change policy in Intermagnet, we shouldn't do it either - Intermagnet has always avoided processing observatories' data).

So if I want to feed data from these servers into the Edinburgh GIN, I'd need to get you to provide other channels with 1Hz data in a standard orientation and baseline corrected, wouldn't I?

I thought that you were already exchanging data between the US and Canada using Seedlink? If that's the case, how are you managing the conversion from observatory variation data to standard orientation / baseline corrected?

CharlesBlais commented 1 year ago

Canada, for its core infrastructure, doesn't use USGS data yet. NOAA does use our data using our open FDSNWS which also offers miniSeed data.

https://geomag.nrcan.gc.ca/fdsnws/

We offer 1-minute and 1Hz data through channels UF (minute) and LF (second), and for some stations we also offer 8Hz data.

SimonFlower commented 1 year ago

You're making 1-second and 1-minute data available through Seedlink? What orientation are you presenting your UF and LF data in?

CharlesBlais commented 1 year ago

Our publicly available SeedLink isn't yet in our operational chain (so many other fires to handle) but will be in the coming months, at which point BGS could use it. Our public FDSN service is.

In theory, all these channels would be available

https://geomag.nrcan.gc.ca/fdsnws/station/1/query?network=C2&format=text&level=channel&starttime=2023-06-16

There are some non-INTERMAGNET stations in it but yes, 1-second and 1-minute data will be available. X, Y, Z, F in our case.
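For anyone wanting to work with that channel list programmatically, the sketch below rebuilds the query string used in the URL above and parses the pipe-delimited text that an FDSN station service returns for format=text. It is a minimal sketch with hypothetical function names. It makes no network request, and the sample response (station, coordinates, sensor and all other values) is invented purely to exercise the parser.

```python
from urllib.parse import urlencode

BASE_URL = "https://geomag.nrcan.gc.ca/fdsnws/station/1/query"


def channel_query_url(network, starttime):
    """Build the channel-level, text-format station query URL."""
    params = {"network": network, "format": "text",
              "level": "channel", "starttime": starttime}
    return BASE_URL + "?" + urlencode(params)


def parse_channel_text(text):
    """Parse pipe-delimited station text into a list of dicts keyed by header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The header line names the columns, e.g. "#Network | Station | ..."
            header = [name.strip() for name in line.lstrip("#").split("|")]
            continue
        if header is None:
            raise ValueError("data line found before header line")
        values = [value.strip() for value in line.split("|")]
        rows.append(dict(zip(header, values)))
    return rows


# Invented sample response, for illustration only.
SAMPLE = """#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime
C2|OTT|R0|LFX|45.4|-75.5|75.0|0.0|0.0|0.0|Fluxgate|1.0|0.0|nT|1.0|2023-06-16T00:00:00|
C2|OTT|R0|UFX|45.4|-75.5|75.0|0.0|0.0|0.0|Fluxgate|1.0|0.0|nT|0.0166667|2023-06-16T00:00:00|
"""

rows = parse_channel_text(SAMPLE)
one_second = [row["Channel"] for row in rows if float(row["SampleRate"]) == 1.0]
print(channel_query_url("C2", "2023-06-16"))
print(one_second)
```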

SimonFlower commented 1 year ago

Thanks Charles. That looks like a comprehensive list (and of course we can filter out the non-Intermagnet data). It looks like, for the present, there aren't any channels that I can receive that are suitable for entry into the Edinburgh GIN - do you both agree? If so, I'll stop working on this for now and concentrate on the MQTT side of real-time transfer. If you can let me know when those channels from NRCan are available, I'll pick this up again then. It looks to me like this will make a nice system!

Brendan - are there similar plans for USGS (to make baseline adjusted data available on Seedlink)?

bgeels-USGS commented 1 year ago

Simon, I agree. We aren't producing 'XYZ' data in miniseed yet, only 'UVWF' data. A big deadline is coming up for our development team so we probably won't get around to implementing this at least for another couple of weeks.

SimonFlower commented 1 year ago

OK - I'll put the code in a state where I can come back to it and wait to hear from either or both of you when you have data available. In the meantime I'll start looking at MQTT...