FDSN Request: provide metadata for stations even when they do not have complete metadata.

calum-chamberlain commented 2 years ago

When downloading station metadata from the FDSN webservice, stations that do not have information at "channel" or "response" level have no information returned for them with level=channel, but they do return (basic) information when requests are made with level=station. In the case of station WTSZ, the query: https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=channel&format=text returns nothing, but the query: https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=station&format=text returns the station location.

It would be helpful (to me) if the basic station information was returned for all stations, regardless of whether their metadata are complete. I appreciate that this may not be what everyone wants, so if there is good reason not to do this, or if this goes against FDSN protocol then I'm fine with it not changing, but wanted to at least post this somewhere so that others might find this before thinking that there were fewer stations active at a given time.

In my case, I query the station service to work out what stations are active, then look up the waveforms for those stations. I think that this is common practice (and is done by the ObsPy FDSN mass downloader), so it might help to provide all the metadata that are available, even if those metadata are incomplete.

mnaguit commented 2 years ago

Hi @calum-chamberlain, thanks for posting this query. Indeed, it is helpful for other end users working on the same scenario to receive more info on this. We will check the metadata for WTSZ (and other stations in the same situation) and will provide further details soon.

FYI @salichon

calum-chamberlain commented 2 years ago

WTSZ may not be the best station to worry about due to #101 - but it would be worth checking which stations are missing in this query (channel level): https://service.geonet.org.nz/fdsnws/station/1/query?station=*&level=channel&format=text vs this query (station level): https://service.geonet.org.nz/fdsnws/station/1/query?station=*&level=station&format=text
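A rough sketch of that comparison, assuming the two text responses have already been saved as strings. The sample responses, the station codes AAAZ and BBBZ, and the helper names `station_codes` and `missing_at_channel_level` are all made up for illustration; the only thing relied on is that network and station are the first two pipe-separated fields in both text layouts.

```python
def station_codes(text):
    """Return the set of (network, station) pairs in an FDSN text-format response."""
    codes = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#'-prefixed header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        if len(fields) >= 2:
            codes.add((fields[0].strip(), fields[1].strip()))
    return codes


def missing_at_channel_level(station_text, channel_text):
    """Stations present in the station-level response but absent at channel level."""
    return sorted(station_codes(station_text) - station_codes(channel_text))


# Made-up sample responses, for illustration only.
station_level = """#Network | Station | Latitude | Longitude | Elevation | SiteName | StartTime | EndTime
NZ|AAAZ|-41.0|175.0|10.0|Hypothetical site A|2010-01-01T00:00:00|
NZ|BBBZ|-42.0|172.0|20.0|Hypothetical site B|2012-01-01T00:00:00|
"""
channel_level = """#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime
NZ|AAAZ|10|HHZ|-41.0|175.0|10.0|0.0|0.0|-90.0|Hypothetical sensor|1.0|1.0|m/s|100.0|2010-01-01T00:00:00|
"""

missing = missing_at_channel_level(station_level, channel_level)
print(missing)
```

With real responses, the two strings would come from the station-level and channel-level queries above.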

I noticed this particularly for stations that have a starttime before their earliest channel starttime, but for which data are available.

salichon commented 2 years ago

Hello @calum-chamberlain! thanks @mnaguit

1. About incomplete/partial information provided by the FDSN station service: to me, this service is the front end of Delta, the GeoNet public metadata repository (https://github.com/GeoNet/delta).

The FDSN station service is built by processes that generate the XML from the Delta repo in compliance with the StationXML format. As a consequence, any partial or missing piece of information results in anything from a limited level of service to none at all. (That is what we have always checked at the end of our instrument change procedure since 2016/2017.)

So your idea, "It would be helpful (to me) if the basic station information was returned for all stations, regardless of whether their metadata are complete", is possible without too much pain on the Delta git repo, but not, to my knowledge, through a downstream service bound by standards.

2. WTSZ, as referred to in https://github.com/GeoNet/help/issues/101, was entered as temporary instrumentation in 2015, and its metadata were not maintained as they would be for a national/regional permanent station: responsibilities got lost and information was forgotten over time, given its temporary and non-GeoNet status. For instance, WTSZ is/was a seismic site that was never closed (it most likely ended in 2017) and whose instrument response was never actually finalised. I have been working on that family of mounts/instrumentation to at least distribute this borehole instrument response. I think this needs to be bound to the existing datalogger as a combination (Reftek).

So solutions exist. :) They do require some proper work to be adequate and durable, though. It would be great if we could progress on that legacy area of temporary and exp stations.
3. Long story cut short about WTSZ: https://github.com/GeoNet/delta/search?q=WTSZ+whataroa
4. Delta is patiently crafted and updated to keep up with instrumentation builds, servicing, new instrumentation and tools, along with some optimizations
(and besides the routine network maintenance bits).
5. .. :) About https://github.com/GeoNet/help/issues/103#issuecomment-1183762798: this is a sizeable piece of work to list, track, investigate and resolve, as in https://github.com/GeoNet/tickets/issues/10052. (Edit) The solution for these is not unique (it may be quite specific) and relies on legacy information, metadata checking and, of course, inspection of the data themselves when they exist. The further back in time we go, the more intricate the investigation can be, and sometimes it will remain partial.

As a conclusion (sorry for the length):

Could you also detail in that ticket what, in your view, is the minimum basic information required?

Cheers, regards

FYI @JonoHanson @ozym @staylorofford

calum-chamberlain commented 2 years ago

Thanks Jerome, I don't follow all of that, and I think that the WTSZ/Whataroa things are for a different issue.

Just to be clear, I'm not asking for all the metadata to be complete - I get that for the legacy data that is likely impossible. What I would like is for a request like: https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=channel&format=text

where there are no channel or response level metadata to return the station information that is available, e.g.:

#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime
NZ|WTSZ|||-43.302000|170.412000|100.000000||||||||2014-06-23T00:00:05|

and similar for the station xml: For example it would be good (in my mind) if the following two calls returned the same inventory for stations without channel level metadata:

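from obspy.clients.fdsn import Client

client = Client("GEONET")
kwargs = dict(station="WTSZ", network="NZ")

# Station-level request: returns the basic station information.
inv = client.get_stations(level="station", **kwargs)

# Channel-level request: currently returns nothing for WTSZ.
inv_channel = client.get_stations(level="channel", **kwargs)

# Both inventories should contain the same set of station codes.
assert {sta.code for net in inv for sta in net} == {sta.code for net in inv_channel for sta in net}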
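Until the service behaves that way, a client-side workaround might look something like the sketch below: ask for channel level first and fall back to station level when nothing comes back. `NoDataError`, `StubClient` and `get_stations_with_fallback` are hypothetical names used so the example runs without network access; with ObsPy the exception to catch would be `obspy.clients.fdsn.header.FDSNNoDataException`, passed in as `no_data_error`.

```python
class NoDataError(Exception):
    """Stand-in for the 'no data' error a real FDSN client raises on HTTP 204."""


def get_stations_with_fallback(client, no_data_error=NoDataError, **kwargs):
    """Request channel-level metadata, falling back to station level if empty."""
    try:
        return "channel", client.get_stations(level="channel", **kwargs)
    except no_data_error:
        return "station", client.get_stations(level="station", **kwargs)


class StubClient:
    """Hypothetical client: knows WTSZ at station level only."""

    def get_stations(self, level, station, **kwargs):
        if level == "channel":
            raise NoDataError(f"no channel metadata for {station}")
        return [station]


level, inv = get_stations_with_fallback(StubClient(), station="WTSZ", network="NZ")
print(level, inv)
```

The returned level tells the caller whether the inventory is complete or only a station-level fallback.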

salichon commented 2 years ago

@CallumNZ Okay got it -
In that case it is more related to the specs of the StationXML service, and to what "https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=channel&format=text" returns as minimum information for a query, instead of a blank or something else.
That is something to check with the sys dev team that manages the web parts of the service, if I have understood correctly ;)

By the way: I had a closer look at WTSZ (tmp); in that case it is also missing the equivalent of the stream naming in Delta.
So it seems legitimate that it is not provided (hence blocking part of the (incomplete) information). This can be modified for that case (until further work).

salichon commented 2 years ago

Hello @sue-h-gns @junghao

FDSN station service query output Question

Does the FDSN station service query mechanism allow returning higher-level information when the lower-level query is "empty"?
e.g. level channel (low/more detailed): https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=channel&format=text
vs. level station (high/less detailed): https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=station&format=text

Is it limited by the specs of the service?

thank you cheers!

calum-chamberlain commented 2 years ago

@salichon I think you are tagging the wrong Calum.

junghao commented 2 years ago

The FDSN specification doesn't mention this situation. It only defines HTTP status 204 as "Request was properly formatted and submitted but no data matches the selection".

I met this dilemma while developing the station service: when there's no channel, should we simply respond with 204, or respond with a higher level of metadata? I chose the former. The idea was to let the client (supposedly some application) know there's nothing there, instead of providing a response and letting it do the parsing and then figure out the truth.

Also, I'm not sure whether all clients (again, applications) can cope with empty channel names when requesting level=channel.

However, since it's not well defined, we can discuss what would benefit users most.

calum-chamberlain commented 2 years ago

Thanks for that @junghao - my expectation (from the seismology side rather than how the data are handled in the back) is that if some data (e.g. station information) but not all (e.g. no channel information) were available, then those data would be returned. I get that that is not how "data" is defined for the backend, but with the hierarchical structure of station-xml it makes sense (in my opinion) to revert to returning all metadata that match <= level requested (e.g. level=channel should return channel and station).

But that is just one biased opinion. I don't know how other organisations handle this, or what other seismologists think of this. I imagine @SquirrelKnight might have an opinion on this, as might others. Happy to ask around for opinions if it would help?

ozym commented 2 years ago

The problem I see is that the "Channel" level of the stationxml schema http://docs.fdsn.org/projects/stationxml/en/latest/reference.html#channel has the Code as required. When there is no code available this level can't be formed.

So the question is, does a service requesting data at the channel level get back what it would get if asking for the station level, or nothing (as is the case now)?

That is, sort of saying "give me everything down to the channel level, and for those that don't have channel info, just match at the station level".

But this then has implications for wild-carding. Do you return station information that doesn't have channels that match the wildcards, or do you skip those stations?

Or do you treat the case of no wildcards given as a special case?
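To make the wildcard question concrete, here is a toy sketch of one possible answer: skip stations whose known channels don't match, but keep stations that have no channel metadata at all. The inventory, the station codes AAAZ and BBBZ, and the `query_channels` helper with its `keep_channelless` flag are all made up for illustration and say nothing about how the real service is implemented.

```python
from fnmatch import fnmatchcase

# Toy inventory: station code -> list of known channel codes (made up).
INVENTORY = {
    "AAAZ": ["HHZ", "HHN", "HHE"],
    "BBBZ": ["EHZ"],
    "WTSZ": [],  # station known, but no channel metadata
}


def query_channels(inventory, channel="*", keep_channelless=False):
    """Return {station: matching channels} for a channel-level query.

    keep_channelless=False skips every station with nothing to match.
    keep_channelless=True additionally returns a station-only entry for
    stations that have no channel metadata at all.
    """
    result = {}
    for station, channels in inventory.items():
        matches = [c for c in channels if fnmatchcase(c, channel)]
        if matches:
            result[station] = matches
        elif keep_channelless and not channels:
            result[station] = []
    return result


strict = query_channels(INVENTORY, "HH?")
lenient = query_channels(INVENTORY, "HH?", keep_channelless=True)
print(strict)
print(lenient)
```

In both modes BBBZ is skipped because its known channel does not match; only the station with no channel information is treated differently.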

calum-chamberlain commented 2 years ago

Good point on wildcarding - in my opinion the current option (not returning metadata for stations that do not have channel metadata even though there are data that might match the requested channel) is worse than returning station metadata that may not match the requested channel. That (biased) opinion is based on having ignored relevant stations in my research because I did not know that they were missing channel metadata and were not included in the stationxml because of this.

junghao commented 2 years ago

When querying at channel level, the output fields should describe the channel, so the expected output below would be ambiguous:

#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime
NZ|WTSZ|||-43.302000|170.412000|100.000000||||||||2014-06-23T00:00:05|

The latitude/longitude/time fields are supposed to reflect the channel's metadata, not the station's. When there's no channel, they should be empty.

So if we're going to respond with common metadata, then the output would be:

#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime
NZ|WTSZ||||||||||||||

calum-chamberlain commented 2 years ago

Good point @junghao - I mostly care about the stationxml returned rather than the text output, which should contain the station location. Nevertheless, returning just the network and station for text would be helpful. It might help users who use the text output to fill the fields that are unknown with "unknown"? Although that would be a clear change that would affect other things and might break other people's code/work.

calum-chamberlain commented 2 years ago

In this issue it was pointed out that the preferred response for empty metadata is suggested in the FDSN spec:

In cases where the response is unknown, for example really old channels, or where a response is not applicable, like textual log channels, it is preferred that an empty response element be used, <Response/>, to positively indicate that no response exists.
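As a small illustration of what an empty response element looks like to a reader of the XML, here is a sketch using Python's standard library. The fragment is deliberately minimal and is not a complete, schema-valid StationXML document; the channel code HHZ and location code 10 are made up.

```python
import xml.etree.ElementTree as ET

# Minimal fragment for illustration; not a complete StationXML document.
channel = ET.Element("Channel", {"code": "HHZ", "locationCode": "10"})
ET.SubElement(channel, "Response")  # empty element: "no response exists"

fragment = ET.tostring(channel, encoding="unicode")
print(fragment)

# A reader can tell "element present but empty" apart from "element absent".
parsed = ET.fromstring(fragment)
response = parsed.find("Response")
has_empty_response = (
    response is not None
    and len(response) == 0
    and not (response.text or "").strip()
)
print(has_empty_response)
```

The point is that the channel can still be listed with its code while stating explicitly that no response is known.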

salichon commented 2 years ago

@junghao @ozym what do you think about this FDSN spec suggestion ^^: an empty <Response/> element, or otherwise an empty element when not applicable? (It might have to be limited to very specific elements such as the response one.)

ozym commented 2 years ago

That's fine for stationxml but it won't help when the text format is being used (as discussed above).

ozym commented 2 years ago

There is still something odd with the input/output of stationxml. I think it's the requirement that there be at least one stream attached to the site.

calum-chamberlain commented 2 years ago

Yes channel is required, and channel requires latitude, longitude, elevation and depth, but each of these attributes can also be empty if they are unknown, which I assume is the issue here?

Agreed that it won't help with the text output, but this should just be consistent with the stationxml format (so empty everything except network code and station code as @junghao suggested?).

ozym commented 2 years ago

The problem is that the channel needs a code, i.e. "HHZ" etc. which is the bit missing, we generally know all the rest. So this issue will not be so much about the response, but knowing what was recorded.

ozym commented 2 years ago

But in the way the system is written, even if the code is given it will then look up a response and skip the channel if it can't find one. So this is likely to be an area which can be improved now.

calum-chamberlain commented 2 years ago

The problem is that the channel needs a code, i.e. "HHZ" etc. which is the bit missing, we generally know all the rest.

But don't you have this information in the waveforms, along with the location code? Apologies if I'm missing something else there and being naive!

ozym commented 2 years ago

The issue is that they are disconnected. There is no list of waveforms, just a list of sensors, a list of dataloggers, and a list of times. They need to be joined together to essentially predict what the channel codes will be; this is where the "response" element comes in. It says something like "a broadband sensor will have 3 components called Z, N, E (or whatever)"; then there will be something else that says this instrument records a 100 Hz stream, which has a sensor attached to it, and because it's a broadband it will be called HH. So this makes up the HHZ etc. However, if there's a bit missing (due to not knowing the sensor or datalogger types) then the join doesn't happen and it looks like there's no channel available.
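A toy sketch of that kind of join is below. The lookup tables, model names and the `predict_channel_codes` helper are all hypothetical and only mirror the description above (a broadband sensor with Z, N, E components on a 100 Hz stream giving HH codes); this is not Delta's actual data or logic.

```python
# Illustrative lookup tables only; not Delta's actual data or logic.
SENSOR_TYPES = {
    # sensor model -> (kind, component codes)
    "hypothetical-broadband": ("broadband", ["Z", "N", "E"]),
}
DATALOGGER_RATES = {
    # datalogger model -> sample rate in Hz
    "hypothetical-logger": 100.0,
}
CODE_PREFIX = {
    # (sensor kind, sample rate) -> first two letters of the channel code
    ("broadband", 100.0): "HH",
}


def predict_channel_codes(sensor_model, datalogger_model):
    """Join sensor and datalogger info into channel codes, or [] if a part is missing."""
    sensor = SENSOR_TYPES.get(sensor_model)
    rate = DATALOGGER_RATES.get(datalogger_model)
    if sensor is None or rate is None:
        # The join fails, so it looks like there is no channel at all.
        return []
    kind, components = sensor
    prefix = CODE_PREFIX.get((kind, rate))
    if prefix is None:
        return []
    return [prefix + component for component in components]


known = predict_channel_codes("hypothetical-broadband", "hypothetical-logger")
unknown = predict_channel_codes("unknown-borehole-sensor", "hypothetical-logger")
print(known)
print(unknown)
```

The second call shows the failure mode: one unknown piece and the station ends up with no channels, even though data may exist.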

ozym commented 2 years ago

I think in some ways the hold-up may be more along the lines of "we don't know the full response so we're not going to even start the process", rather than saying "we know enough to at least determine what the code will be" and just giving an empty response (as suggested above).

ozym commented 2 years ago

I've been slowly working on a rewrite of the backend code; this scenario will be much easier to handle there, as the current system has some hidden assumptions and logic steps.

salichon commented 1 year ago

[x] (update) Some Work in Progress ...

junghao commented 1 year ago

The update has been deployed.

$ curl "https://service.geonet.org.nz/fdsnws/station/1/query?station=WTSZ&level=channel&format=text"
#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime
NZ|WTSZ||||||
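For anyone parsing this output, note that the station-only row carries fewer fields than the header. A tolerant parser might pad short rows, roughly as sketched below; `parse_channel_text` is a hypothetical helper and the response string is copied from the output above.

```python
def parse_channel_text(text):
    """Parse channel-level FDSN text output into dicts, padding short rows."""
    header, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            header = [name.strip() for name in line.lstrip("#").split("|")]
            continue
        if header is None:
            raise ValueError("data line found before the '#' header line")
        fields = [field.strip() for field in line.split("|")]
        # Rows may carry fewer fields than the header; treat the rest as empty.
        fields += [""] * (len(header) - len(fields))
        rows.append(dict(zip(header, fields)))
    return rows


response = """#Network | Station | Location | Channel | Latitude | Longitude | Elevation | Depth | Azimuth | Dip | SensorDescription | Scale | ScaleFreq | ScaleUnits | SampleRate | StartTime | EndTime
NZ|WTSZ||||||
"""

rows = parse_channel_text(response)
# Rows with an empty Channel field are stations without channel metadata.
station_only = [row["Station"] for row in rows if not row["Channel"]]
print(station_only)
```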

elidana commented 10 months ago

Hi @calum-chamberlain , this should have been fixed back in March when we applied some improvements to the StationXML service. I'm closing this, but please reopen if you are still having issues!
