Open jbienkowski opened 3 years ago
@jbienkowski, StationXML metadata validation may be optionally enabled or disabled, so it depends on the user of the software whether their waveform catalog metadata is served or not. If the validation configuration changes, data needs to be reprocessed.
Question: Is reprocessing a crucial issue, performance-wise? The workflow could be something like:
Note that this change requires adjusting the collector's delete facilities, too.
Calculate the metrics anyways, but apply filtering on the web service side
I'm not sure if I would implement this approach. Imagine a request which queries the entire waveform metadata inventory. Then, filtering becomes costly. I'm aware that there are service level configuration parameters such as https://github.com/EIDA/wfcatalog/blob/93f3d7bc4a1219f12da0bb4ae1ad45920a99fbcf/service/configuration.json#L12-L14 and https://github.com/EIDA/wfcatalog/blob/93f3d7bc4a1219f12da0bb4ae1ad45920a99fbcf/service/configuration.json#L19 available.
@jbienkowski, have you already seen this https://github.com/EIDA/wfcatalog/blob/93f3d7bc4a1219f12da0bb4ae1ad45920a99fbcf/collector/config.json#L20-L22 configuration option, which enables file-based filtering while collecting?
See also https://github.com/EIDA/wfcatalog/blob/93f3d7bc4a1219f12da0bb4ae1ad45920a99fbcf/collector/WFCatalogCollector.py#L475-L512 and https://github.com/EIDA/wfcatalog/blob/93f3d7bc4a1219f12da0bb4ae1ad45920a99fbcf/collector/WFCatalogCollector.py#L279-L299
We originally decided not to limit processing to the channels in the metadata because a lot of nodes wanted to have their full archive processed, and not just what is exposed through FDSNWS. I guess updating the white list is too much manual labor. It is probably better to add another option that loads the FDSNWS response [net.sta.loc.cha] into a hashmap and does a lookup on whether to skip a stream or not.
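A minimal sketch of that lookup, assuming the FDSNWS response is the pipe-separated channel-level text listing. The function names `parse_channel_listing` and `should_skip` and the sample rows are made up for illustration and are not part of the collector:

```python
def parse_channel_listing(text):
    """Collect (net, sta, loc, cha) tuples from a pipe-separated channel listing."""
    known = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#'-prefixed header line.
        if not line or line.startswith("#"):
            continue
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue
        known.add(tuple(fields[:4]))
    return known


def should_skip(stream_id, known):
    """Return True when the net.sta.loc.cha stream is absent from the listing."""
    return tuple(stream_id.split(".")) not in known


# Hypothetical sample rows; a real listing carries more columns.
SAMPLE = """#Network|Station|Location|Channel|Latitude|Longitude
XX|STA1|00|BHZ|0.0|0.0
XX|STA1||HHZ|0.0|0.0
"""

known = parse_channel_listing(SAMPLE)
print(should_skip("XX.STA1.00.BHZ", known))
print(should_skip("XX.STA2.00.BHZ", known))
```

Using a set keeps each lookup at constant cost, so the check adds little overhead per file while collecting. Note that this ignores channel epochs entirely.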
I would be in favor of processing everything, and filtering the output. This is the same strategy as for data management.
This way, as soon as the metadata is available, wfcatalog can spit all the information out, and there is no need to start looking for all the data to index each time some metadata is submitted. Of course, it should be implemented without making a fdsnws-station call for every wfcatalog request.
@jschaeff, I get your points. However, this approach implies:
I often hit the wall of not knowing when the metadata changes. We miss a datestamp in the StationXML format, which would be useful in a lot of use cases. Or there could be an RSS feed service provided by all EIDA nodes that publishes metadata changes. Or a websocket system. But this is a bit off topic, although it would help wfcatalog keep track of metadata changes.
Could wfcatalog manage a cache of the StationXML metadata for each network it knows about (or just the part it needs)? The cache can be refreshed at arbitrary frequency or manually.
Basically, it's about storing a dictionary NSLC:boolean and the°+°
Unfortunately, that's not enough. The restrictedStatus references a ChannelEpoch such that the startDate and endDate need to be part of the dict key. However, this epoch information might change, too. Besides, it is not strictly defined how the restrictedStatus attribute is inherited by child nodes.
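To illustrate why the epoch has to enter the lookup, here is a rough sketch that stores a list of epochs per NSLC instead of a single boolean. The `EPOCHS` layout, the `restricted_status` helper and the sample dates are assumptions for illustration only. It also assumes the status is already resolved at channel level, which sidesteps the inheritance question:

```python
from datetime import datetime

# (net, sta, loc, cha) -> list of (startDate, endDate, restrictedStatus).
# An endDate of None stands for an epoch that is still open-ended.
EPOCHS = {
    ("XX", "STA1", "", "BHZ"): [
        (datetime(2010, 1, 1), datetime(2015, 1, 1), "open"),
        (datetime(2015, 1, 1), None, "closed"),
    ],
}


def restricted_status(epochs, nslc, when):
    """Return the status of the epoch covering `when`, or None if no epoch matches."""
    for start, end, status in epochs.get(nslc, []):
        if start <= when and (end is None or when < end):
            return status
    return None


key = ("XX", "STA1", "", "BHZ")
print(restricted_status(EPOCHS, key, datetime(2012, 6, 1)))
print(restricted_status(EPOCHS, key, datetime(2016, 6, 1)))
```

The same channel yields different answers depending on the time of the data, so a plain NSLC:boolean mapping cannot represent it. Any such table still goes stale whenever the epochs themselves change.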
We miss a datestamp in the StationXML format, which would be useful in a lot of use cases.
Versioning most probably requires more than just a simple time stamp.
Could wfcatalog manage a cache of the StationXML metadata for each network it knows about (or just the part it needs)? The cache can be refreshed at arbitrary frequency or manually.
OT: Interestingly, not caching StationXML metadata was a requirement when designing eidaws-federator. So, why should it be possible when implementing fdsnws-availability based on the eidaws-wfcatalog backend?
Sorry, GitHub sent my comment when I hit some keyboard shortcut... Yes, the restriction is valid for a time period. Good point.
Currently, WFCatalog does not depend on station metadata: it calculates metrics for acquired data even if some channels are not defined in StationXML. In those cases users can retrieve the metrics, but are not able to download the data itself via the FDSNWS-Dataselect web service, which strongly depends on metadata.
Possible solutions:
metadata query parameter with default value true in the WFCatalog implementation, which would still allow retrieval of all available metrics
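A rough sketch of how such a parameter might behave, assuming that metadata=true restricts the output to channels known from StationXML and metadata=false returns every metric document. The `select_metrics` function and the document field names `net`, `sta`, `loc` and `cha` are hypothetical and not taken from the WFCatalog service:

```python
def select_metrics(documents, known, metadata=True):
    """Filter metric documents by whether their channel is present in `known`."""
    if not metadata:
        # metadata=false: return everything, including channels without metadata.
        return list(documents)
    return [
        doc for doc in documents
        if (doc["net"], doc["sta"], doc["loc"], doc["cha"]) in known
    ]


# Hypothetical metric documents and metadata-backed channel set.
documents = [
    {"net": "XX", "sta": "STA1", "loc": "", "cha": "BHZ"},
    {"net": "XX", "sta": "STA9", "loc": "", "cha": "BHZ"},
]
known = {("XX", "STA1", "", "BHZ")}

print(len(select_metrics(documents, known)))
print(len(select_metrics(documents, known, metadata=False)))
```

With the default, only the first document would be served; passing metadata=False keeps both.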