EIDA / userfeedback

This repository is meant to collect feedback from EIDA users by means of its Issue Tracker

[WFCatalog] data incomplete for Z3 stations at ETH node? #9

Closed flofux closed 3 years ago

flofux commented 5 years ago

Hi everyone, during my tests I saw that the WFCatalog entries for Z3 stations at the ETH node seem to be slightly incomplete. I tested all days in 2016, 2017 and 2018 (LHZ channel only), and I can usually download more days of data than the WFCatalog lists as available. Usually a few days (fewer than ten per year) are missing from the WFCatalog. I did not check in detail which days and stations are affected, but the stations that showed this behaviour include:

A050A, A051A, A052A, A061A, A062A, A252A, A255A, A271A, A272A, A273A, A280A, A281A, A282A, A285A, A287A, A288A.

It's not a big deal though, as the data is accessible anyway. Maybe the WFCatalog needs an update?
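
The comparison described above amounts to a set difference between the days that could be downloaded and the days listed in the WFCatalog. The following is only a minimal sketch of that idea: the function name and the sample dates are hypothetical, and how the two lists are obtained from the services is left out.

```python
from datetime import date


def days_missing_from_catalog(downloadable_days, catalog_days):
    """Return, in sorted order, the days that could be downloaded
    but are absent from the catalog."""
    return sorted(set(downloadable_days) - set(catalog_days))


# Hypothetical sample: three days downloadable, only two in the catalog.
downloadable = [date(2017, 5, 1), date(2017, 5, 2), date(2017, 5, 3)]
catalogued = [date(2017, 5, 1), date(2017, 5, 3)]

missing = days_missing_from_catalog(downloadable, catalogued)
print(missing)
```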

sheimers commented 5 years ago

I think the problem is that we do not automatically reprocess the WFCatalog for data that is older than one week. We do it daily for the past day and weekly for the past week. My guess: if a station is offline for more than a week and we then manage to reconnect and fill the gap, the WFCatalog does not notice.

I'll try to reprocess all past data, but I am afraid this will take very long, since it will have to read our whole waveform archive, which is huge. The documentation says the --update option will compare the checksums of all files, which involves reading them completely. Or is it possible to tell it to look at the file modification time rather than the MD5 sum? That would make it much faster.

javiquinte commented 5 years ago

Or is it possible to tell it to look at the file modification time rather than the MD5 sum? That would make it much faster.

Probably @Jollyfant could give details on that. But in any case, would it be possible for you, @sheimers, to modify the weekly update script so that it goes through the files modified in the last week? Because once you reprocess the missing data, you will be in the same situation again as soon as a station is inactive for one week. Thanks in advance!

Jollyfant commented 5 years ago

Date modified is unfortunately not stored in the WFCatalog and the hash is more reliable for detecting changes. It's actually not that slow to do an update for a full year in my experience, and I've done it before many times.
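
To illustrate why a hash-based update has to read every file in full, here is a minimal sketch of a checksum comparison. It is not the WFCatalog collector's actual code: the function names, the chunk size and the sample file are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, stored_md5):
    """True if the file's current digest differs from the stored one."""
    return file_md5(path) != stored_md5


# Hypothetical sample file standing in for a waveform file.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.mseed")
    with open(sample, "wb") as handle:
        handle.write(b"first version")
    stored = file_md5(sample)
    unchanged = has_changed(sample, stored)

    with open(sample, "wb") as handle:
        handle.write(b"second version")
    changed = has_changed(sample, stored)
```

Every byte of every file passes through `digest.update`, which is why the cost grows with the size of the archive rather than with the number of changed files.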

sheimers commented 5 years ago

@javiquinte I already add new files and update modified ones for the past week, once a week, like this:

WFCatalogCollector.py --past week --update --csegs --flags

I'll update this now to process the last two weeks (--past fortnight), which will make it less likely to skip some data, but can't completely avoid it. Otherwise, I am afraid I will have to reprocess the whole archive from time to time to catch waveforms that were added late.

@Jollyfant Processing the whole archive is really slow here. We currently have 66 TB of miniSEED, and it's remote on an NFS server. The server is not under our control and is shared with other institutes, so we can't run wfcatalog on the local disk.

javiquinte commented 5 years ago

I actually meant running a "find" command to get all the files modified since a particular date. The resulting list could then somehow be used as input for the WFCatalogCollector. Just an idea...
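
The same selection can be sketched in Python, similar in spirit to the "find" suggestion above. This is only an illustration: the function name, the file names and the seven-day cutoff are assumptions, and how the resulting list would be passed to the WFCatalogCollector is not covered here.

```python
import os
import tempfile
import time


def files_modified_since(root, since_epoch):
    """Yield paths under root whose modification time is at or after
    since_epoch (seconds since the Unix epoch)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) >= since_epoch:
                yield path


# Hypothetical archive with one old file and one freshly written file.
with tempfile.TemporaryDirectory() as archive:
    old_file = os.path.join(archive, "old.mseed")
    new_file = os.path.join(archive, "new.mseed")
    for path in (old_file, new_file):
        open(path, "wb").close()

    now = time.time()
    thirty_days_ago = now - 30 * 86400
    os.utime(old_file, (thirty_days_ago, thirty_days_ago))

    # Keep only files modified within the last seven days.
    recent = sorted(files_modified_since(archive, now - 7 * 86400))
    recent_names = [os.path.basename(p) for p in recent]
    print(recent_names)
```

Because only file metadata is read, a late backfill shows up through its modification time even when the data it contains is months old.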

sheimers commented 5 years ago

Thanks, that's a good idea.

Jollyfant commented 5 years ago

For reprocessing, I usually give it a --dir for what I know has new data. That usually reduces the number of files processed from EVERYTHING to a much smaller set.
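
One way to find the directories that have new data is to reduce a list of changed files to their parent directories. The sketch below assumes such a list already exists; the function name and the archive paths are hypothetical, and the exact way each directory is handed to --dir is not shown because the thread does not describe it.

```python
import os


def directories_to_reprocess(paths):
    """Return the sorted, de-duplicated parent directories of the given files."""
    return sorted({os.path.dirname(path) for path in paths})


# Hypothetical changed files; two of them share a directory.
changed_files = [
    "/archive/2018/Z3/A050A/LHZ.D/Z3.A050A..LHZ.D.2018.227",
    "/archive/2018/Z3/A050A/LHZ.D/Z3.A050A..LHZ.D.2018.228",
    "/archive/2018/Z3/A051A/LHZ.D/Z3.A051A..LHZ.D.2018.227",
]

dirs = directories_to_reprocess(changed_files)
for directory in dirs:
    print(directory)
```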

okling commented 4 years ago

But data is not accessible for me...

Here's the example code, run with Python 3.7 and ObsPy 1.1.1:

from obspy import UTCDateTime
from obspy.clients.fdsn import RoutingClient

# Three-minute window on 2018-08-15
starttime = UTCDateTime(2018, 8, 15, 21, 56, 0)
endtime = UTCDateTime(2018, 8, 15, 21, 59, 0)

# Routing client that authenticates with an EIDA token file
routed_token = RoutingClient('eida-routing', debug=True, credentials={'EIDA_TOKEN': '/home/aling/eidatoken'})

# Request HHZ waveforms for station Z3.A050A
st = routed_token.get_waveforms(network='Z3', station='A050A', location='*', channel='HHZ', starttime=starttime, endtime=endtime)

This is what I've got:

Downloading http://www.orfeus-eu.org/eidaws/routing/1/query ...
Sending along the following payload:
----------------------------------------------------------------------
service=station
format=post
alternative=false
Z3 A050A * HHZ 2018-08-15T21:56:00.000000 2018-08-15T21:59:00.000000
----------------------------------------------------------------------
Installed new opener with handlers: [<obspy.clients.fdsn.client.CustomRedirectHandler object at 0x7f76f7a08a90>]
Base URL: http://eida.ethz.ch
Request Headers: {'User-Agent': 'ObsPy/1.1.1 (Linux-4.4.0-137-generic-x86_64-with-debian-stretch-sid, Python 3.7.3)'}
Downloading http://eida.ethz.ch/fdsnws/dataselect/1/application.wadl with requesting gzip compression
Downloading http://eida.ethz.ch/fdsnws/event/1/application.wadl with requesting gzip compression
Downloading http://eida.ethz.ch/fdsnws/station/1/application.wadl with requesting gzip compression
Downloading http://eida.ethz.ch/fdsnws/event/1/catalogs with requesting gzip compression
Downloading http://eida.ethz.ch/fdsnws/event/1/contributors with requesting gzip compression
Uncompressing gzipped response for http://eida.ethz.ch/fdsnws/dataselect/1/application.wadl
Downloaded http://eida.ethz.ch/fdsnws/dataselect/1/application.wadl with HTTP code: 200
HTTP error 404, reason Not Found, while downloading 'http://eida.ethz.ch/fdsnws/event/1/contributors': b'Error 404: Not Found\n\nThe requested resource does not exist on this server.\n\nUsage details are available from /fdsnws/event/1/\n\nRequest:\n/fdsnws/event/1/contributors\n\nRequest Submitted:\n2019-12-16T13:35:38.834414\n\nService Version:\n1.2.0\n'
HTTP error 404, reason Not Found, while downloading 'http://eida.ethz.ch/fdsnws/event/1/application.wadl': b'Error 404: Not Found\n\nThe requested resource does not exist on this server.\n\nUsage details are available from /fdsnws/event/1/\n\nRequest:\n/fdsnws/event/1/application.wadl\n\nRequest Submitted:\n2019-12-16T13:35:38.836785\n\nService Version:\n1.2.0\n'
HTTP error 404, reason Not Found, while downloading 'http://eida.ethz.ch/fdsnws/event/1/catalogs': b'Error 404: Not Found\n\nThe requested resource does not exist on this server.\n\nUsage details are available from /fdsnws/event/1/\n\nRequest:\n/fdsnws/event/1/catalogs\n\nRequest Submitted:\n2019-12-16T13:35:38.843276\n\nService Version:\n1.2.0\n'
Uncompressing gzipped response for http://eida.ethz.ch/fdsnws/station/1/application.wadl
Downloaded http://eida.ethz.ch/fdsnws/station/1/application.wadl with HTTP code: 200
Discovered dataselect service
Discovered station service
Storing discovered services in cache.
Downloading https://eida.ethz.ch/fdsnws/dataselect/1/auth with requesting gzip compression
Sending along the following payload:
----------------------------------------------------------------------
BEGIN PGP MESSAGE
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

{"valid_until": "2019-12-27T21:37:35.338200Z", "cn": "Angel Ling", "memberof": "/epos/alparray;/epos;/", "sn": "Ling", "issued": "2019-11-27T21:37:35.338207Z", "mail": "angel.ling@erdw.ethz.ch", "givenName": "Angel", "expiration": "1m"}
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJd3uyfAAoJEEFpzp0AlwdXxtIIAIJTnhljK6IpFeYj9hO+w1NW
pBob3OZprcfFDBYypS3rrSo7VmqhaQDYgkEy8xjJPVhRo+HHIIs/6zgY5UXlb94d
CbbH3DSrD0iZXl+kolte1ZRwv9e5Gp398YdQFvuX34v86kxJIeD7P+XLQp07jhEJ
NZJisf1CyjlWMwdRnmdlQgRwI5mD854qJjVajJi6cpqvGCGdjASuaQWsx0w9vp3m
jGvK0y8mDuPmVZPSHUoQpo7Q4hnPfSZ8e0pAAHBFpvv5rdIywAa5aKL6X/sKr2y2
TIdp3OGghj2GlwbDpfFr/CWBVUfiF1BfQQ587Xf82PO3kmyrQ8pwmufuIEZRfKw=
=F+eB
-----END PGP SIGNATURE-----

----------------------------------------------------------------------
Error while downloading: https://eida.ethz.ch/fdsnws/dataselect/1/auth
/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/routing/routing_client.py:99: UserWarning: Failed to download data of type 'station' from 'http://eida.ethz.ch' due to: 
Traceback (most recent call last):
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/routing/routing_client.py", line 94, in _try_download_bulk
    return _download_bulk(r)
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/routing/routing_client.py", line 122, in _download_bulk
    c.set_eida_token(r["credentials"]["EIDA_TOKEN"])
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/client.py", line 312, in set_eida_token
    user, password = self._resolve_eida_token(token)
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/client.py", line 374, in _resolve_eida_token
    use_gzip=True, return_string=True)
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/client.py", line 1383, in _download
    raise_on_error(code, data)
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/client.py", line 1736, in raise_on_error
    (str(data.__class__.__name__), str(data))))
obspy.clients.fdsn.header.FDSNException: Unknown Error (URLError): <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1056)>

  r["data_type"], r["endpoint"], reason))
Downloading http://www.orfeus-eu.org/eidaws/routing/1/query ...
Sending along the following payload:
----------------------------------------------------------------------
service=dataselect
format=post
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "</home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/decorator.py:decorator-gen-57>", line 2, in get_waveforms
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/routing/routing_client.py", line 82, in _assert_filename_not_in_kwargs
    return f(*args, **kwargs)
  File "</home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/decorator.py:decorator-gen-56>", line 2, in get_waveforms
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/routing/routing_client.py", line 89, in _assert_attach_response_not_in_kwargs
    return f(*args, **kwargs)
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/routing/routing_client.py", line 334, in get_waveforms
    return self.get_waveforms_bulk([bulk], **kwargs)
  File "</home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/decorator.py:decorator-gen-60>", line 2, in get_waveforms_bulk
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/routing/routing_client.py", line 82, in _assert_filename_not_in_kwargs
    return f(*args, **kwargs)
  File "</home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/decorator.py:decorator-gen-59>", line 2, in get_waveforms_bulk
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/routing/routing_client.py", line 89, in _assert_attach_response_not_in_kwargs
    return f(*args, **kwargs)
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/routing/eidaws_routing_client.py", line 117, in get_waveforms_bulk
    r = self._download(self._url + "/query", data=bulk_str)
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/base.py", line 245, in _download
    self._handle_requests_http_error(r)
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/routing/routing_client.py", line 303, in _handle_requests_http_error
    raise_on_error(r.status_code, buf)
  File "/home/aling/anaconda3/envs/obspy/lib/python3.7/site-packages/obspy/clients/fdsn/client.py", line 1707, in raise_on_error
    server_info)
obspy.clients.fdsn.header.FDSNNoDataException: No data available for request.
Detailed response of server:

No Content -- 

jschaeff commented 3 years ago

Closing this old idle issue. Reopen if needed.