EIDA / userfeedback

This repository is meant to collect feedback from EIDA users by means of its Issue Tracker

[RoutingClient] bug/malfunction? creating mass requests when asking for non-existing data #56

Closed flofux closed 4 years ago

flofux commented 4 years ago

Hi everyone,

Petr Kolinsky and I found some very strange behaviour using the ObsPy RoutingClient. Basically, when asked for data that does not exist, the RoutingClient requests far too much data from all nodes; as a result, the script hangs and the computer's memory fills up.

We tested it on two different Linux machines and a Windows 8.1 machine, with ObsPy versions 1.1.1 and 1.2.1, and the result is always the same. Notably, this issue only appeared about two weeks ago; before that, everything was working fine.

Here's how to reproduce it (the network/station does not matter, as long as the data does not exist; I chose an example where I knew there was no data):

from obspy.clients.fdsn import RoutingClient
from obspy import UTCDateTime
fdsn = RoutingClient('eida-routing', debug=True)
t1 = UTCDateTime('2016-01-01')
t2 = UTCDateTime('2016-01-02')
fdsn.get_waveforms(network='OE', station='CONA', channel='HHZ', location='*', starttime=t1, endtime=t2)  # I know this data exists
fdsn.get_waveforms(network='OE', station='UNNA', channel='HHZ', location='*', starttime=t1, endtime=t2)  # I know this data does NOT exist

On both our machines the first request runs just fine. The second request results in an unstoppable memory increase and nothing being returned. From the debug=True output it appears that the RoutingClient is ALWAYS asking for ALL data, even though the request specifies otherwise. At first I thought it was an ObsPy bug, but it occurred from one day to the next, without any change to our ObsPy installations. It also persists across different ObsPy versions, so I now suspect it's something on the EIDA side!?

Please let me know if anyone is able to reproduce this ...

javiquinte commented 4 years ago

Hi @flofux, step-by-step queries to the Routing Service seem to return proper values:

https://www.orfeus-eu.org/eidaws/routing/1/query?net=OE&sta=CONA&channel=HHZ&format=post&start=2016-01-01&end=2016-01-02

http://www.orfeus-eu.org/fdsnws/dataselect/1/query
OE CONA * HHZ 2016-01-01T00:00:00 2016-01-02T00:00:00

https://www.orfeus-eu.org/eidaws/routing/1/query?net=OE&sta=UNNA&channel=HHZ&format=post&start=2016-01-01&end=2016-01-02

http://www.orfeus-eu.org/fdsnws/dataselect/1/query
OE UNNA * HHZ 2016-01-01T00:00:00 2016-01-02T00:00:00

I'll check the ObsPy client and post the results here.
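The step-by-step check above can be scripted. The following is a minimal sketch using only the Python standard library: it builds the same routing query URL and splits a `format=post` response into one entry per data centre. The helper names `routing_query_url` and `parse_post_format` are hypothetical, and the parser assumes the response layout shown above (a service URL on the first line of each block, request lines below it, blocks separated by blank lines).

```python
import re
from urllib.parse import urlencode

# Routing service endpoint as used in the queries above.
ROUTING_URL = "https://www.orfeus-eu.org/eidaws/routing/1/query"


def routing_query_url(net, sta, cha, start, end):
    """Build a GET URL that asks the routing service for POST-style output."""
    params = {
        "net": net,
        "sta": sta,
        "channel": cha,
        "format": "post",
        "start": start,
        "end": end,
    }
    return ROUTING_URL + "?" + urlencode(params)


def parse_post_format(text):
    """Split a format=post response into {service_url: [request lines]}.

    Assumption: blocks are separated by blank lines and the first line
    of each block is the URL of the service to send the lines to.
    """
    routes = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if lines:
            routes.setdefault(lines[0], []).extend(lines[1:])
    return routes
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the decoded body to `parse_post_format` should show whether the service routes only the requested stream, as it does in the two responses above.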

javiquinte commented 4 years ago

In Obspy, with station CONA:

>>> stw = fdsn.get_waveforms(network='OE', station='CONA', channel='HHZ', location='*', starttime=t1, endtime=t2)
>>> print(stw)
1 Trace(s) in Stream:
OE.CONA..HHZ | 2015-12-31T23:59:54.188400Z - 2016-01-02T00:00:06.488400Z | 100.0 Hz, 8641231 samples

and with station UNNA:

>>> stw = fdsn.get_waveforms(network='OE', station='UNNA', channel='HHZ', location='*', starttime=t1, endtime=t2)
...
Downloading http://www.orfeus-eu.org/fdsnws/station/1/query with requesting gzip compression
Sending along the following payload:
----------------------------------------------------------------------
format=text
level=channel
OE UNNA * HHZ 2016-01-01T00:00:00 2016-01-02T00:00:00
----------------------------------------------------------------------
Downloaded http://www.orfeus-eu.org/fdsnws/station/1/query with HTTP code: 204

Above you can see that it requests all streams matching OE.UNNA.*.HHZ and that the answer is that no channels match (HTTP 204, No Content). After that, ObsPy sends the following query (and here is the problem!):

Downloading http://www.orfeus-eu.org/eidaws/routing/1/query ...
Sending along the following payload:
----------------------------------------------------------------------
service=dataselect
format=post
----------------------------------------------------------------------

The query above asks for all available streams in EIDA. No idea why ObsPy is requesting this, but it seems to be the cause of the behaviour you experienced.
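To illustrate the failure mode: the routing payload above contains only `service=dataselect` and `format=post`, with no stream lines, so nothing restricts the request. The following is a minimal sketch of a guard that refuses to build such a payload when the station lookup matched nothing (as with the HTTP 204 above). The function `build_routing_payload` is hypothetical and is not ObsPy's actual code or fix.

```python
def build_routing_payload(service, channel_lines):
    """Return the POST body for a routing request, or None if nothing matched.

    Hypothetical helper: an empty list of channel lines (e.g. after the
    station service answered with HTTP 204) must not turn into a payload
    without any stream restriction.
    """
    # Drop empty or whitespace-only lines before deciding.
    lines = [ln.strip() for ln in channel_lines if ln.strip()]
    if not lines:
        # Nothing to route: the caller should report "no data" instead.
        return None
    header = [f"service={service}", "format=post"]
    return "\n".join(header + lines) + "\n"
```

With this guard, a request for OE.UNNA.*.HHZ would stop after the empty station response instead of going on to query every stream in EIDA.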

jschaeff commented 4 years ago

Well spotted, Javier! We also came across this problem with a user requesting data from INGV and ending up with a request for all data on all EIDA nodes. Our guess was that ObsPy somewhere does a dataselect instead of a station request.

I didn't have time to submit a bug report to ObsPy, but I guess we should do it now?

On my side, to reproduce the problem:

from obspy.clients.fdsn import RoutingClient
token_path = '.eidatoken'
client = RoutingClient("eida-routing", credentials={'EIDA_TOKEN': token_path}, debug=True)
inv = client.get_waveforms(network='IV', station='APEC', location='*', channel='BHZ', starttime='2019-05-16', endtime='2019-05-17')

PetrColinSky commented 4 years ago

@javiquinte, thanks for looking at this. I first experienced the issue on 8 April. I cannot say exactly when it started, but in mid-March it was working OK. I was always using the same script, the same list of stations, the same dates, and the same Python/ObsPy version. I was running the same procedure repeatedly because every now and then I got some more data from EIDA. And suddenly it started to behave like this. This is why we thought that maybe something had changed on the EIDA side, because we are not aware of any change on our side. Cheers, Petr

megies commented 4 years ago

@flofux @javiquinte I can confirm this is on our end and the fix is almost done; will push a PR in a minute.

javiquinte commented 4 years ago

Thanks for the fix, @megies!