GeoNet / help

An issues repo for technical help questions.

FDSN on the go-slow #55

Closed: calum-chamberlain closed this issue 5 years ago

calum-chamberlain commented 5 years ago

Since yesterday I have noticed that downloading GeoNet waveform data from the FDSN service is very slow, on multiple computers. This is slowing down all sorts of things, including CI tests on Travis, CircleCI and AppVeyor, as well as downloads on local machines. As a test this morning I compared GeoNet FDSN times to IRIS, running in IPython with the %timeit magic:

from obspy import UTCDateTime
from obspy.clients.fdsn import Client

iris_client = Client("IRIS")
geonet_client = Client("GEONET")

t1 = UTCDateTime(2019, 1, 1)
t2 = t1 + 120  # two minutes of data.
%timeit iris_client.get_waveforms(network="NZ", station="RPZ", channel="HH?", location="10", starttime=t1, endtime=t2)
# output: 1.43 s ± 101 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit geonet_client.get_waveforms(network="NZ", station="RPZ", channel="HH?", location="10", starttime=t1, endtime=t2)
# output: 1min 2s ± 20.6 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
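%timeit only exists inside IPython/Jupyter. To repeat the comparison in a plain Python script (for example inside a CI job), a minimal standard-library sketch might look like this. `time_call` is a hypothetical helper name, and it only approximates %timeit with one loop per run: it reports the mean and sample standard deviation of wall-clock time.

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_call(fn: Callable[[], object], runs: int = 7) -> Tuple[float, float]:
    """Call `fn` `runs` times; return (mean, stdev) of wall-clock seconds.

    Hypothetical helper: a rough stand-in for IPython's %timeit with one loop per run.
    """
    if runs < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    durations: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # the call being timed, e.g. a get_waveforms request
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)
```

With the clients defined above, `time_call(lambda: geonet_client.get_waveforms(network="NZ", station="RPZ", channel="HH?", location="10", starttime=t1, endtime=t2))` would give numbers comparable to the %timeit output.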

Timings come from one of the VUW servers with a good network connection, but I get very similar times from home in Lower Hutt.

Any idea why the FDSN service is running so slowly (more than 40x slower than IRIS for the same data), and can it please be sped up?

The downloaded data are almost the same (although IRIS actually cuts the data to the requested start and end times), with the GeoNet FDSN service providing about 1,000 additional samples across the three channels compared to IRIS.
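The extra GeoNet samples are most likely whole data records that overlap the requested window, whereas IRIS trims to the window itself. In ObsPy the same cut can be made on the client with `Stream.trim(starttime=t1, endtime=t2)`. The index arithmetic behind such a cut, for evenly sampled data, is sketched below. `trim_indices` is a hypothetical helper, times are plain float seconds rather than `UTCDateTime` objects, and the 100 Hz rate in the example is an assumption.

```python
import math
from typing import Tuple


def trim_indices(trace_start: float, sampling_rate: float, npts: int,
                 req_start: float, req_end: float) -> Tuple[int, int]:
    """Return (first, last_exclusive) sample indices inside [req_start, req_end].

    Samples are assumed evenly spaced at trace_start + i / sampling_rate.
    Both window ends are inclusive, so a 120 s window at an assumed 100 Hz
    keeps 12001 samples when its ends fall exactly on samples.
    """
    if sampling_rate <= 0 or npts < 0:
        raise ValueError("sampling_rate must be positive and npts non-negative")
    # First sample at or after req_start (small tolerance for float error).
    first = math.ceil((req_start - trace_start) * sampling_rate - 1e-9)
    # Last sample at or before req_end, turned into an exclusive bound.
    last = math.floor((req_end - trace_start) * sampling_rate + 1e-9) + 1
    # Clamp to the samples that actually exist in the trace.
    first = max(first, 0)
    last = min(last, npts)
    return first, max(first, last)
```

For example, a trace that starts 3.5 s before a 120 s request at 100 Hz keeps samples 350 to 12350, and everything outside that range is the surplus a non-trimming service would return.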

salichon commented 5 years ago

Hi Calum, well, it does seem variable, but I don't know why. Today I quickly ran this:

time curl "https://service.geonet.org.nz/fdsnws/dataselect/1/query?network=NZ&station=RPZ&location=10&channel=HHZ&starttime=2019-01-01T00:00:00.000&endtime=2019-01-01T00:02:00.000" -o test -->> 15.5 seconds

time curl "https://service.iris.edu/fdsnws/dataselect/1/query?network=NZ&station=RPZ&location=10&channel=HHZ&starttime=2019-01-01T00:00:00.000&endtime=2019-01-01T02:00:00.000" -o test1 -->> 7 seconds
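Building both query URLs from one shared parameter set is a simple way to be sure the two services are asked for exactly the same channels and time window. A minimal standard-library sketch follows; `dataselect_url`, `SERVICES` and `REQUEST` are hypothetical names, and the base URLs are the ones used in the commands above.

```python
from urllib.parse import urlencode

# Base URLs taken from the curl commands in this thread.
SERVICES = {
    "GEONET": "https://service.geonet.org.nz/fdsnws/dataselect/1/query",
    "IRIS": "https://service.iris.edu/fdsnws/dataselect/1/query",
}


def dataselect_url(service: str, **params: str) -> str:
    """Return an FDSN dataselect query URL for `service` with `params` encoded."""
    return SERVICES[service] + "?" + urlencode(params)


# One shared request: NZ.RPZ.10.HHZ, two minutes starting 2019-01-01.
REQUEST = dict(
    network="NZ", station="RPZ", location="10", channel="HHZ",
    starttime="2019-01-01T00:00:00.000", endtime="2019-01-01T00:02:00.000",
)

# Identical query strings for both services; only the base URL differs.
urls = {name: dataselect_url(name, **REQUEST) for name in SERVICES}
```

`urlencode` percent-encodes `:` as `%3A` (and `?`, if `channel="HH?"` is used, as `%3F`), so the printed URLs look different from the hand-written ones while carrying the same parameters.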

calum-chamberlain commented 5 years ago

Hi Jerry,

Strange, when I run the curl it takes 36 s. I ran the same tests as above but without Python, to rule out any ObsPy-specific effects, and got essentially the same result:

time curl "https://service.geonet.org.nz/fdsnws/dataselect/1/query?network=NZ&station=RPZ&location=10&channel=HH?&starttime=2019-01-01T00:00:00.000&endtime=2019-01-01T00:02:00.000" -o test
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 71168    0 71168    0     0    743      0 --:--:--  0:01:35 --:--:--   220

real    1m35.770s
user    0m0.016s
sys 0m0.040s

And for IRIS:

time curl "https://service.iris.edu/fdsnws/dataselect/1/query?network=NZ&station=RPZ&location=10&channel=HH?&starttime=2019-01-01T00:00:00.000&endtime=2019-01-01T00:02:00.000" -o test
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 71168    0 71168    0     0  29335      0 --:--:--  0:00:02 --:--:-- 29335

real    0m2.447s
user    0m0.034s
sys 0m0.017s
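The same measurement can be scripted from Python, which is handy for logging timings repeatedly. The sketch below streams a URL to a file with `urllib.request` and returns the byte count, elapsed seconds and average speed, roughly the figures `time curl -o` reports. `timed_download` is a hypothetical helper with no retry or error handling.

```python
import shutil
import time
import urllib.request
from typing import Tuple


def timed_download(url: str, path: str, timeout: float = 300.0) -> Tuple[int, float, float]:
    """Download `url` to `path`; return (bytes, elapsed seconds, bytes per second)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response, open(path, "wb") as out:
        # Stream the response body to disk in chunks instead of reading it all at once.
        shutil.copyfileobj(response, out)
        nbytes = out.tell()
    elapsed = time.perf_counter() - start
    speed = nbytes / elapsed if elapsed > 0 else float("inf")
    return nbytes, elapsed, speed
```

Called as `timed_download(url, "test")` with either URL above, a slow service shows up as a low bytes-per-second value, like the 743 B/s versus 29335 B/s averages curl reported here.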

calum-chamberlain commented 5 years ago

I should note that the reason I noticed this is that some of my code that analyses two days of data after the Darfield mainshock has now been running for two days, whereas last week it ran in less than 4 hours. The code is slightly different, but the real time sink is downloading the data...

salichon commented 5 years ago

@calum-chamberlain hey! I also tried requesting all 3 RPZ components, and then it takes much longer for GeoNet ... ~100 seconds instead of ~15 seconds at IRIS ... So yeah, it's a bit too slow ... We'll try to figure out what's up :) (or find someone to explain it to me :) )

calum-chamberlain commented 5 years ago

Thanks!

salichon commented 5 years ago

FYI @nbalfour

salichon commented 5 years ago

Hi @calum-chamberlain, apparently there was a peak in requests last week. Has there been increased usage of the data service on your side? Cheers, thanks, Jerome

calum-chamberlain commented 5 years ago

Hey Jerome, I didn't actively increase my usage last week, but I will ask around here.

calum-chamberlain commented 5 years ago

I think we have found the cause, and we will work out better ways of doing things for that person. Thanks so much for digging into this @salichon and @nbalfour.