EIDA / mediatorws

EIDA NG Mediator/Federator web services
GNU General Public License v3.0

fed: Invalid mSEED returned #1

Closed Jollyfant closed 7 years ago

Jollyfant commented 7 years ago

Hi guys, I was trying out the federator by requesting all LHZ channels from networks FR and IV. Requesting each network individually works fine, but when I ask for both at the same time the service response is unpredictable (I'm making the same request multiple times):

Making request 3
nBytes: 1260032

419 Trace(s) in Stream:

FR.AJAC.00.LHZ | 2017-01-01T00:00:00.590339Z - 2017-01-01T00:59:59.590339Z | 1.0 Hz, 3600 samples
...
(417 other traces)
...
IV.ZCCA..LHZ | 2017-01-01T00:01:10.120000Z - 2017-01-01T01:02:59.120000Z | 1.0 Hz, 3710 samples

[Use "print(Stream.__str__(extended=True))" to print all Traces]

Request OK

====

Making request 4
nBytes: 1260032

/Users/Mathijs/Documents/GitHub/obspy/obspy/io/mseed/core.py:413: InternalMSEEDReadingWarning: readMSEEDBuffer(): Record starting at offset 294912 is not valid SEED. The rest of the file will not be read.
  warnings.warn(*_i)
28 Trace(s) in Stream:

FR.AJAC.00.LHZ | 2017-01-01T00:00:00.590339Z - 2017-01-01T00:59:59.590339Z | 1.0 Hz, 3600 samples
...
(26 other traces)
...
FR.PAND.00.LHZ | 2017-01-01T00:00:00.945659Z - 2017-01-01T00:00:20.945659Z | 1.0 Hz, 21 samples

[Use "print(Stream.__str__(extended=True))" to print all Traces]

Request OK

===

Making request 5
nBytes: 1260032
Traceback (most recent call last):
  File "request.py", line 28, in <module>
    print read(io.BytesIO(r.content))
  File "<decorator-gen-31>", line 2, in read
  File "/Users/Mathijs/Documents/GitHub/obspy/obspy/core/util/decorator.py", line 294, in _map_example_filename
    return func(*args, **kwargs)
  File "/Users/Mathijs/Documents/GitHub/obspy/obspy/core/stream.py", line 210, in read
    stream = _read(pathname_or_url, format, headonly, **kwargs)
  File "<decorator-gen-32>", line 2, in _read
  File "/Users/Mathijs/Documents/GitHub/obspy/obspy/core/util/decorator.py", line 144, in uncompress_file
    return func(filename, *args, **kwargs)
  File "/Users/Mathijs/Documents/GitHub/obspy/obspy/core/stream.py", line 273, in _read
    headonly=headonly, **kwargs)
  File "/Users/Mathijs/Documents/GitHub/obspy/obspy/core/util/base.py", line 466, in _read_from_plugin
    list_obj = read_format(filename, **kwargs)
  File "/Users/Mathijs/Documents/GitHub/obspy/obspy/io/mseed/core.py", line 412, in _read_mseed
    raise _i
obspy.io.mseed.InternalMSEEDReadingError: FR_CHMF_00_LHZ_M: Impossible Steim2 dnib=00 for nibble=10

I think somewhere in the concatenation of the mSEED from different sources there is a problem.

Best, Mathijs

Jollyfant commented 7 years ago

Here is my script, fyi:

import io

import requests
from obspy import read

# Federator dataselect endpoint on the development host
MEDIATOR_URL = "http://mediator-devel.ethz.ch/fdsnws/dataselect/1/query?"

# One hour of LHZ data from networks IV and FR
QUERY = "&".join([
    "channel=LHZ",
    "net=IV,FR",
    "start=2017-01-01T00:00:00",
    "end=2017-01-01T01:00:00",
])

# Repeat the identical request to expose the unpredictable responses
for i in range(5):

    print("Making request %d" % i)

    r = requests.get(MEDIATOR_URL + QUERY)

    print("nBytes: %d" % len(r.content))

    # Parse the payload as miniSEED; this is where the failures show up
    print(read(io.BytesIO(r.content)))

    if r.status_code == 200:
        print("Request OK")
    else:
        print("Error request code: %d" % r.status_code)

feuchner commented 7 years ago

Hi Mathijs, thanks for letting us know!

feuchner commented 7 years ago

Fixed in ac8d30821b328539f87e9e55d54bf2cbdc236c57. This was a tricky one! It turned out that when reading from the mseed streams, the buffer size cannot be constant, but has to be the real record size. Otherwise incomplete records can be written to the output mseed file.
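
For illustration only, here is a minimal sketch of what "reading by the real record size" can look like. It is not the code from the commit above. It assumes miniSEED 2 records with big-endian headers and a Blockette 1000 located within the first 64 bytes of each record; the names `record_length`, `iter_records` and `probe` are hypothetical.

```python
import io
import struct


def _read_exact(stream, n):
    """Read exactly n bytes, returning fewer only at end of stream."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def record_length(head):
    """Return the record length announced by Blockette 1000.

    `head` holds the first bytes of a record. Assumption: big-endian header.
    """
    # Bytes 46-47 of the fixed header give the offset of the first blockette.
    offset = struct.unpack_from(">H", head, 46)[0]
    while offset:
        if offset + 8 > len(head):
            raise ValueError("Blockette 1000 not found in probed bytes")
        btype, next_offset = struct.unpack_from(">HH", head, offset)
        if btype == 1000:
            # Byte 6 of Blockette 1000 is the record length exponent.
            return 1 << head[offset + 6]
        if next_offset <= offset:
            break  # guard against a malformed blockette chain
        offset = next_offset
    raise ValueError("record has no Blockette 1000")


def iter_records(stream, probe=64):
    """Yield complete records, each read with its own length."""
    while True:
        head = _read_exact(stream, probe)
        if not head:
            return
        if len(head) < probe:
            raise ValueError("truncated record header")
        length = record_length(head)
        if length < probe:
            raise ValueError("implausible record length")
        body = _read_exact(stream, length - probe)
        if len(body) != length - probe:
            raise ValueError("truncated record")
        yield head + body
```

Because each record is read in full before it is yielded, a writer that consumes `iter_records(...)` never emits a partial record, even when 512-byte and 4096-byte records are mixed in one stream.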

Jollyfant commented 7 years ago

Oh yeah. The flexibility of mSEED can sometimes be useful and tricky at the same time. Nice work!

Jollyfant commented 7 years ago

I'm not too deep in the code but can you please explain why it is necessary to read the records anyway? If I am not mistaken, the federator could merge mSEED from different sources by just concatenating the individual service responses one after the other, as long as they are complete when written.
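
A rough sketch of that idea, assuming each service response arrives as an iterable of byte chunks: every response is spooled in full first (in memory up to a limit, then on disk), and only complete responses are appended to the output. The names `spool_source`, `merge_sources` and `max_in_memory` are hypothetical and not taken from the federator code.

```python
import io
import shutil
import tempfile


def spool_source(chunks, max_in_memory=1024 * 1024):
    """Collect one service response completely before it is merged."""
    # Stays in memory up to max_in_memory bytes, then rolls over to disk.
    spool = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    for chunk in chunks:
        spool.write(chunk)
    return spool


def merge_sources(sources, out):
    """Append each complete response to `out`, one after the other.

    Returns the total number of bytes written.
    """
    spools = [spool_source(chunks) for chunks in sources]
    total = 0
    for spool in spools:
        with spool:
            total += spool.tell()  # position after writing == response size
            spool.seek(0)
            shutil.copyfileobj(spool, out)
    return total
```

Since miniSEED is a sequence of self-contained records, the merged output stays valid as long as every response is itself complete; no record needs to be parsed along the way.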

feuchner commented 7 years ago

Good question. Balancing between memory footprint and thread waiting time? Just relying on Andres' design here...

ltrani commented 7 years ago

Hi guys, sorry to jump in here, but this raises some concerns for me. I would definitely avoid unnecessary I/O operations, as they could create serious bottlenecks. I fully understand and support reusing software whenever possible, provided it does not put strict constraints on our infrastructure. In this case I see critical behavior that could affect overall performance. In my view, this point requires further investigation and possibly improvements. Please don't forget that our final aim is to have a solid and stable system, and the mediator is a crucial component.

kaestli commented 7 years ago

Sure, Luca, I can confirm that we share the aim of a solid and stable system, and that we are committed to tackling issues where they manifest.