SeismicData / pyasdf

Python Interface to ASDF based on ObsPy
http://seismicdata.github.io/pyasdf/
BSD 3-Clause "New" or "Revised" License

Issues related to traceback print #30

Closed wjlei1990 closed 8 years ago

wjlei1990 commented 8 years ago

Hi Lion,

I found an issue related to how the traceback is printed.

First, our ASDF data contains a file like the one I mentioned here: https://github.com/obspy/obspy/issues/1371

pyasdf first reports an error like this:

Error during the processing of station 'XA.SA72' and tag 'raw_observed' on rank 4:
Traceback (At max 3 levels - most recent call last):
  File "/autofs/nccs-svm1_home1/lei/software/pyasdf/pyasdf/asdf_data_set.py", line 1708, in process
    traceback_limit=traceback_limit)
  File "/autofs/nccs-svm1_home1/lei/software/pyasdf/pyasdf/asdf_data_set.py", line 1725, in _dispatch_processing_mpi
    traceback_limit=traceback_limit)
  File "/autofs/nccs-svm1_home1/lei/software/pyasdf/pyasdf/asdf_data_set.py", line 1907, in _dispatch_processing_mpi_worker_node
    stream = process_function(stream, inv)
  File "/autofs/nccs-svm1_home1/lei/software/pypaw/src/pypaw/process.py", line 28, in process_wrapper
    return process(stream, inventory=inv, **param)
  File "/autofs/nccs-svm1_home1/lei/software/pytomo3d/pytomo3d/signal/process.py", line 228, in process
    st.detrend("linear")
  File "/ccs/home/lei/anaconda2/lib/python2.7/site-packages/obspy-0.10.2-py2.7-linux-x86_64.egg/obspy/core/util/decorator.py", line 241, in new_func
    return func(*args, **kwargs)
  File "/ccs/home/lei/anaconda2/lib/python2.7/site-packages/obspy-0.10.2-py2.7-linux-x86_64.egg/obspy/core/stream.py", line 2304, in detrend
    tr.detrend(type=type)
  File "/ccs/home/lei/anaconda2/lib/python2.7/site-packages/obspy-0.10.2-py2.7-linux-x86_64.egg/obspy/core/util/decorator.py", line 258, in new_func
    return func(*args, **kwargs)
  File "/ccs/home/lei/anaconda2/lib/python2.7/site-packages/obspy-0.10.2-py2.7-linux-x86_64.egg/obspy/core/util/decorator.py", line 241, in new_func
    return func(*args, **kwargs)
  File "/ccs/home/lei/anaconda2/lib/python2.7/site-packages/obspy-0.10.2-py2.7-linux-x86_64.egg/obspy/core/trace.py", line 231, in new_func
    result = func(*args, **kwargs)
  File "/ccs/home/lei/anaconda2/lib/python2.7/site-packages/obspy-0.10.2-py2.7-linux-x86_64.egg/obspy/core/trace.py", line 1817, in detrend
    self.data = func(self.data, **options)
  File "/ccs/home/lei/anaconda2/lib/python2.7/site-packages/scipy/signal/signaltools.py", line 1900, in detrend
    newdata = newdata.astype(dtype)

ValueError: could not convert string to float: 

This happens because the data array contains strings rather than floats. The error should then be caught here: https://github.com/SeismicData/pyasdf/blob/master/pyasdf/asdf_data_set.py#L1751

However, executing this line raises another error:

tb += "".join(exc_line)

The error log is:

Traceback (most recent call last):
  File "process_asdf.py", line 19, in <module>
    proc.smart_run()
  File "/autofs/nccs-svm1_home1/lei/software/pypaw/src/pypaw/procbase.py", line 201, in smart_run
    self._core(path, param)
  File "/autofs/nccs-svm1_home1/lei/software/pypaw/src/pypaw/process.py", line 85, in _core
    ds.process(process_function, output_asdf, tag_map=tag_map)
  File "/autofs/nccs-svm1_home1/lei/software/pyasdf/pyasdf/asdf_data_set.py", line 1708, in process
    traceback_limit=traceback_limit)
  File "/autofs/nccs-svm1_home1/lei/software/pyasdf/pyasdf/asdf_data_set.py", line 1725, in _dispatch_processing_mpi
    traceback_limit=traceback_limit)
  File "/autofs/nccs-svm1_home1/lei/software/pyasdf/pyasdf/asdf_data_set.py", line 1928, in _dispatch_processing_mpi_worker_node
    tb += "".join(exc_line)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xee in position 47: ordinal not in range(128)
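
The UnicodeDecodeError above is consistent with Python 2 implicitly decoding a byte string as ASCII when it is concatenated with a unicode string. The following is a minimal sketch of one way to build such a message safely, written for Python 3. The helper names `to_text` and `join_traceback_lines` are hypothetical and this is not pyasdf's actual fix.

```python
def to_text(line, encoding="utf-8"):
    """Return `line` as text; undecodable bytes become U+FFFD."""
    if isinstance(line, bytes):
        return line.decode(encoding, errors="replace")
    return line


def join_traceback_lines(lines):
    """Join a mix of text and byte strings into a single text message."""
    return "".join(to_text(line) for line in lines)


# Hypothetical traceback lines: the last one carries raw, non-ASCII bytes,
# similar to the 0xee byte reported in the error above.
lines = ["ValueError: could not convert string to float: ", b"\xee\x07\n"]
message = join_traceback_lines(lines)
```

Decoding with `errors="replace"` trades fidelity for robustness: the error message may contain replacement characters, but building it can no longer fail on unexpected bytes.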

krischer commented 8 years ago

Haha :-) Oh well... yeah, I did not test that. I'll have a look.

What actually happens when you store the above-mentioned data in an ASDF file? Is it stored as single bytes? As mentioned in the ObsPy issue: you should really fix the file - whatever ends up in the ASDF file is likely garbage.

wjlei1990 commented 8 years ago

In the ASDF file, it is stored as single bytes:

In [4]: ds.waveforms.XA_SA71.raw_observed[0].data
Out[4]: 
array(['\x0c', '\r', '\x05', ..., '\x07', '\n', '\n'], 
      dtype='|S1')

My current plan is to simply throw those files away during processing. So as long as pyasdf can now catch the error and keep running, I am fine with that.
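
The "catch the error and keep running" behaviour described here can be sketched in plain Python as follows. This is an illustration only, not pyasdf's implementation: `process_all`, `detrend_mean`, and the station names `XX.GOOD` and `XX.BAD` are hypothetical, and `traceback_limit=3` merely mirrors the "At max 3 levels" wording in the log above.

```python
import traceback


def process_all(items, process_function, traceback_limit=3):
    """Apply process_function to every item, collecting failures instead of aborting."""
    results, failures = {}, {}
    for name, item in items.items():
        try:
            results[name] = process_function(item)
        except Exception:
            # Keep a shortened traceback as text and move on to the next item.
            failures[name] = traceback.format_exc(limit=traceback_limit)
    return results, failures


def detrend_mean(samples):
    """Toy processing step: remove the mean; fails on non-numeric samples."""
    values = [float(s) for s in samples]
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# One well-formed record and one holding single-character strings.
items = {"XX.GOOD": [1.0, 2.0, 3.0], "XX.BAD": ["\x0c", "\r", "\x05"]}
results, failures = process_all(items, detrend_mean)
```

After the run, `results` holds only the records that were processed successfully, and `failures` maps each skipped record to its traceback text.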

krischer commented 8 years ago

You can probably fix them with this tool: https://seiscode.iris.washington.edu/projects/msmod

Just change the encoding to whatever it really is - likely STEIM1 or STEIM2.

krischer commented 8 years ago

I'll make pyasdf raise an exception if one tries to add non-numeric arrays - that does not really make sense for waveforms in ASDF - the auxiliary data should be used for that.
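
A check of this kind could look roughly like the sketch below, which uses NumPy's `np.issubdtype`. The function name `ensure_numeric` is hypothetical, and the set of dtypes that pyasdf actually accepts is not shown in this thread and may be narrower than "any numeric dtype".

```python
import numpy as np


def ensure_numeric(data):
    """Return data as an array, raising TypeError if its dtype is not numeric."""
    arr = np.asarray(data)
    # Assumption: any NumPy numeric dtype is acceptable for this illustration.
    if not np.issubdtype(arr.dtype, np.number):
        raise TypeError(
            "Waveform data must be numeric, got dtype %r" % arr.dtype.str
        )
    return arr


# A single-byte string array like the one shown earlier, and a valid float array.
bad = np.array([b"\x0c", b"\r", b"\x05"], dtype="|S1")
good = np.array([1.0, 2.0, 3.0], dtype=np.float64)
```

Calling `ensure_numeric(good)` returns the array unchanged, while `ensure_numeric(bad)` raises a `TypeError` before any such data could be written.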

wjlei1990 commented 8 years ago

I agree.

So two things?

  1. Check the data type of the array during conversion.
  2. Fix the UnicodeDecodeError in the error handling.

wjlei1990 commented 8 years ago

I stripped the code down to a simple test, but couldn't really reproduce the error.

I have attached my code and data here: test_read.py.txt, XA.SA72.mseed.txt

Not sure if I am doing the right thing...

krischer commented 8 years ago

The encoding issue should be fixed - can you check? You will likely no longer be able to test once the check for the data type is implemented.

wjlei1990 commented 8 years ago

I am going to test it now.

wjlei1990 commented 8 years ago

You may have forgotten to delete this line: https://github.com/SeismicData/pyasdf/blob/master/pyasdf/asdf_data_set.py#L1772

Otherwise, it is fixed :)

Thanks!

krischer commented 8 years ago

Nice catch.

krischer commented 8 years ago

Alright - should be all fixed. You are no longer allowed to use a data type that is not valid in ASDF. It will raise a TypeError if you attempt to do so.