SeismicData / pyasdf

Python Interface to ASDF based on ObsPy
http://seismicdata.github.io/pyasdf/
BSD 3-Clause "New" or "Revised" License
52 stars 30 forks source link

Why the differences between input station XML file to ASDF and the output station XML file #63

Closed zhang01GA closed 4 years ago

zhang01GA commented 4 years ago

Using the latest PyASDF=0.7.1, I created a new ASDF file, add my station xml file into it, then extract the stationxml file.

The output station XML file is found to be significant different then the original input. Please see the attached sample files below: asdf_input_OA_CF28_inventory.xml.txt asdf_output_OA_CF28_inventory.xml.txt

Note that the asdf_input_OA_CF28_inventory.xml can be read by obspy and get the extra metadata as documented in https://docs.obspy.org/tutorial/code_snippets/stationxml_custom_tags.html

However, the output xml file asdf_output_OA_CF28_inventory.xml cannot get the extra metadata

from obspy import read_inventory

# our_new_station_xml ="asdf_input_OA_CF28_inventory.xml"  # add this stationxml into asdf file
# VS extracted from ASDF
our_new_station_xml ="asdf_output_OA_CF28_inventory.xml"

our_inv = read_inventory(our_new_station_xml,format='STATIONXML')

stn_meta = our_inv.networks[0].stations
print(type(stn_meta[0].extra))
print(stn_meta[0].extra)

Error: AttributeError: 'Station' object has no attribute 'extra'
krischer commented 4 years ago

I cannot reproduce this behavior. For me it works as expected as demonstrated in the following code snippet. Can you change the code snippet so it fails for you?

import obspy
import pyasdf

print(pyasdf.__version__)

filename = "OA.CF28.xml"

# Read directly.
inv_original = obspy.read_inventory(filename)

# Write to ASDF file, and read again.
asdf_file = "test.h5"
with pyasdf.ASDFDataSet(asdf_file) as ds:
    ds.add_stationxml(filename)
with pyasdf.ASDFDataSet(asdf_file) as ds:
    inv_new = ds.waveforms["OA.CF28"].StationXML

assert (
    inv_original.networks[0].stations[0].extra
    == inv_new.networks[0].stations[0].extra
)

print(inv_new.networks[0].stations[0].extra)

assert inv_original == inv_new
zhang01GA commented 4 years ago

I just noticed your comments today. I will test use your scripts and comeback to you later. Thank you.

zhang01GA commented 4 years ago

You have to write the inventory to a new xml file, then read it with obspy.

Below is my code. Compare the output file with the input, you will see.

import obspy
import pyasdf

print(pyasdf.__version__)

filename = "OA.CF28.xml"

# Read directly.
inv_original = obspy.read_inventory(filename)

# Write to ASDF file, and read again.
asdf_file = "test.h5"
with pyasdf.ASDFDataSet(asdf_file) as ds:
    ds.add_stationxml(filename)

with pyasdf.ASDFDataSet(asdf_file) as ds:
    inv_new = ds.waveforms["OA.CF28"].StationXML

    inv_new.write('OA.CF28_new.xml', format='STATIONXML')

assert inv_original == inv_new

assert (
    inv_original.networks[0].stations[0].extra
    == inv_new.networks[0].stations[0].extra
)

print(inv_original.networks[0].stations[0].extra)
print(inv_new.networks[0].stations[0].extra)

# Now, try to read the file 'OA.CF28_new.xml' and get the extra
inv_new2=  obspy.read_inventory('OA.CF28_new.xml')

print(inv_new2.networks[0].stations[0].extra)

Output in my computer was:

$ python krisher_script63.py 0.7.1 AttribDict({'metadata_ga': AttribDict({'namespace': 'https://github.com/GeoscienceAustralia/hiperseis/xmlns/1.0', 'value': '\n{\n "network_code":"OA",\n "station_code":"CF28",\n\n "orient_correction": {\n "start_dt": "2017-11-07T09:07:34.930000Z",\n "end_dt": "2018-08-23T03:52:29.528000Z",\n "azimuth_correction": -5.0\n },\n\n "gps_clock_corrections": [\n {\n "date": "2018-01-04",\n "seconds": -1.3375814425470127\n },\n {\n "date": "2018-01-05",\n "seconds": -1.110449564656099\n },\n {\n "date": "2018-01-06",\n "seconds": -0.9032476255118933\n }\n ]\n\n}\n'})}) AttribDict({'metadata_ga': AttribDict({'namespace': 'https://github.com/GeoscienceAustralia/hiperseis/xmlns/1.0', 'value': '\n{\n "network_code":"OA",\n "station_code":"CF28",\n\n "orient_correction": {\n "start_dt": "2017-11-07T09:07:34.930000Z",\n "end_dt": "2018-08-23T03:52:29.528000Z",\n "azimuth_correction": -5.0\n },\n\n "gps_clock_corrections": [\n {\n "date": "2018-01-04",\n "seconds": -1.3375814425470127\n },\n {\n "date": "2018-01-05",\n "seconds": -1.110449564656099\n },\n {\n "date": "2018-01-06",\n "seconds": -0.9032476255118933\n }\n ]\n\n}\n'})}) Traceback (most recent call last): File "krisher_script63.py", line 37, in print(inv_new2.networks[0].stations[0].extra) AttributeError: 'Station' object has no attribute 'extra'

zhang01GA commented 4 years ago

@krischer It appears that the root cause lies in the obspy inventory write() method: Upon writing, the xml file with the extra metadata has been changed - no longer the same as the original input.

krischer commented 4 years ago

Hi,

yes - you'll have to add the nsmap argument to the inventory.write() call - see the tutorial here: https://docs.obspy.org/tutorial/code_snippets/stationxml_custom_tags.html - otherwise they won't be written.

pyasdf does the same as StationXML files are also internally serialized as a byteblob StationXML.

I'm closing this because it is no longer a pyasdf related issue - the extra namespaces survive the roundtrip to ASDF and back which is really all we guarantee on this side. If you think ObsPy should handle this differently please open an issue in the ObsPy repository.

Please reopen if I misinterpreted something.