SeismicSource / sourcespec

Earthquake source parameters from P- and S-wave displacement spectra
https://sourcespec.seismicsource.org
Other
49 stars 12 forks source link

New Spectrum() and SpectrumStream() classes #50

Closed claudiodsf closed 3 months ago

claudiodsf commented 3 months ago

For an ongoing project, I need to save spectra to the output directory.

This motivated me to tackle the technical debt which was accumulating in the Spectrum() class.

This is the first class I wrote 12 years ago (!) and at that time I decided that a quick and dirty –yet functional– solution was to make it inherit from ObsPy Trace() object. One of the limits of this approach was that there wasn't any easy option (except of using pickle) to save spectra to disk.

So, I rewrote the Spectrum() class from scratch, while trying to keep the same interface of the previous one, with one difference:

The main improvements are the following:

I thoroughly tested these new classes with my default examples and with many other events: the results are consistent, with some small (~1%) differences arising sometimes, probably because of the different way frequencies are stored.

However, before merging this PR, I'm calling @krisvanneste and @rcabdia to kindly check that the new code does not introduce any regression in your use cases, since you are both using the SourceSpec API.

Thanks! ☺️ Claudio

krisvanneste commented 3 months ago

Claudio,

I have an option to write the processed traces and the spectra (as "auxiliary" data) to an ASDF file (which is based on HDF5). This will need to be revised, but I should not otherwise be affected. I will test this tomorrow or beginning of next week.

Actually, it would be nice if obspy itself would support writing to HDF5, including the header information. None of the currently supported formats support preserving the metadata.

claudiodsf commented 3 months ago

I have an option to write the processed traces and the spectra (as "auxiliary" data) to an ASDF file (which is based on HDF5). This will need to be revised, but I should not otherwise be affected. I will test this tomorrow or beginning of next week.

Thanks Kris!

Actually, it would be nice if obspy itself would support writing to HDF5, including the header information. None of the currently supported formats support preserving the metadata.

ObsPy Stream().write() and Trace().write() have a PICKLE option which should preserve everything. But that comes with all the limitations of pickle, namely dependency on the library versions.

A more portable HDF5 format (lighter than ASDF and built-in) would be definitively nice. Maybe you can take some inspiration from this work and propose a PR to ObsPy 😉?

krisvanneste commented 3 months ago

After a clean switch to the new_spectrum_class branch, I run into this error:

  File "C:\Users\kris\Documents\Python\cloned_repos\sourcespec\sourcespec\ssp_build_spectra.py", line 740, in build_spectra
    weight_st = _build_weight_spectral_stream(config, spec_st, specnoise_st)

  File "C:\Users\kris\Documents\Python\cloned_repos\sourcespec\sourcespec\ssp_build_spectra.py", line 520, in _build_weight_spectral_stream
    weight = _build_weight_from_inv_frequency(spec_h)

  File "C:\Users\kris\Documents\Python\cloned_repos\sourcespec\sourcespec\ssp_build_spectra.py", line 443, in _build_weight_from_inv_frequency
    snr_fmin = getattr(spec.stats, 'spectral_snratio_fmin', None)

  File "C:\Users\kris\Documents\Python\cloned_repos\sourcespec\sourcespec\spectrum.py", line 54, in __getattr__
    return self[name]

KeyError: 'spectral_snratio_fmin'

This is very strange, as getattr should not throw an error if a default value (in this case None) is provided... No hurry, as my weekend starts in a few minutes!

claudiodsf commented 3 months ago

No hurry, as my weekend starts in a few minutes!

Will look into that on Tuesday!

Happy Easter! 🐣

krisvanneste commented 3 months ago

I investigated this problem a bit further. It happens for traces with no or too short noise window:

GR.GRA1..BH: No available noise window: a uniform weight will be applied

In that case, the spectrum header fields spectral_snratio_fmin and spectral_snratio_fmax are not set in the _check_spectral_sn_ratio function, and due to the way the __getattr__ method of AttribDict is implemented, getattr fails even if a default value is specified. The AttribDict in obspy does work with a default value.

The solution is to either set spectral_snratio_fmin and spectral_snratio_fmax also if there is no noise window, or perhaps revert to using obspy's AttribDict.

Happy Easter to you as well!

krisvanneste commented 3 months ago

A suggestion for writing to HDF5:

claudiodsf commented 3 months ago

I investigated this problem a bit further. It happens for traces with no or too short noise window:

GR.GRA1..BH: No available noise window: a uniform weight will be applied

In that case, the spectrum header fields spectral_snratio_fmin and spectral_snratio_fmax are not set in the _check_spectral_sn_ratio function, and due to the way the __getattr__ method of AttribDict is implemented, getattr fails even if a default value is specified. The AttribDict in obspy does work with a default value.

The solution is to either set spectral_snratio_fmin and spectral_snratio_fmax also if there is no noise window, or perhaps revert to using obspy's AttribDict.

Happy Easter to you as well!

Hello, this should be fixed with the latest commit.

I prefer to use my implementation of AttributeDict because I would like the Spectrum() class to be independent of ObsPy (except for the Spectrum().from_obspy_trace() method, which lazy-imports obspy.core.trace.Trace).

Maybe, in the future, I'll use this AttributeDict class all along the code 😉

claudiodsf commented 3 months ago

A suggestion for writing to HDF5:

  • maybe all spectra could be grouped in the group 'Spectra'

You mean something like:

HDF5_file
└──Spectra
    ├──spectrum_0000
    ├──spectrum_0001

?

What would be the advantage to have this top-level group? Would there be any other group than Spectra in the spectrum HDF5 file?

  • maybe name the spectra by nslc-code (network-station-location-channel) instead of just numbering them ?

The numbering is meant to reflect the order in the SpectrumStream object, which behaves like a list: Spectrum objects are accessed using an index. In your example, what should happen if two spectra have the same nslc-code?

claudiodsf commented 3 months ago

Force-pushed a better solution. Please verify again 😉 🙏

krisvanneste commented 3 months ago

Claudio,

The problem with the getattr call in the _check_spectral_sn_ratio function is solved now. However, there is another problem. Actually, the traces that triggered the first problem because there was no noise window, do have a noise window. When I run them with the main branch and a fixed window length, the trace plot looks like this: 651 traces 00 When I switch to the new_spectral_class branch without any modification in my script, the trace plot is very different: 651 traces 00 Apparently, something goes wrong with the window definitions.

krisvanneste commented 3 months ago

Concerning the suggestions for the HDF5 format:

These are just suggestions, based on my experience with ASDF files. It's of course your decision, but it could promote this HDF5 format as a more general format for seismic data...

claudiodsf commented 3 months ago

That's nasty! The time window has totally changed!

Let's try something first: I'll rebase this branch on main so that it will include the latest modificaitons on time window.

claudiodsf commented 3 months ago

Rebased and force-pushed.

Could you please force-pull and try again?

krisvanneste commented 3 months ago

The problem is gone after force-pushing. I still get empty noise windows for some stations, but that was also the case with the main branch, so nothing to do with the new spectrum class. I will investigate this later.

claudiodsf commented 3 months ago

These are just suggestions, based on my experience with ASDF files. It's of course your decision, but it could promote this HDF5 format as a more general format for seismic data...

I'm convinced! 👍

See the last commit, where I implemented your idea, with the only difference of using spectrum_NNNNN_NET.STA.LOC.CHAN for keys 😉

claudiodsf commented 3 months ago

@krisvanneste, ok for you to merge?

krisvanneste commented 3 months ago

Claudio, I still have a question. I ran my test case again with the option to save the spectra. I then read the saved spectra and compared them to those I collect in my wrapper function (and which come directly from the build_spectra function. Strangely, they are not the same... This is the spectrum for station GR.GRA1 written to the HDF5 file: Spectrum GR.GRA1..BHH | 225 samples, 0.23-5.00 Hz | 0.02 Hz sample interval | 68 samples logspaced, 0.23-5.12 Hz | 0.02 log10([Hz]) sample interval logspaced image This is the one I collect in my wrapper function: Spectrum GR.GRA1..BHH | 225 samples, 0.23-5.00 Hz | 0.02 Hz sample interval | 68 samples logspaced, 0.23-5.12 Hz | 0.02 log10([Hz]) sample interval logspaced image And this is what it looks like in the sourcespec plot: image I have the impression that the saved spectra have not been scaled yet. Is this the intention?

claudiodsf commented 3 months ago

Very strange!

Spectra are written after all the manipulations and after the inversion (see here).

Looking at your figure, I have the impression that you read the .residuals.hdf5 file (and not the .spectra.hdf5 file). Is that possible?

krisvanneste commented 3 months ago

Claudio, Yes, sorry, a question I forgot to ask was: why is the output file called .residuals.hdf5? That explains it then. However, I do not see a .spectra.hdf5 file in the output folder...

krisvanneste commented 3 months ago

OK, got it sorted now. I had to add a line to save the spectra in my wrapper function. I see the HDF5 file contains both the actual spectra and the fitted spectra.

claudiodsf commented 3 months ago

Sorry for late reply.

Yes, there is a new config option called save_spectra 😉

krisvanneste commented 3 months ago

Yes, but I also had to call the save_spectra function in my wrapper.

claudiodsf commented 3 months ago

So, good to merge? 🙃

krisvanneste commented 3 months ago

Yes, go ahead!

claudiodsf commented 3 months ago

Merged!!! 🚀

claudiodsf commented 3 months ago

Thanks Kris for the review and the suggestions 🙏