Problem in _smooth_spectrum function

krisvanneste commented 2 years ago

I think there is a problem in the _smooth_spectrum function. When the spectrum is converted from linear space to log space, the spectrum is first decimated to 50 points (the default number of values generated by np.logspace) before the actual smoothing is applied.

To avoid loosing nformation, it would be much better to generate the freq_log sequence with the smallest log_df spacing in the linear series, as follows: log_df = np.diff(np.log10(freq[-2:]))[0] freq_log = 10**(np.arange(np.log10(freq[0]), np.log10(freq[-1]) + log_df, log_df))

When you do that, however, the 'npts' argument doesn't make much sense anymore. It would be better to define the amount of smoothing in terms of bandwidth in log space. The number of points can then be computed from the bandwidth as follows: npts = max(1, int(round(bandwidth / log_df)))

https://github.com/SeismicSource/sourcespec/blob/3ad78e9b3e00c64676fefb676cfec7e5f259dcc1/sourcespec/ssp_build_spectra.py#L216

claudiodsf commented 2 years ago

Hi Kris and thanks for the feedback!

What you propose is very interesting.

I'm going to create a branch so that we can test this idea.

krisvanneste commented 2 years ago

Claudio, Thanks for looking into this!

Kris

claudiodsf commented 2 years ago

I'm trying right now with this:

def _smooth_spectrum(spec):
    """Smooth spectrum in a log_freq space."""
    freq = spec.get_freq()
    f = interp1d(freq, spec.data, fill_value='extrapolate')
    _freq_log = np.log10(freq)
    # minimum frequency interval in log10
    log_df = _freq_log[-1] - _freq_log[-2]
    # frequencies in logarithmic spacing
    freq_log = 10**(np.arange(_freq_log[0], _freq_log[-1]+log_df, log_df))
    spec.freq_log = freq_log
    spec.data_log = f(freq_log)
    # number of points for smoothing
    log_bandwidth = _freq_log[-1] - _freq_log[0]
    npts = max(1, int(round(log_bandwidth / log_df)))
    spec.data_log = smooth(spec.data_log, window_len=npts)
    f = interp1d(freq_log, spec.data_log, fill_value='extrapolate')
    spec.data = f(freq)

But npts is of the same order of len(freq_log), which makes the spectrum way over smoothed...

Maybe I misunderstood what you mean for "bandwidth in log space" (log_bandwidth here)?

krisvanneste commented 2 years ago

Claudio,

log_bandwidth should be an argument of the _smooth_spectrum function (replacing npts). I use a default value of 0.05, which seems to give good results, but it should be configurable by the end user.

Kris

claudiodsf commented 2 years ago

I just pushed the branch https://github.com/SeismicSource/sourcespec/tree/smooth_spectrum

The main difference with your code is that I set the number of points in logspace to be equal to the points of the original spectrum: I found that your approach was generating way too many spectral points, which has a heavy cost for grid search inversion.

Otherwise, I think that introducing this new parameter (which I named spectral_smooth_width_decades) is a clearer approach. 😉

Could you please test this branch and tell me your opinion?

krisvanneste commented 2 years ago

Claudio,

Thanks for updating your code. I can't run sourcespec to evaluate the impact on the magnitude estimation right now, but I will try to evaluate the spectral smoothing by tomorrow, if that's OK.

Kris

claudiodsf commented 2 years ago

No worry! Take your time!

claudiodsf commented 2 years ago

Just pushed a new commit https://github.com/SeismicSource/sourcespec/commit/16949b12a0887adce532fba37df5c8ea204fd8d3

Here I resample the smoothed log-spectrum using a sampling rate that is 1/5 of the smooth window size.

In my tests, it is a sufficient number of points to correctly represent the smoothed spectrum.

Let me know what do you think.

krisvanneste commented 2 years ago

Claudio,

I compared your new smoothing function with mine and with your previous one in the attached plot.

With the same bandwidth (smooth_width_decades), the result of the new smoothing function is very close to mine (labeled "robspy" in the plot). With the default value of 0.2, it is close to your original function. Anyway, the new function can be configured much better than the original one, which smoothed a great deal more than you would expect with a smoothing window (npts) of only 5 points.

The only potential problem I see is that in your spec object, the linear spectrum (spec.freq, spec.data) has a different resolution compared to the logarithmic spectrum (spec.freq_log, spd.data_log). I hope this doesn't cause any problems later on in the process.

Kris spectral_smoothing_comparison

claudiodsf commented 2 years ago

Hi Kris and thanks for this test!

It looks like I prefer smoothing more than you 😉. So it's good to have a parameter for that!

The linear spectrum is not used anymore, except for plotting. But I think I'll replace spec.data with spec.data_log also for plotting, so that one can visually judge the effect of the downsampling.

I'll do this latest modification later today and then merge the smooth_spectrum branch.

Thanks again!

krisvanneste commented 2 years ago

Claudio,

OK, thanks for addressing this! And yes, maybe I need to smooth a little more...

Kris

SeismicSource / sourcespec

Problem in _smooth_spectrum function #2