OverLordGoldDragon / ssqueezepy

Synchrosqueezing, wavelet transforms, and time-frequency analysis in Python
MIT License
618 stars 96 forks source link

Optionally output frequencies rather than scales (or provide a function to convert) #46

Closed abudden closed 2 years ago

abudden commented 2 years ago

In the Matlab cwt documentation, it gives this useful example (emphasis added by me):

[wt,f] = cwt(___,fs) specifies the sampling frequency, fs, in Hz as a positive scalar. cwt uses fs to determine the scale-to-frequency conversions and returns the frequencies f in Hz. If you do not specify a sampling frequency, cwt returns f in cycles per sample. If the input x is complex, the scale-to-frequency conversions apply to both pages of wt. If x is a timetable, you cannot specify fs. fs is determined from the RowTimes of the timetable.

In this example, if you provide the sampling frequency, then the frequencies are returned. It would be really useful if ssqueezepy could do this too (or at least provide a scale2frequency function to do the required conversion with the specified wavelet).

OverLordGoldDragon commented 2 years ago

Good request, it's on TODO. For now, let me know if this code works.

abudden commented 2 years ago

Good request, it's on TODO. For now, let me know if this code works.

Thanks for sharing that. Unfortunately, it's not working for me at the moment due to assertion errors. My code looks like this (scale_to_freq.py contains your shared code):

import ssqueezepy as ssp
from scale_to_freq import scale_to_freq

error_data = np.loadtxt('ErrorData.csv')
sample_spacing = 150e-6
wavelet = 'gmw'
Wx, scales = ssp.cwt(error_data, wavelet=wavelet, fs=1/sample_spacing)
# N=1024 chosen as it's the default in Wavelet class init function
frequencies = scale_to_freq(scales, wavelet, N=1024, fs=1/sample_spacing)

The error I get is:

ValueError: min `scales` (7.9e-01) exceeds bound (7.9e-01); wavelet here is ill-defined.

If I comment out the two checks on scales.min() and scales.max(), I get this error instead:

    assert freqs.min() >= 0,   freqs.min()

AssertionError: -0.49902343749999994

Do you have any idea what I might be able to do about this?

OverLordGoldDragon commented 2 years ago

Similar edge case; N should equal the length of the input, i.e. len(error_data). 1024 is fairly small and amplifies odds of discretization error (edge cases). If stuff still breaks with greater N, I'd try

scales[0] *= 1.01
scales[-1] /= 1.01

before passing to scale_to_freq to slightly nudge away from the min/max bounds.

abudden commented 2 years ago

Thanks! Changing N to len(error_data) seems to have fixed it and the resulting graph of frequencies vs np.mean(np.abs(Wx), axis=1) looks right with all the peaks where I'd expect them to be. Thanks again.

I also had to tweak the min and max as mentioned.

abudden commented 2 years ago

Actually, it worked with gmw (with the tweak), but not with morlet - the bound is further out:

ValueError: min `scales` (3.8564417362e+00) exceeds bound (3.8907017388e+00); wavelet here is ill-defined.

It looks like it's not just the first one that is outside the bounds

OverLordGoldDragon commented 2 years ago

Could you share code to reproduce the error? Data can be randn but must have same len

abudden commented 2 years ago

Error data is attached. This sample script produces the error:

#!/usr/bin/python
# vim: set fileencoding=utf-8 :

#%% Scientific Python Imports
import numpy as np     # analysis:ignore

import matplotlib.pyplot as plt
#%%
import ssqueezepy as ssp
from scale_to_freq import scale_to_freq

#%%

generate_data = False
if generate_data:
    error_data = np.random.randn(173043)
    np.save('error_data.npy', error_data)
else:
    error_data = np.load('error_data.npy')

sample_spacing = 150e-6

#%%
def plot_cwt(error_data, sample_spacing, wavelet='gmw', ax=None, linestyle='b-'):
    print(wavelet)
    if ax is None:
        fig, ax = plt.subplots()
    else:
        fig = None

    Wx, scales = ssp.cwt(error_data, wavelet=wavelet, fs=1/sample_spacing)
    means = np.mean(np.abs(Wx), axis=1)

    scales[0] *= 1.01
    scales[1] /= 1.01
    frequencies = scale_to_freq(scales, wavelet, N=len(error_data), fs=1/sample_spacing)

    wavelengths = 1.0/frequencies

    ax.plot(wavelengths, means, linestyle, label=wavelet)
    ax.set_yscale('log')
    ax.set_xscale('log')
    ax.set_xlabel('Wavelength (mm)')
    ax.set_ylabel('nm')
    ax.grid(True, which='both')
    return fig, ax

fig, ax = plot_cwt(error_data, sample_spacing, 'gmw')
plot_cwt(error_data, sample_spacing, 'morlet', ax=ax, linestyle='r-')
plot_cwt(error_data, sample_spacing, 'bump', ax=ax, linestyle='k-')

error_data.zip

OverLordGoldDragon commented 2 years ago

I see the problem; this version should work, but only with 'peak' mapping. Pushed out a commit to improve auto-scales for lowest freqs. 'bump' isn't quite uni-modal (see commit docstring); to be safe, inspect extrema wavelets manually:

psis = wavelet(scale=scales)
ssq.visuals.plot(psis[-5:].T, show=1)
ssq.visuals.plot(psis[:5].T,  show=1)

scales /= etc should no longer be needed.

abudden commented 2 years ago

That's brilliant, and working really well (I'm using Scale-to-freq 1.0). It would be great to see this integrated into the library at some point.

Out of interest I had a go with the synchrosqueezed versions as well to compare. For these it looks like producing the same plots is much simpler as frequency data is provided. What I found slightly odd though is that the returned frequencies (Frequencies associated with rows of Tx.) are in the opposite order to how I would expect so I have to use np.flip to get a meaningful plot:

        Tx, Wx, ssq_freqs, scales = ssp.ssq_cwt(error_data, wavelet='gmw', fs=1/sample_spacing)

        # Not sure why we need to flip this, but if we don't then it looks very wrong compared to the `cwt` plot!
        wavelengths = 1.0/np.flip(ssq_freqs)

        means = np.mean(np.abs(Tx), axis=1)
        ax.plot(wavelengths, means)

Is that intentional?

OverLordGoldDragon commented 2 years ago

It's something I never looked much into. It follows the original MATLAB toolbox - I suppose the idea's to sort lowest to highest, but this does not match the actual output indexing, which I think is more important. Might change in a future version.

OverLordGoldDragon commented 2 years ago

I take this Issue's resolved, feel free to open another.