cvxgrp / signal-decomposition

A simple and general framework for signal decomposition
BSD 3-Clause "New" or "Revised" License
60 stars 7 forks source link

very nonlinear example - review #18

Closed andrewcz closed 4 months ago

andrewcz commented 5 months ago

Hi @bmeyers

I tried the package on a different signal, are you able to have a look at it and see if i am making sense , the code runs but i am not sure the decomposition is correct?? -

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy import signal
from time import time
import seaborn as sns
import cvxpy as cvx
sns.set_style('darkgrid')
import sys
#sys.path.append('/Users/bennetmeyers/Documents/Boyd-work/optimal-signal-demixing/')
from gfosd import Problem
from gfosd.components import *

np.random.seed(42)

def generate_complex_signal(length, noise_level, jump_chance, decay_rate, mean_change, variance_change):
    signal = np.zeros(length)
    mean = 0
    variance = 1
    for i in range(1, length):
        if np.random.rand() < jump_chance:
            signal[i] = signal[i-1] + np.random.normal(mean, variance)
            mean += mean_change * np.random.randn()
            variance = max(variance + variance_change * np.random.randn(), 0.1)
        else:
            signal[i] = signal[i-1] * (1 - decay_rate) + np.random.normal(mean, variance)
        signal[i] += noise_level * np.random.randn()
    return signal

def plot_signal(signal):
    plt.figure(figsize=(15, 5))
    plt.plot(signal)
    plt.title('Complex Signal with Random Characteristics')
    plt.xlabel('Time')
    plt.ylabel('Signal Value')
    plt.show()

def main():
    length = 3000  # Number of points in the signal
    noise_level = 30.0  # Further increased noise level for more non-stationarity
    jump_chance = 0.3  # Chance of regime changes for complexity
    decay_rate = 0.1  # Decay rate for the autoregressive component
    mean_change = 5.0  # Increased mean change for more movement
    variance_change = 0.5  # Increased variance change for more movement

    complex_signal = generate_complex_signal(length, noise_level, jump_chance, decay_rate, mean_change, variance_change)
    plot_signal(complex_signal)

    t = np.arange(length)
    X_real = np.zeros((3, len(t)), dtype=float)
    X_real[0] = 0.15 * np.random.randn(len(complex_signal))
    X_real[1] = complex_signal
    X_real[2] = signal.square(2 * np.pi * t * 1 / (450.))
    y = np.sum(X_real, axis=0)

    K, T = X_real.shape

    plt.figure(figsize=(10, 6))
    plt.plot(t, np.sum(X_real[1:], axis=0), label='true signal minus noise')
    plt.plot(t, y, alpha=0.5, label='observed signal')
    plt.legend()
    plt.show()

    c1 = SumSquare(weight=1/len(y))
    c2 = SumSquare(weight=1, diff=2)
    c3 = FiniteSet(values={-1, 1})
    components = [c1, c2, c3]
    problem = Problem(y, components)

    problem.decompose(verbose=True)
    print('\nobjective value: {:.4e}'.format(problem.objective_value))

    problem.plot_decomposition(X_real=X_real)
    plt.show()

if __name__ == "__main__":
    main()
andrewcz commented 5 months ago

hi @bmeyers i hope your well. are you able to take a look at the above to see if its makes sense.

bmeyers commented 5 months ago

Hi @andrewcz, I took at look at the signal you were analyzing:

user-signal

And I can confirm that your intuition is correct. The SD model you implemented in the SD modeling language—a smooth component, a {-1, 1} switching component, and a Gaussian noise component—is not a good representation of your data. Please do take a look at §2.7 of the signal decomposition monograph for a high-level description of how the user is meant to guide the analysis:

The methods described in this monograph are typically applied in situations where the analyst has a strong prior belief about what they want from a decomposition, often drawn from domain expertise. The analyst has a rough sense of the number of components they are looking for and the general characteristics of those components, which inform the selection of K, ϕk, and θk.

andrewcz commented 4 months ago

Thank you @bmeyers much appreciated. I was wondering, for a non expert would there be heuristics in which one could estimate the appropriate parameters. Cheers, andrew

bmeyers commented 4 months ago

@andrewcz, yes we do provide some guidance here (again, see §2.7), but only in the form of a framework for choosing between multiple candidate models, i.e., component classes or model weights. In general, you still need to start by being able to answer the question, "What kind of processes or effects do I expect to be in the data?" So for your example, what do you want to uncover? Auto-regressive structure? Shock events? Something else? Defining that is the first step to selecting the classes and weights for a candidate model. It's often helpful to think in terms of components that are small or sparse in a certain basis or transform. After creating an initial prototype model, you can use holdout validation to tune model weights or compare to other component representations. You might find our introductory tutorial helpful: http://signal-decomp-tutorial.org/