basics-lab / qsft

Efficiently computing Fourier transforms
https://arxiv.org/abs/2301.06200

Efficient Sparse q-ary Fourier Transforms

This repository contains code for the paper, to appear at IEEE ISIT 2023:

"Efficiently Computing Sparse Fourier Transforms of $q$-ary Functions" Yigit Erginbas, Justin Kang, Amirali Aghazadeh, Kannan Ramchandran

*Equal contribution: These authors contributed equally.

Check out our NEW Youtube video HERE

This package may be useful to you if you deal with complicated functions of $q$-ary sequences, for example, functions of proteins, DNA or RNA.

Abstract

Fourier transformations of pseudo-Boolean functions are popular tools for analyzing functions of binary sequences. Real-world functions often have structures that manifest in a sparse Fourier transform, and previous works have shown that under the assumption of sparsity the transform can be computed efficiently. But what if we want to compute the Fourier transform of functions defined over a $q$-ary alphabet? These types of functions arise naturally in many areas including biology. A typical workaround is to encode the $q$-ary sequence in binary; however, this approach is computationally inefficient and fundamentally incompatible with the existing sparse Fourier transform techniques. Herein, we develop a sparse Fourier transform algorithm specifically for $q$-ary functions of length-$n$ sequences, dubbed $q$-SFT, which provably computes an $S$-sparse transform with vanishing error as $q^n$ goes to $\infty$ in $O(Sn)$ function evaluations and $O(S n^2 \log q)$ computations, where $S = q^{n\delta}$ for some $\delta < 1$. Under certain assumptions, we show that for fixed $q$, a robust version of $q$-SFT has a sample complexity of $O(Sn^2)$ and a computational complexity of $O(Sn^3)$ with the same asymptotic guarantees. We present numerical simulations on synthetic and real-world RNA data, demonstrating the scalability of $q$-SFT to massively high-dimensional $q$-ary functions.
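To make the object being computed concrete, here is a minimal sketch of the dense $q$-ary Fourier transform using only NumPy. It does not use this package. The $1/q^n$ normalisation, the function name qary_fourier_transform and the example support are assumptions made for illustration. The dense transform touches all $q^n$ function values, which is exactly the cost a sparse algorithm tries to avoid.

```python
import numpy as np

def qary_fourier_transform(f_values, q, n):
    """Dense Fourier transform of f: Z_q^n -> C, with an assumed 1/q^n normalisation."""
    tensor = np.asarray(f_values, dtype=complex).reshape((q,) * n)
    return np.fft.fftn(tensor) / q**n

q, n = 3, 4
rng = np.random.default_rng(0)

# Build a spectrum with only three non-zero coefficients (illustrative support).
F_true = np.zeros((q,) * n, dtype=complex)
support = [(0, 1, 0, 2), (1, 0, 0, 0), (2, 2, 1, 0)]
for k in support:
    F_true[k] = rng.uniform(1, 2) * np.exp(2j * np.pi * rng.uniform())

# Synthesis: f[m] = sum_k F[k] * omega^{<m, k>}, with omega = exp(2*pi*i/q).
f = np.fft.ifftn(F_true) * q**n

# Analysis: recover the spectrum from all q^n = 81 function values.
F_hat = qary_fourier_transform(f.ravel(), q, n)
recovered_support = [tuple(int(i) for i in idx)
                     for idx in np.argwhere(np.abs(F_hat) > 1e-9)]
```

Here recovered_support contains the same three frequency vectors that were planted in F_true.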

Quick Start

The main functionality of our algorithm is available in the QSFT class. A minimal example can be found in synt_exp/quick_example.py. Details on how this file works can be found in other sections of the README.

Signals

In this section, we discuss the Signal objects that we use to interface with the QSFT class. A Signal encapsulates the object that we are trying to transform (you may interpret it as a signal of length $q^n$ or as a function of $n$ $q$-ary variables). Most relevant to our discussion is the SubsampledSignal class, found at qsft.input_signal_subsampled.SubsampledSignal. This class can be extended to create a signal for a specific application. For example, we create a synthetic signal that is sparse in the Fourier domain in synt_exp.synt_src.synthetic_signal.SyntheticSparseSignal. The extended class must implement the subsample() function, which takes a list of query_indices and returns a list of the function/signal values at those indices. We refer to SyntheticSparseSignal as an example.
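The shape of the subsample() contract can be sketched with a standalone toy class. ToySignal and index_to_qary are hypothetical names, the class deliberately does not inherit from the real SubsampledSignal (whose constructor is not described here), and treating each query index as an integer in $[0, q^n)$ is an assumption about the indexing convention.

```python
import numpy as np

def index_to_qary(index, q, n):
    """Decode an integer in [0, q**n) into n base-q digits, most significant first."""
    digits = []
    for _ in range(n):
        digits.append(index % q)
        index //= q
    return digits[::-1]

class ToySignal:
    """Hypothetical stand-in; it does NOT inherit from the real SubsampledSignal."""

    def __init__(self, func, q, n):
        self.func = func  # maps a list of n q-ary symbols to a number
        self.q = q
        self.n = n

    def subsample(self, query_indices):
        # Return one function value per query index, in the same order.
        return np.array([self.func(index_to_qary(i, self.q, self.n))
                         for i in query_indices])

# Toy function: the sum of the symbols of a length-3 sequence over a 4-ary alphabet.
signal = ToySignal(func=lambda m: float(sum(m)), q=4, n=3)
values = signal.subsample([0, 5, 63])
```

In a real application, func would be replaced by whatever expensive evaluation the signal wraps, such as a simulator or an experiment lookup.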

We can construct a SyntheticSparseSignal as follows. First, we need to declare the query_args:

    query_args = {
        "subsampling_method": "qsft",
        "query_method": "complex",
        "num_subsample": num_subsample,
        "b": b,
        "delays_method_source": "identity",
        "delays_method_channel": "nso",
        "num_repeat": num_repeat,
    }

Let's break this down.
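One way to build intuition for the parameter b is the aliasing identity behind subsampled sparse transforms. The sketch below assumes that b plays the role of the subsampling dimension described in the paper, so that one subsampling uses $q^b$ queries. The matrix M, the delay d and the toy spectrum are illustrative random choices, not the ones the package generates. Querying the function at the points $M\ell + d$ and taking a small $b$-dimensional transform hashes each coefficient $F[k]$ into bucket $M^\top k$, multiplied by $\omega^{\langle d, k \rangle}$.

```python
import numpy as np
from itertools import product

q, n, b = 3, 4, 2
omega = np.exp(2j * np.pi / q)

# Toy sparse spectrum: frequency vector k -> coefficient F[k] (illustrative values).
spectrum = {(0, 1, 0, 2): 1.0 + 0.5j, (1, 0, 0, 0): -2.0, (2, 2, 1, 0): 0.75j}

def f(m):
    """Evaluate f[m] = sum_k F[k] * omega^{<m, k>}."""
    return sum(c * omega ** (int(np.dot(m, k)) % q) for k, c in spectrum.items())

rng = np.random.default_rng(1)
M = rng.integers(0, q, size=(n, b))  # subsampling matrix (random, for illustration)
d = rng.integers(0, q, size=n)       # delay vector (random, for illustration)

# Query f at only q^b = 9 points instead of q^n = 81, then take a b-dimensional DFT.
u = np.zeros((q,) * b, dtype=complex)
for l in product(range(q), repeat=b):
    u[l] = f((M @ np.array(l) + d) % q)
U = np.fft.fftn(u) / q**b

# Prediction from the aliasing identity: F[k] lands in bucket j = M^T k (mod q).
expected = np.zeros((q,) * b, dtype=complex)
for k, c in spectrum.items():
    j = tuple(int(x) for x in (M.T @ np.array(k)) % q)
    expected[j] += c * omega ** (int(np.dot(d, k)) % q)
```

The arrays U and expected agree up to floating-point error. Repeating the measurement with several matrices and delays is what lets a decoder separate coefficients that collide in one bucket and identify their locations.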

With query_args set, we can now construct our signal object. To do so, we call get_random_subsampled_signal, which randomly generates a SyntheticSubsampledSignal for us.

test_signal = get_random_subsampled_signal(
    n=n,
    q=q,
    sparsity=sparsity,
    a_min=a_min,
    a_max=a_max,
    noise_sd=noise_sd,
    query_args=query_args,
    max_weight=t,
)
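As a rough illustration of what a random sparse signal with these arguments could look like, the sketch below draws a sparse spectrum directly. The functions random_sparse_spectrum and noisy_value are hypothetical and are not part of the package. The readings of the arguments (sparsity as the number of non-zero coefficients, a_min and a_max as magnitude bounds, max_weight as a cap on the number of non-zero symbols per frequency, noise_sd as the standard deviation of complex Gaussian noise) are assumptions based on their names.

```python
import numpy as np

def random_sparse_spectrum(n, q, sparsity, a_min, a_max, max_weight=None, seed=0):
    """Hypothetical generator: returns {frequency tuple: complex coefficient}.

    Assumes sparsity does not exceed the number of frequencies of weight <= max_weight.
    """
    rng = np.random.default_rng(seed)
    max_weight = n if max_weight is None else max_weight
    spectrum = {}
    while len(spectrum) < sparsity:
        weight = rng.integers(1, max_weight + 1)        # number of non-zero symbols
        positions = rng.choice(n, size=weight, replace=False)
        k = np.zeros(n, dtype=int)
        k[positions] = rng.integers(1, q, size=weight)  # symbols drawn from 1..q-1
        magnitude = rng.uniform(a_min, a_max)
        phase = rng.uniform(0, 2 * np.pi)
        spectrum[tuple(int(x) for x in k)] = magnitude * np.exp(1j * phase)
    return spectrum

def noisy_value(clean_value, noise_sd, rng):
    """Add circular complex Gaussian noise with total standard deviation noise_sd."""
    noise = rng.normal(0, noise_sd / np.sqrt(2), size=2)
    return clean_value + noise[0] + 1j * noise[1]

spectrum = random_sparse_spectrum(n=10, q=4, sparsity=20, a_min=1, a_max=2, max_weight=3)
```

The dictionary representation stores only the non-zero coefficients, so its size grows with the sparsity rather than with $q^n$.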

Some parameters are explained below:

Now that we have a signal object, the next step is to take its transform!

QSFT

Once we construct the signal we want to transform, the next step is to create the QSFT object that will perform the transformation. Again, we start with the key arguments, collected in qsft_args:

    qsft_args = {
        "num_subsample": num_subsample,
        "num_repeat": num_repeat,
        "reconstruct_method_source": delays_method_source,
        "reconstruct_method_channel": delays_method_channel,
        "b": b,
        "noise_sd": noise_sd,
        "source_decoder": decoder
    }

TestHelper is an abstract class used to encapsulate the complete pipeline of sampling, data storage, data loading and sparse Fourier transformation. It contains a single abstract method, generate_signal, that must be overridden when inheriting from TestHelper.

The only argument of the generate_signal method is the dictionary signal_args that is provided to the helper object at object creation. The generate_signal method needs to be implemented such that for a given signal_args dictionary, it returns the corresponding Signal object.

For instance, the SyntheticHelper class inherits from TestHelper and overrides the generate_signal method as follows.

from qsft.test_helper import TestHelper
from synt_exp.synt_src.synthetic_signal import SyntheticSubsampledSignal

class SyntheticHelper(TestHelper):
    def generate_signal(self, signal_args):
        return SyntheticSubsampledSignal(**signal_args)

Then a SyntheticHelper object needs to be created with the following arguments:

TestHelper(signal_args,
           methods, 
           subsampling_args,
           test_args,
           exp_dir)

Here, the arguments are as follows:

For instance, the following code creates a SyntheticHelper object:

methods = ["qsft"]
subsampling_args = {
    "num_subsample": 5,
    "num_repeat": 3,
    "b": 7,
}
test_args = {"n_samples": 200000}
helper = SyntheticHelper(signal_args, methods, subsampling_args, test_args, exp_dir)

At the time of object creation, the signal object is generated and subsampled. To compute the model from these samples, we call the compute_model method, passing the method name and a dictionary of model arguments.

For instance, we can run:

method = "qsft"
model_kwargs = {
    "num_subsample": 2,
    "num_repeat": 2,
    "b": 7,
    "noise_sd": 0.01,
}
helper.compute_model(method, model_kwargs)

Experimental Results

Comparing with LASSO

In addition to implementing QSFT, we also include a comparison with LASSO implemented via group-lasso, which is significantly slower for this application.

The following figures compare LASSO and QSFT. These figures were generated by using the scripts at synt_exp/run-tests-complexity-vs-size.py and plotted by synt_exp/plot-complexity-vs-size.py. The grey area in the first graph is a region where LASSO took too long to converge.

[Figure: LASSO vs. QSFT]

As we can see, the runtime of $q$-SFT is sub-exponential in $n$, making it practical where LASSO is not.

SNR vs NMSE

As the amount of noise in the signal/function increases, successful recovery becomes more difficult. To examine this phenomenon, the script synt_exp/run-tests-nmse-vs-snr.py is useful. In the graph below, we see that for different sparsity levels $S$, QSFT goes from a very high to a low NMSE at some SNR threshold. This type of phase transition behaviour is typical in compressed sensing.

[Figure: SNR vs NMSE]
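For readers who want to reproduce the two axes on their own data, the following sketch computes both quantities for spectra stored as dictionaries. The definitions used here (NMSE as squared error normalised by the energy of the true spectrum, SNR as signal power over noise variance in decibels) are common conventions and are assumed rather than taken from the scripts in this repository. The function names nmse and snr_db are hypothetical.

```python
import numpy as np

def nmse(true_spectrum, estimated_spectrum):
    """Normalised mean squared error between two sparse spectra given as dicts."""
    support = set(true_spectrum) | set(estimated_spectrum)
    error = sum(abs(estimated_spectrum.get(k, 0) - true_spectrum.get(k, 0)) ** 2
                for k in support)
    energy = sum(abs(c) ** 2 for c in true_spectrum.values())
    return error / energy

def snr_db(signal_power, noise_sd):
    """Signal-to-noise ratio in decibels, assuming noise variance noise_sd**2."""
    return 10 * np.log10(signal_power / noise_sd**2)

true_spectrum = {(0, 1): 2.0, (1, 0): 1j}
estimate = {(0, 1): 2.0, (1, 1): 1.0}  # one coefficient missed, one spurious

error_partial = nmse(true_spectrum, estimate)
error_perfect = nmse(true_spectrum, dict(true_spectrum))
error_empty = nmse(true_spectrum, {})
```

Under these definitions a perfect estimate gives an NMSE of 0 and an all-zero estimate gives an NMSE of 1, which is why the phase transition in such plots typically runs between those two levels.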

Real-World Example from Computational Biology

This repository also provides an example of how to apply our code to a complex $q$-ary function from ViennaRNA. Code for this example is in the rna_exp folder. We create the RnaSubsampledSignal(SubsampledSignal) class. Its subsample(self, query_indices) function interfaces with the ViennaRNA package to compute the minimum free energy (MFE) of an RNA sequence.
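The idea of wrapping ViennaRNA behind subsample() can be sketched as follows. This is not the code in rna_exp. The symbol order "ACGU", the integer index convention and the function names are assumptions made for illustration. The only ViennaRNA call used is RNA.fold from its Python bindings, which returns a structure string and the MFE. It is imported lazily so the decoding part runs without ViennaRNA installed.

```python
NUCLEOTIDES = "ACGU"  # assumed symbol order for q = 4

def index_to_sequence(index, n):
    """Decode an integer in [0, 4**n) into a length-n RNA string, most significant symbol first."""
    symbols = []
    for _ in range(n):
        symbols.append(NUCLEOTIDES[index % 4])
        index //= 4
    return "".join(reversed(symbols))

def minimum_free_energy(sequence):
    """Return the MFE (kcal/mol) of a sequence; requires the ViennaRNA Python bindings."""
    import RNA  # imported here so the rest of the sketch works without ViennaRNA
    _structure, mfe = RNA.fold(sequence)
    return mfe

def subsample(query_indices, n):
    """Hypothetical subsample(): one MFE value per query index."""
    return [minimum_free_energy(index_to_sequence(i, n)) for i in query_indices]

sequences = [index_to_sequence(i, 4) for i in (0, 27, 255)]
```

With this convention, the three indices decode to "AAAA", "ACGU" and "UUUU".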

[Figure: Example: Computational Biology]

The graph above shows that when $n$ is large, QSFT achieves a low NMSE. This means that QSFT generates a sparse Fourier transform that can predict the MFE of an arbitrary unseen RNA sequence with relatively little error.
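Prediction on an unseen sequence amounts to evaluating the inverse transform over the recovered support only. The sketch below assumes the transform is available as a dictionary from frequency vectors to coefficients. The function name predict and the coefficient values are hypothetical and do not come from the RNA experiment.

```python
import numpy as np

def predict(spectrum, m, q):
    """Evaluate f[m] = sum over the support of F[k] * omega^{<m, k>}."""
    omega = np.exp(2j * np.pi / q)
    return sum(c * omega ** (int(np.dot(m, k)) % q) for k, c in spectrum.items())

q = 4
# Hypothetical coefficients. (3, 0, 2) is the negation of (1, 0, 2) modulo 4, and the
# two coefficients are conjugates of each other, so the function is real-valued.
spectrum = {(0, 0, 0): 1.5, (1, 0, 2): 0.5, (3, 0, 2): 0.5}

value_a = predict(spectrum, (0, 0, 0), q)
value_b = predict(spectrum, (2, 0, 0), q)
value_c = predict(spectrum, (1, 2, 3), q)
```

Each prediction costs on the order of $Sn$ operations for a support of size $S$, independent of $q^n$, so a recovered sparse transform also serves as a cheap surrogate for the original function.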