eth-siplab / Finding_Order_in_Chaos

The repository provides code implementations and illustrative examples for the NeurIPS 2023 paper, Finding Order in Chaos: A Novel Data Augmentation Method for Time Series in Contrastive Learning.

Question about method execution #5

Closed alexkaravos closed 1 month ago

alexkaravos commented 1 month ago

I'm looking for clarification so that I can benchmark against your method on my own data.

Here's my understanding:

Presuming we have a pre-trained VAE (I assume any other unsupervised, non-contrastive embedding works too?), in each batch we randomly select pairs for mixup. We choose a distance threshold $\epsilon$: if the distance between a pair exceeds $\epsilon$, we sample $\lambda$ from the truncated normal distribution; otherwise the samples are sufficiently close, and we sample from the uniform distribution, which allows for stronger mixing. We then perform the mixup between $x_1$ and $x_2$ to create $(x_1, x_1')$ and $(x_2, x_2')$, which are used as augmentation pairs in SimCLR-style frameworks.
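To make sure I have the control flow right, here is a minimal sketch of that sampling rule. The threshold `eps`, the truncation bounds `[0.9, 1.0]`, and the normal's mean/std are illustrative placeholders (not the paper's exact settings), and plain linear mixup stands in for the paper's phase/amplitude mixing:

```python
import numpy as np
from scipy.stats import truncnorm

def sample_lambda(z1, z2, eps=1.0, lo=0.9, hi=1.0, mu=1.0, sigma=0.05, rng=None):
    """Sample a mixup coefficient for one pair of latent codes.

    If the pair is far apart in latent space (distance > eps), draw lambda
    from a normal truncated to [lo, hi] so the augmented sample stays close
    to the original; otherwise the pair is similar enough that a uniform
    draw over [lo, hi] (stronger mixing) is allowed.
    """
    rng = rng or np.random.default_rng()
    dist = np.linalg.norm(z1 - z2)
    if dist > eps:
        a, b = (lo - mu) / sigma, (hi - mu) / sigma  # standardized bounds
        return truncnorm.rvs(a, b, loc=mu, scale=sigma, random_state=rng)
    return rng.uniform(lo, hi)

def mixup_pair(x1, x2, lam):
    """Plain linear mixup: x1' leans toward x1, x2' toward x2."""
    return lam * x1 + (1 - lam) * x2, lam * x2 + (1 - lam) * x1
```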

Questions:

Tangential Question:

I'm working on a method for representation learning of time series events. The events themselves are anomalies (not quasi-periodic data like this method targets). I'm only concerned with a set of signals centred around a peak that are anomalies, for a downstream multi-class classification task. If you know of other notable methods I could benchmark against, it'd be greatly appreciated.

Thanks for doing great work.

Clips from the paper for reference:

[image]

The variable settings from the appendix are:

[image]

Berken-demirel commented 1 month ago

Hi @alexkaravos,

Yes, your understanding is correct; also, after mix-up you can apply other augmentations. Regarding your questions:

If the data is not quasi-periodic, you can try some recent augmentation techniques, which have been shown to work well too.

alexkaravos commented 1 month ago

Thanks for the replies and clarification.

Was there any motivation for the truncated normal, since the distribution looks roughly linear between 0.9 and 1?

[image]

Berken-demirel commented 1 month ago

Hi @alexkaravos, sorry for the delayed response to your motivation question.

We aim for the augmented sample to closely resemble the original. Using a uniform distribution between 0.9 and 1.0 led to a higher probability of coefficients between approximately 0.9 and 0.94, resulting in lower performance. Typically, a Beta distribution is used for mixup, but we've found that in the absence of label information (as in unsupervised cases), the mixup degree needs to be much higher. This is likely due to the necessity of preserving sample identity while generating new samples.

alexkaravos commented 1 month ago

Thanks,

Is there a short-form name I can use to refer to your phase-amplitude mixup method? I believe the studies in the paper were done using gen_new_aug_2; if there's a nicer name for the function, let me know.

Berken-demirel commented 1 month ago

No worries! If you prefer a shorter name, you can refer to the method as "Quasi-periodic Mixup (QM)" since its primary motivation is for quasi-periodic time series. I hope this helps!