Question about method execution

alexkaravos commented 1 month ago

I'm looking for clarification so that I can benchmark against your method on my own data.

Here's my understanding:

Presuming we have pre-trained VAE (any other unsupervised non-contrastive embedding works too I assume?). In each batch we randomly select pairs for mix-up. We select a distance threshold $\epsilon$ such that above $\epsilon$ we sample $\lambda$'s from the truncated normal distribution, else if the samples are sufficiently close we sample from the uniform distribution that allows for greater mixup. We perform the mixup between x1 and x2 to create (x1,x1') and (x2,x2') which get used for simclr frameworks as augmentation pairs.

Questions:

In the appendix parameters, is $\epsilon$ the distance threshold that in the preposition is referred to as $\delta$?
In the appendix parameters, is the third parameter of the normal distribution $\mathcal{N}^t$ the upperbound or lower bound of the truncation? I'm presuming lowerbound and its also upperbounded by 1?
How is $\epsilon$ chosen?
Is $\epsilon$ a euclidean distance or a cosine similarity? The cosine similarity is mentioned but the VAE code and paper say its trained with β-TCVAE which is a modified ELBO that normally creates a euclidean space.

Tangential Question:

I'm working on a method for representation learning of time series events. The events themselves are anomalies (not quasi-periodic data like this method works with). I'm only concerned with a set of centred (around a peak) signals that are anomalies for a downstream task of classification (multi-class). If you know of other notable methods I can benchmark against It'd be greatly appreciated.

Thanks for doing great work.

Clips from the paper for reference:

The variable settings in from the appendix are:

Berken-demirel commented 1 month ago

Hi @alexkaravos,

Yes your understanding is correct, also after mix-up you can apply other augmentations. Regarding your questions,

Yes, $\epsilon$ is the distance threshold.
Yes, you can also see that from the code implementation. https://github.com/eth-siplab/Finding_Order_in_Chaos/blob/141b2b9f51543a267a86312b19238936853be0c5/new_augmentations.py#L206-L215
The hyperparameter selection is explained in Appendix Section D.1. Parameters for mixing. We select parameters for each task and I can say that they are quite robust across datasets and even tasks (the $\epsilon$ parameter for the activity recognition and CVD is the same).
We use the distance metric same as the during the training of the encoder $E(.)$.

If the data is not quasi-periodic, you can try some recent augmentation techniques, which have shown to work well too.

alexkaravos commented 1 month ago

Thanks for the replies and clarification.

Was there any motivation for the truncated normal since the distribution is roughly almost just linearly sloped between 0.9,1?

Berken-demirel commented 1 month ago

Hi @alexkaravos, sorry for the delayed response to your motivation question.

We aim for the augmented sample to closely resemble the original. Using a uniform distribution between 0.9 and 1.0 led to a higher probability of coefficients between approximately 0.9 and 0.94, resulting in lower performance. Typically, a Beta distribution is used for mixup, but we've found that in the absence of label information (as in unsupervised cases), the mixup degree needs to be much higher. This is likely due to the necessity of preserving sample identity while generating new samples.

alexkaravos commented 1 month ago

Thanks,

Is there a short form name I can use to refer to your phase-amplitude-sim mixup method? I believe the studies in the paper are done using gen_new_aug_2 if theres a nicer name to the function let me know.

Berken-demirel commented 1 month ago

No worries! If you prefer a shorter name, you can refer to the method as "Quasi-periodic Mixup (QM)" since its primary motivation is for quasi-periodic time series. I hope this helps!

eth-siplab / Finding_Order_in_Chaos

Question about method execution #5