abudesai / timeVAE

TimeVAE implementation in keras/tensorflow
MIT License

Data Similarity #18

Closed blht-chen closed 1 month ago

blht-chen commented 1 month ago

I ran the vae_pipeline.py script from your repository as-is, but the t-SNE plot I got doesn't look ideal. Did I miss a step? [Image: QQ screenshot 20240521093353]

abudesai commented 1 month ago

There is nothing wrong with those results. In fact, those are good results. The distributions of the red points (original data) and the blue points (synthetic data) are almost identical, i.e. they overlap heavily. The only reason this chart may not "look" as good as some other t-SNE charts you may have seen elsewhere is that the sample size here is small, so the chart looks sparse. This particular dataset is "sine - 2 percent", which refers to the case where we used only 2% of the original dataset in the experiment. One of the claims in the TimeVAE paper is that it doesn't need as much data as some of the other state-of-the-art models to learn to produce good synthetic samples. That's why we tested with different dataset sizes: the 2%, 5%, 10%, 20%, and full (100%) scenarios.
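To make the sparsity point concrete, here is a minimal sketch of how few points each dataset-size scenario leaves to plot. It is not the repository's code: the `subsample` helper and the 10,000-sample `full_dataset` are hypothetical stand-ins chosen only for illustration.

```python
import random


def subsample(dataset, fraction, seed=0):
    """Return a random subset holding `fraction` of the samples (at least one)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, round(len(dataset) * fraction))
    rng = random.Random(seed)  # seeded so the subset is reproducible
    return rng.sample(dataset, n_keep)


# Hypothetical stand-in for the full dataset: 10,000 sample indices.
full_dataset = list(range(10_000))

# Dataset-size scenarios mentioned in the comment above.
fractions = [0.02, 0.05, 0.10, 0.20, 1.00]
subset_sizes = {f: len(subsample(full_dataset, f)) for f in fractions}

for f, n in subset_sizes.items():
    print(f"{f:>5.0%} of the data -> {n} points available to plot")
```

Under these assumed numbers, the 2% scenario leaves only 200 of the 10,000 samples, so a scatter plot built from it will look thin even when the original and synthetic distributions match well.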

blht-chen commented 1 month ago

May I ask what specific impact three of TimeVAE's hyperparameters have on the quality of the generated data: trend_poly, custom_seas, and use_residual_conn? I noticed that in your new code, trend_poly defaults to 0 and custom_seas is null.

abudesai commented 1 month ago

Closing this issue.