abudesai / timeVAE

TimeVAE implementation in keras/tensorflow
MIT License
93 stars 22 forks

Does the loss change as expected? #8

Closed shy19960518 closed 1 month ago

shy19960518 commented 2 months ago

Dear Abudesai, I am trying to use this as my baseline. However, the kl_loss starts at 0 and converges to around 280, the total loss starts at 10k and converges to 200, and recon_loss converges to around 300. This happens both with your test data and with my own data. With my data I got a model with FID=280, which means it has not learned successfully.

Sorry, I do not use TensorFlow and I do not have enough time to debug. Is this normal, or what mistake did I make?

abudesai commented 2 months ago

How are you running the model? Are you using the test_vae.py script for it? Are you scaling the data before feeding it into the model?

I updated the repo just a couple days ago and it was working fine then. Best, Abu


shy19960518 commented 2 months ago

Hi, thank you so much for your fast reply. I deleted the scaling code because I had already mapped the data to [-1, 1]. I changed the input data and made sure the shape and dtype matched. The only modification I made to the network was removing the callbacks when defining the model, since I don't need early stopping and LR decay. I made all of my changes in test_vae.py. Tomorrow I will present the details of what I encountered. Thank you again for the kind reply; I hope we can solve this.
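For context, a minimal sketch of the kind of [-1, 1] min-max scaling described above (this is a hypothetical stand-in, not the repo's actual scaling code):

```python
import numpy as np

def scale_to_minus1_1(x: np.ndarray) -> np.ndarray:
    """Min-max scale an array to the range [-1, 1]."""
    x_min, x_max = x.min(), x.max()
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

data = np.array([0.0, 5.0, 10.0])
scaled = scale_to_minus1_1(data)
print(scaled)  # → [-1.  0.  1.]
```

If the repo's own scaler is also applied on top of pre-scaled data, the effective ranges may differ, which is one way scaling can end up "not aligned".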

uliuycy commented 2 months ago

Hi, do you know how to set N, T, and D if you want to use stock data?

shy19960518 commented 2 months ago

@rt-adesai Hi abudesai, [Screenshot from 2024-04-20 12-10-24] Here is what I get from directly running test_vae.py without any change. I get the same unusual loss when using my own data. I have verified that the data generated by a model with this loss is not good (so the problem is not only in what is displayed). May I please see what it outputs on your local PC? Many thanks for your patience.

shy19960518 commented 2 months ago

Hi, do you know how to set N, T, and D if you want to use stock data

N, T, and D are fixed to data.shape; they are tied to the problem you define rather than being tunable hyperparameters. To change them, you can substitute different data.
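To illustrate this point with a hypothetical array (not the repo's actual data loader): N, T, and D are simply read off the shape of the windowed dataset.

```python
import numpy as np

# Hypothetical dataset: 100 windows of 24 time steps with 3 features each.
data = np.random.randn(100, 24, 3)

# N, T, D are read directly from the array shape,
# not chosen as hyperparameters.
N, T, D = data.shape
print(N, T, D)  # → 100 24 3
```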

abudesai commented 2 months ago

Can you share your data, if possible?

shy19960518 commented 2 months ago

Can you share your data, if possible?

Yes, here is the linked data: https://drive.google.com/file/d/1Mm-HwZ4uWNsH9RXMuOi7YnziNP-g5rX8/view?usp=drive_link.

I tested the code and found that there are two issues. When using your test data, the loss still looks strange, but the generated results seem acceptable. I'm guessing the reason for the poor generation on my own data is that the scaling is not aligned; I'm currently debugging that. Regarding the abnormal loss: I encountered a similar case in previous experiments with PyTorch, where the cross entropy was not aligned with the category/one-hot encoding, which caused the loss to be summed over the sequence length, resulting in a convergence point proportional to the length. But I don't know TensorFlow, so I can't debug this myself. Is your result similar to mine (abnormal loss display)?

abudesai commented 2 months ago

What is the original shape of the data you have? How long is the stock data? How many different stocks? How many features does the data have?


shy19960518 commented 2 months ago

What is the original shape of the data you have? How long is the stock data? How many different stocks? How many features does the data have?

I guess you are not asking me ^ ^

abudesai commented 2 months ago

A number of comments: It looks like you are using learning rate = 5.0, which might be too high. Have you tried lower learning rates? We use 1e-3 by default. It will take longer to train and converge, but the results will be better.

You asked about how to set N, T, D - those are driven by your data. Your data shape is [76228, 120, 1] which is [N, T, D]. In your case, you seem to have a single dimension. Can you confirm that's really how it is for your dataset? Typically, with stocks, you would have multiple dimensions - open price, close price, day high, day low, volume, etc.

Regarding the losses, you mentioned that your KL loss started at 0. I am not seeing the same when running on your data. The loss progression looks as follows when I run the model as-is on your data: [image]

The loss values being in the hundreds doesn't imply anything is wrong. We didn't scale the loss by the number of batches, so the values are high when the dataset is bigger. How are you evaluating the quality of the generated samples?
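A minimal sketch (with synthetic NumPy data, not the repo's actual loss code) of why a summed loss grows with dataset size while a per-element mean stays on a comparable scale:

```python
import numpy as np

rng = np.random.default_rng(0)
small = rng.normal(size=(100, 120))    # 100 sequences of length 120
large = rng.normal(size=(10000, 120))  # 100x more sequences, same error scale

# A summed squared error grows with the amount of data...
sum_small = np.sum(small ** 2)
sum_large = np.sum(large ** 2)

# ...while the per-element mean stays roughly constant.
mean_small = np.mean(small ** 2)
mean_large = np.mean(large ** 2)

print(sum_large / sum_small)    # roughly 100
print(mean_large / mean_small)  # roughly 1
```

So a loss that converges to a large absolute value is not by itself a sign of a bug when the loss is accumulated rather than averaged.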

shy19960518 commented 2 months ago

Hi abudesai, thank you for your concern. I have briefly tested the generated data, and now it works. The only remaining problem is the display of the loss: it converges to a high value. Maybe you forgot to reset loss = 0 after each average calculation/print. Thank you for your reply again; without your help I might have given up debugging.
As for performance, it may depend on the model design space. I will try to reproduce the best results.

shy19960518 commented 2 months ago


Wow, we replied at the same time. Yes, I am sure that is what my dataset should look like. Now it works. Thank you again for your test. In my figure, the learning rate is about 5e-4, not 5. Now I realize that the loss shown here is correct.

abudesai commented 2 months ago

Oh, you are right, your learning rate is 5e-4. Happy to help if you have any other questions/concerns.

shy19960518 commented 2 months ago

May I ask whether it could lead to any performance instability if I set up the code like this:

```python
from tensorflow.keras.optimizers import Adam

vae.compile(optimizer=Adam(learning_rate=1e-5))
for i in range(10):
    vae.fit(
        train_data,
        batch_size=32,
        epochs=140,
        shuffle=True,
        # callbacks=[reduceLR],  # intentionally disabled
        verbose=1,
    )
    # save a checkpoint and evaluate the model here
    ...
```

I will pick the checkpoint manually, so I don't want LR reduction. Can the model keep learning across calls without any potential error?

abudesai commented 2 months ago

You can test it and see how it works. It shouldn't cause any problems. The model may overfit, but you can use the checkpoints to get the best model. For us, early stopping and LR reduction worked well enough.

Just test it your way; if it works, it works.

abudesai commented 2 months ago

By the way, TensorFlow has a checkpoint mechanism, so you don't need the manual loop.
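For reference, a sketch of Keras's built-in ModelCheckpoint callback, which saves weights every epoch inside a single fit() call. The tiny `vae` model and random `train_data` here are hypothetical stand-ins for illustration, not the repo's actual TimeVAE model:

```python
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import ModelCheckpoint

# Toy stand-ins for the real vae/train_data (hypothetical).
train_data = np.random.randn(64, 120, 1).astype("float32")
vae = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(120, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120),
    tf.keras.layers.Reshape((120, 1)),
])
vae.compile(optimizer="adam", loss="mse")

# Write a weights checkpoint after every epoch; pick the best one later.
os.makedirs("checkpoints", exist_ok=True)
checkpoint_cb = ModelCheckpoint(
    filepath="checkpoints/vae_{epoch:03d}.weights.h5",
    save_weights_only=True,
    save_freq="epoch",
)

vae.fit(
    train_data, train_data,
    batch_size=32,
    epochs=2,
    shuffle=True,
    callbacks=[checkpoint_cb],
    verbose=0,
)
```

With this, a single long fit() replaces the outer loop, and the per-epoch weight files can be reloaded and evaluated afterwards to pick a checkpoint manually.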

shy19960518 commented 2 months ago

OK, at least I didn't make any obviously stupid operation. I will check whether it works.

shy19960518 commented 2 months ago

By the way, TensorFlow has a checkpoint mechanism, so you don't need the manual loop.

OK, I will try to become familiar with TF in the future.