Negar-Erfanian / Neural-spatio-temporal-probabilistic-transformers

Train loss is decreasing #1

maizeng2008 commented 1 year ago

Hi, I am very interested in your work and am currently running the code. I am confused by the loss curve: in my results, the loss is decreasing when it should be increasing. I checked the code and noticed that some of it is commented out. Is this the final version of the code that can reproduce the results in the paper?

I tried the Covid-19 and pinwheel datasets, but so far I cannot reproduce the results.

Would you please provide the hyper-parameters?

Thank you.

Negar-Erfanian commented 1 year ago

Hello! I am glad that you are interested, and I would be happy to help. The code is not fully commented yet; I will try to finish commenting it by this weekend.

The loss is actually the negative log-likelihood, which we are minimizing in this work: https://arxiv.org/abs/2211.02922v2

So it should be decreasing (by minimizing the negative log-likelihood, we are maximizing the log-likelihood).
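
As a minimal sketch of this objective (illustrative names, not the exact code in this repository):

```python
import torch

def training_step(model, batch, optimizer):
    # Sketch: the model returns the log-likelihood of the observed
    # events, and the training loss is its negation (NLL).
    log_likelihood = model(batch)       # log p(events | history)
    loss = -log_likelihood.mean()       # negative log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # This value should trend downward: minimizing the NLL is the
    # same as maximizing the log-likelihood.
    return loss.item()
```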

Are you using the transformer architecture? All the hyper-parameters are listed in Appendix F of the paper.

I will add a complete README to this code that walks you through all the steps and datasets, to make the code easier to run and use.

maizeng2008 commented 1 year ago

Hi

Currently, I'm using your code to reproduce the Pinwheel result. This is my result after 5 epochs, with the transformer as the model.

I think I followed the paper's hyper-parameters closely.

But my result makes no sense. Could you point out which part I might have gotten wrong?

[Screenshot: training result after 5 epochs ("wrong_result")]
Negar-Erfanian commented 1 year ago

Hi!

I am not sure what you are printing as the loss. Are you using the latest version of the code that has been uploaded?

Also, all datasets used in this work are fairly stochastic. You might see some fluctuations in the loss at the beginning until it finally starts converging (which can also be improved with some hyper-parameter tuning, e.g., the seed or the learning rate).
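
For runs to be comparable while tuning, one common trick is fixing the seed; a minimal sketch, not specific to this repository:

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 0):
    # Fix the common sources of randomness so loss curves are
    # comparable across hyper-parameter settings.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
```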

5 epochs will not suddenly get you to the convergence point of the loss; that is why these datasets need many more epochs. Here are the results for 20 epochs on my end:

[Screenshot: loss results after 20 epochs (2023-02-22)]
maizeng2008 commented 1 year ago

Hi, thank you for the reply. Would you please share the configuration file that you used to train on these 4 datasets?

If you can, that would be much appreciated!

Negar-Erfanian commented 1 year ago

Hi,

This is the exact same code provided here. To run training and testing, you should use these commands in the terminal:

train: python3 train2.py
test: python3 train2.py --notrain

You can change the sequence length, the number of output events, the number of epochs, etc., via the argparse options provided in utils.py; a rough sketch of such an interface is below.
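
As an illustration only (besides --notrain, the flag names and defaults here are hypothetical; check utils.py for the actual ones):

```python
import argparse

def get_args():
    # Hypothetical sketch of an argparse interface like the one in
    # utils.py; only --notrain is confirmed above, the other flags
    # and defaults are illustrative.
    parser = argparse.ArgumentParser()
    parser.add_argument('--notrain', action='store_true',
                        help='skip training and run testing only')
    parser.add_argument('--seq_len', type=int, default=50,
                        help='input sequence length')
    parser.add_argument('--num_out_events', type=int, default=10,
                        help='number of output events')
    parser.add_argument('--num_epochs', type=int, default=20,
                        help='number of training epochs')
    return parser.parse_args()
```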

I hope I understood correctly what you mean by the configuration file. If not, please clarify and I can give you more specific details.