dreamquark-ai / tabnet

PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf
https://dreamquark-ai.github.io/tabnet/
MIT License

Model is not learning today, although the same code worked yesterday :/ Do you know the reason? #437

Closed. SquareGraph closed this issue 2 years ago.

SquareGraph commented 2 years ago

Hey, a very weird thing happened. I ran the following code yesterday, and even published a notebook on GitHub after finishing the project: https://github.com/SquareGraph/FootballPredictionsModel/blob/main/BaselineModels_Football_Predictions_55_60_version_to_publish.ipynb

[Screenshot 2022-09-15 at 10 34 25]

And it worked perfectly.

But today, when I ran the exact same code, the loss is 0.0 right from the start and the model is not learning anything.

[Screenshot 2022-09-15 at 10 33 40]

Any idea what the cause is?

If the current behavior is a bug, please provide the steps to reproduce.

  1. Steps to reproduce - I ran the same code again today.
  2. You can use the link to my colab, to run the code -> https://colab.research.google.com/drive/1AzFWgYN8zrR_Vm2OyPZI2GsQgTsQQAVV#scrollTo=OamY5WPujn1u

Expected behavior

The loss should be non-zero at the start, not 0.0. I also ran similar code in another notebook and the result is the same: 0.0 at the start of training.
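For reference, here is a rough sketch of why a first-epoch loss of exactly 0.0 looks wrong. It assumes a classifier trained with cross-entropy, which the thread does not state explicitly. A model that has learned nothing and spreads its probability evenly over k classes has a loss of ln(k), so an untrained model would usually start somewhere in that neighbourhood. The helper names below are hypothetical and are not part of pytorch-tabnet.

```python
import math


def uniform_cross_entropy(num_classes: int) -> float:
    """Cross-entropy of a model that assigns probability 1/num_classes to every class."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    return math.log(num_classes)


def looks_suspicious(first_epoch_loss: float, tolerance: float = 1e-6) -> bool:
    """Heuristic (an assumption, not a library feature): flag a first-epoch loss that is
    effectively zero, since an untrained classifier should not fit the data perfectly."""
    return first_epoch_loss <= tolerance


if __name__ == "__main__":
    for k in (2, 3):
        print(f"{k} classes: uniform-guess loss = {uniform_cross_entropy(k):.4f}")
    print("loss of 0.0 suspicious?", looks_suspicious(0.0))
```

A loss that starts at exactly 0.0 more likely points at the labels, the loss function, or the training loop than at a model that is simply slow to learn.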


Additional context

Nothing changed in my code, so maybe some PyTorch dependency caused this behavior. Please help, guys :)

Optimox commented 2 years ago

@SquareGraph We made a release yesterday with potentially breaking changes. We went from v3.x to v4.x.

Can you check that your code works with the previous tabnet version, 3.1.1? Then I can help you understand what has changed in the new version.
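One way to confirm which release a notebook is actually running is to query the installed package metadata. This is a minimal sketch using only the standard library; it assumes the package is installed under its PyPI name `pytorch-tabnet`, and the function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def major_of(version_string: str) -> int:
    """Extract the leading major number from a version string such as '3.1.1'."""
    return int(version_string.split(".")[0])


def installed_major_version(package: str = "pytorch-tabnet"):
    """Return the installed major version of `package`, or None if it is not installed."""
    try:
        return major_of(version(package))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    major = installed_major_version()
    if major is None:
        print("pytorch-tabnet is not installed in this environment")
    else:
        print(f"pytorch-tabnet major version: {major}")
```

If this reports 4 in an environment that ran 3.x the day before, pinning the older release (for example `pip install pytorch-tabnet==3.1.1`) should reproduce the earlier behavior.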

Optimox commented 2 years ago

The breaking change we made was adding a warm_start option so that the library now follows the scikit-learn convention.

So if your pipeline used to make multiple consecutive calls to fit, the behavior has changed: you now need to explicitly set warm_start=True, otherwise each call to fit trains the model from scratch.
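To illustrate the convention being described, here is a toy sketch. `ToyModel` is a hypothetical stand-in written for this example and is not the pytorch-tabnet API; it only assumes, as the comments in this thread suggest, that `warm_start` is passed when calling `fit`. Without it, a second `fit` discards the learned state; with it, training continues from the current state.

```python
class ToyModel:
    """Hypothetical one-parameter model (y ≈ weight * x); NOT the pytorch-tabnet API."""

    def __init__(self):
        self.weight = None
        self.epochs_seen = 0

    def fit(self, xs, ys, epochs=3, lr=0.1, warm_start=False):
        # Convention illustrated here: a plain fit() starts from scratch,
        # while warm_start=True keeps the state left by the previous fit().
        if not warm_start or self.weight is None:
            self.weight = 0.0
            self.epochs_seen = 0
        for _ in range(epochs):
            # Gradient of the mean squared error with respect to the weight.
            grad = sum(2 * (self.weight * x - y) * x for x, y in zip(xs, ys)) / len(xs)
            self.weight -= lr * grad
            self.epochs_seen += 1
        return self


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    model = ToyModel()
    model.fit(xs, ys)                   # first training run
    model.fit(xs, ys)                   # restarts from scratch
    print(model.epochs_seen)
    model.fit(xs, ys, warm_start=True)  # continues from the current weight
    print(model.epochs_seen)
```

In this sketch the epoch counter resets on the second plain call and only accumulates on the warm-started one, which is the behavioral difference a pipeline with consecutive `fit` calls would need to account for.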

SquareGraph commented 2 years ago

Hey,

Yes, it's working with 3.1.1. I'll check later whether it also works with the 4.0 update using what you mentioned.

SquareGraph commented 2 years ago

But passing warm_start=True while fitting the model does not help either :/ The problem also occurs if I generate random data in a new notebook.

Optimox commented 2 years ago

@SquareGraph what do you mean by "The problem also occurs if I generate random data in a new notebook."?

How can you tell that training is not working with random data?

I ran this script with version 4.0: https://www.kaggle.com/code/optimo/tabnet-original-paper-s-parameters and training seems to be working just fine...

Optimox commented 2 years ago

@SquareGraph please reopen once you have more information to share about the new behavior of your code.