dreamquark-ai / tabnet

PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf
https://dreamquark-ai.github.io/tabnet/
MIT License

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object #399

Closed. SchererM99 closed this issue 2 years ago.

SchererM99 commented 2 years ago

Describe the bug

After training the model, saving it and loading it again, calling the predict method raises TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object
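The "found object" at the end of the message names the dtype of the data that reached torch's default_collate. As far as I can tell from torch's collate source, NumPy arrays whose dtype string matches the character class [SaUO] (byte strings, unicode strings, Python objects) are rejected with exactly this message. A minimal sketch of that check, assuming the regex below mirrors torch's internal one; `collate_would_reject` is a hypothetical helper, not a torch API:

```python
import re

import numpy as np

# Assumption: mirrors the pattern torch's default_collate uses to reject
# NumPy arrays holding strings (S, a, U) or Python objects (O).
_STR_OBJ_DTYPE = re.compile(r"[SaUO]")


def collate_would_reject(arr: np.ndarray) -> bool:
    """Return True if arr has a string or object dtype."""
    return _STR_OBJ_DTYPE.search(arr.dtype.str) is not None


# A row mixing Python ints and bools ends up with dtype object.
mixed = np.array([[3, True], [5, False]], dtype=object)
print(mixed.dtype, collate_would_reject(mixed))  # object True
print(collate_would_reject(mixed.astype(float)))  # False
```

Numeric and boolean dtypes (for example '<f8', '<i8', '|b1') do not match the pattern, so only the object array is flagged.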

What is the current behavior? The error above shows up with both unseen test data and the data used for training. No changes were made to the data between training and prediction.

If the current behavior is a bug, please provide the steps to reproduce.

This could be difficult to reproduce, as I think it depends on the data. The network is trained on a dataset of shape (319912, 53) consisting of integer and boolean columns. The test data is similar, but smaller.

  1. Train the network on this data
  2. Save it
  3. Load it
  4. Use the predict method on the test data
  5. TypeError occurs

Expected behavior

The method should return the predictions of the model instead.

Screenshots

[two screenshots attached]

Other relevant information:
Python version: 3.10
Operating System: Windows 10
Additional tools: PyCharm, Jupyter Notebook

Additional context

-

eduardocarvp commented 2 years ago

Hi @SchererM99 ,

Did you save your model using the method save_model as in:

saving_path_name = "./tabnet_model_test_1"
saved_filepath = clf.save_model(saving_path_name)

where clf is a TabNetClassifier?

SchererM99 commented 2 years ago

Hi @eduardocarvp,

Yes, I followed the saving and loading section of the README exactly.

Optimox commented 2 years ago

@SchererM99,

I'd like to help, but without minimal reproducible code it's hard. Moreover, you can find dozens of counterexamples showing that the behavior you expect actually works. It's even tested in the repo's CI.

So the real problem probably comes from your code and not the library.

Please check the shape of the NumPy array you pass for prediction: it should be (N_samples, N_features).
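A shape check along those lines can be run before calling predict. A minimal sketch; `check_prediction_input` is a hypothetical helper, and 53 is simply the feature count of the dataset described in the issue:

```python
import numpy as np


def check_prediction_input(X, n_features: int) -> np.ndarray:
    """Hypothetical pre-flight check: X must be 2-D with shape (N_samples, n_features)."""
    X = np.asarray(X)
    if X.ndim != 2:
        raise ValueError(f"expected a 2-D array (N_samples, N_features), got shape {X.shape}")
    if X.shape[1] != n_features:
        raise ValueError(f"expected {n_features} features, got {X.shape[1]}")
    return X


# A batch of 4 rows with 53 features passes; a single flat row does not.
check_prediction_input(np.zeros((4, 53)), n_features=53)
try:
    check_prediction_input(np.zeros(53), n_features=53)
except ValueError as err:
    print(err)  # expected a 2-D array (N_samples, N_features), got shape (53,)
```

A single sample still has to be passed as a (1, N_features) array, for example X[None, :].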

Optimox commented 2 years ago

@SchererM99 do you have more information to share? A short script to reproduce the error? More info on your data? A code sample?

SchererM99 commented 2 years ago

@Optimox Yes, I wrote a short script which produces the error for me. I also shared the dataframes and the model below. Thank you for your help!

import pandas as pd
from pytorch_tabnet.tab_model import TabNetClassifier

X_test = pd.read_csv("xtestp94.csv")
y_test = pd.read_csv("ytestp94.csv")
X_test = X_test.drop(columns="Unnamed: 0")
y_test = y_test.drop(columns="Unnamed: 0")
X_test_tabnet = X_test[X_test.columns].values
y_test_tabnet = y_test["decision"].values

best_net = TabNetClassifier()
best_net.load_model("models/steps_7_gamma_1.5_indepglus_4_sharedglus_1_lambdasparse_0.001.zip")

y_pred = best_net.predict(X_test_tabnet)

Attachments: steps_7_gamma_1.5_indepglus_4_sharedglus_1_lambdasparse_0.001.zip, xtestp94.csv, ytestp94.csv
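With integer and boolean columns, the `.values` call in the script is a likely culprit: pandas does not upcast a mix of bool and numeric columns to a numeric dtype, so the resulting array has dtype object. A minimal reproduction with a made-up two-column DataFrame (the column names and values are invented for illustration):

```python
import pandas as pd

# Invented stand-in for the issue's data: one integer column, one boolean column.
df = pd.DataFrame({"count": [3, 5, 8], "flag": [True, False, True]})

# Same expression as X_test[X_test.columns].values in the script.
X = df[df.columns].values
print(X.dtype)  # object

# Each column on its own keeps a proper NumPy dtype.
print(df[["flag"]].values.dtype)  # bool
```

Presumably the rows passed to the DataLoader during predict are then slices of this object array, which would match the "found object" in the error.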

Optimox commented 2 years ago

Have you tried giving a float NumPy matrix to the model? Like this: X_test_tabnet = X_test.values.astype(float)
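The suggested cast can be wrapped in a small helper so the conversion happens in one place for both training and prediction data. A minimal sketch, assuming all feature columns are integer or boolean as described in the issue; `to_float_matrix` is a hypothetical name:

```python
import numpy as np
import pandas as pd


def to_float_matrix(df: pd.DataFrame) -> np.ndarray:
    """Return the DataFrame's values as a float64 matrix (True -> 1.0, False -> 0.0)."""
    # For mixed int/bool columns .values has dtype object; astype(float) makes it float64.
    return df.values.astype(float)


df = pd.DataFrame({"count": [3, 5], "flag": [True, False]})
X = to_float_matrix(df)
print(X.dtype)  # float64
print(X.tolist())  # [[3.0, 1.0], [5.0, 0.0]]
```

pandas' `DataFrame.to_numpy(dtype=float)` should give the same matrix in a single call.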

Optimox commented 2 years ago

@SchererM99 feel free to reopen if casting to float does not solve your problem.