dreamquark-ai / tabnet

PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf
https://dreamquark-ai.github.io/tabnet/
MIT License

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object #440

Closed sonnguyen129 closed 1 year ago

sonnguyen129 commented 1 year ago

Describe the bug

When I tried to train TabNet with Optuna, I ran into the following error.

What is the current behavior?

Namespace(batch_size=128, cat_dims=[9, 16, 7, 15, 6, 5, 2, 42], cat_idx=[1, 3, 5, 6, 7, 8, 9, 13], config='config/adult.yml', data_parallel=True, dataset='Adult', direction='maximize', early_stopping_rounds=20, epochs=1000, gpu_ids=[0], logging_period=100, model_name='TabNet', n_trials=10, num_classes=1, num_features=14, num_splits=5, objective='binary', one_hot_encode=False, optimize_hyperparameters=True, scale=True, seed=221, shuffle=True, target_encode=True, use_gpu=True, val_batch_size=256)
Start hyperparameter optimization
Loading dataset Adult...
Dataset loaded!
(32561, 14)
Scaling the data...
[I 2022-10-05 19:09:12,970] A new study created in RDB with name: TabNet_Adult
A new study created in RDB with name: TabNet_Adult
/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py:75: UserWarning: Device used : cuda
  warnings.warn(f"Device used : {self.device}")
/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py:75: UserWarning: Device used : cuda
  warnings.warn(f"Device used : {self.device}")
epoch 0  | loss: 0.68711 | eval_logloss: 1.5043  |  0:00:20s
epoch 1  | loss: 0.39683 | eval_logloss: 0.4977  |  0:00:41s
epoch 2  | loss: 0.38015 | eval_logloss: 0.43938 |  0:01:01s
epoch 3  | loss: 0.36482 | eval_logloss: 0.43644 |  0:01:22s
epoch 4  | loss: 0.34721 | eval_logloss: 0.38523 |  0:01:43s
epoch 5  | loss: 0.34573 | eval_logloss: 0.35584 |  0:02:03s
epoch 6  | loss: 0.34037 | eval_logloss: 0.38542 |  0:02:23s
epoch 7  | loss: 0.33787 | eval_logloss: 0.35565 |  0:02:44s
epoch 8  | loss: 0.32982 | eval_logloss: 0.35525 |  0:03:04s
epoch 9  | loss: 0.32862 | eval_logloss: 0.33821 |  0:03:24s
epoch 10 | loss: 0.32244 | eval_logloss: 0.33319 |  0:03:45s
epoch 11 | loss: 0.32608 | eval_logloss: 0.34302 |  0:04:06s
epoch 12 | loss: 0.3276  | eval_logloss: 0.36721 |  0:04:26s
epoch 13 | loss: 0.32269 | eval_logloss: 0.3386  |  0:04:47s
epoch 14 | loss: 0.32002 | eval_logloss: 0.33012 |  0:05:08s
epoch 15 | loss: 0.31808 | eval_logloss: 0.33689 |  0:05:28s
epoch 16 | loss: 0.31916 | eval_logloss: 0.32849 |  0:05:49s
epoch 17 | loss: 0.31616 | eval_logloss: 0.34039 |  0:06:10s
epoch 18 | loss: 0.31717 | eval_logloss: 0.34637 |  0:06:30s
epoch 19 | loss: 0.31554 | eval_logloss: 0.33508 |  0:06:50s
epoch 20 | loss: 0.318   | eval_logloss: 0.43872 |  0:07:11s
epoch 21 | loss: 0.32983 | eval_logloss: 0.49745 |  0:07:31s
epoch 22 | loss: 0.31808 | eval_logloss: 0.33653 |  0:07:52s
epoch 23 | loss: 0.31731 | eval_logloss: 0.32934 |  0:08:12s
epoch 24 | loss: 0.31352 | eval_logloss: 0.33776 |  0:08:32s
epoch 25 | loss: 0.31438 | eval_logloss: 0.34476 |  0:08:53s
epoch 26 | loss: 0.31483 | eval_logloss: 0.3282  |  0:09:13s
epoch 27 | loss: 0.30911 | eval_logloss: 0.32267 |  0:09:33s
epoch 28 | loss: 0.31008 | eval_logloss: 0.34737 |  0:09:53s
epoch 29 | loss: 0.30756 | eval_logloss: 0.32561 |  0:10:13s
epoch 30 | loss: 0.30834 | eval_logloss: 0.32646 |  0:10:33s
epoch 31 | loss: 0.30615 | eval_logloss: 0.32435 |  0:10:53s
epoch 32 | loss: 0.30466 | eval_logloss: 0.33857 |  0:11:13s
epoch 33 | loss: 0.30495 | eval_logloss: 0.33067 |  0:11:33s
epoch 34 | loss: 0.30485 | eval_logloss: 0.33315 |  0:11:53s
epoch 35 | loss: 0.30466 | eval_logloss: 0.33724 |  0:12:13s
epoch 36 | loss: 0.30336 | eval_logloss: 0.33496 |  0:12:33s
epoch 37 | loss: 0.29928 | eval_logloss: 0.35852 |  0:12:53s
epoch 38 | loss: 0.29941 | eval_logloss: 0.33168 |  0:13:13s
epoch 39 | loss: 0.30065 | eval_logloss: 0.34095 |  0:13:33s
epoch 40 | loss: 0.29873 | eval_logloss: 0.35759 |  0:13:53s
epoch 41 | loss: 0.30008 | eval_logloss: 0.35994 |  0:14:13s
epoch 42 | loss: 0.29637 | eval_logloss: 0.33748 |  0:14:33s
epoch 43 | loss: 0.29404 | eval_logloss: 0.33582 |  0:14:54s
epoch 44 | loss: 0.29512 | eval_logloss: 0.33685 |  0:15:13s
epoch 45 | loss: 0.29254 | eval_logloss: 0.34174 |  0:15:33s
epoch 46 | loss: 0.29284 | eval_logloss: 0.35136 |  0:15:53s
epoch 47 | loss: 0.2898  | eval_logloss: 0.35115 |  0:16:13s

Early stopping occurred at epoch 47 with best_epoch = 27 and best_eval_logloss = 0.32267
/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/callbacks.py:172: UserWarning: Best weights from best epoch are automatically used!
  warnings.warn(wrn_msg)
[W 2022-10-05 19:25:28,854] Trial 0 failed because of the following error: TypeError('default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object')
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
  File "train.py", line 95, in __call__
    sc, time = cross_validation(model, self.X, self.y, self.args)
  File "train.py", line 41, in cross_validation
    loss_history, val_loss_history = curr_model.fit(X_train, y_train, X_test, y_test)  # X_val, y_val)
  File "/content/drive/MyDrive/Predict Student Results/Code/TabSurvey-main/models/tabnet.py", line 40, in fit
    batch_size=self.args.batch_size)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 260, in fit
    self.feature_importances_ = self._compute_feature_importances(X_train)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 723, in _compute_feature_importances
    M_explain, _ = self.explain(X, normalize=False)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 320, in explain
    for batch_nb, data in enumerate(dataloader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 147, in default_collate
    raise TypeError(default_collate_err_msg_format.format(elem.dtype))
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object
Traceback (most recent call last):
  File "train.py", line 144, in <module>
    main(arguments)
  File "train.py", line 116, in main
    study.optimize(Objective(args, model_name, X, y), n_trials=args.n_trials)
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/study.py", line 428, in optimize
    show_progress_bar=show_progress_bar,
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 76, in _optimize
    progress_bar=progress_bar,
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 160, in _optimize_sequential
    frozen_trial = _run_trial(study, func, catch)
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 234, in _run_trial
    raise func_err
  File "/usr/local/lib/python3.7/dist-packages/optuna/study/_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
  File "train.py", line 95, in __call__
    sc, time = cross_validation(model, self.X, self.y, self.args)
  File "train.py", line 41, in cross_validation
    loss_history, val_loss_history = curr_model.fit(X_train, y_train, X_test, y_test)  # X_val, y_val)
  File "/content/drive/MyDrive/Predict Student Results/Code/TabSurvey-main/models/tabnet.py", line 40, in fit
    batch_size=self.args.batch_size)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 260, in fit
    self.feature_importances_ = self._compute_feature_importances(X_train)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 723, in _compute_feature_importances
    M_explain, _ = self.explain(X, normalize=False)
  File "/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py", line 320, in explain
    for batch_nb, data in enumerate(dataloader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 721, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/collate.py", line 147, in default_collate
    raise TypeError(default_collate_err_msg_format.format(elem.dtype))
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object

If the current behavior is a bug, please provide the steps to reproduce.

Here is my TabNet:

from pytorch_tabnet.tab_model import TabNetClassifier, TabNetRegressor
import numpy as np
import torch
from models.basemodel_torch import BaseModelTorch
from utils.io_utils import save_model_to_file, load_model_from_file

class TabNet(BaseModelTorch):

    def __init__(self, params, args):
        super().__init__(params, args)

        # The paper recommends keeping n_d and n_a equal
        self.params["n_a"] = self.params["n_d"]

        self.params["cat_idxs"] = args.cat_idx
        self.params["cat_dims"] = args.cat_dims

        self.params["device_name"] = self.device

        if args.objective == "regression":
            self.model = TabNetRegressor(**self.params)
            self.metric = ["rmse"]
        elif args.objective == "classification" or args.objective == "binary":
            self.model = TabNetClassifier(**self.params)
            self.metric = ["logloss"]

    def fit(self, X, y, X_val=None, y_val=None):
        if self.args.objective == "regression":
            y, y_val = y.reshape(-1, 1), y_val.reshape(-1, 1)

        self.model.fit(X, y, eval_set=[(X_val, y_val)], eval_name=["eval"], eval_metric=self.metric,
                       max_epochs=self.args.epochs, patience=self.args.early_stopping_rounds,
                       batch_size=self.args.batch_size)
        history = self.model.history
        self.save_model(filename_extension="best")
        return history['loss'], history["eval_" + self.metric[0]]

    def predict_helper(self, X):
        X = np.array(X, dtype=float)  # np.float is deprecated; use the builtin float

        if self.args.objective == "regression":
            return self.model.predict(X)
        elif self.args.objective == "classification" or self.args.objective == "binary":
            return self.model.predict_proba(X)

    def save_model(self, filename_extension=""):
        save_model_to_file(self.model, self.args, filename_extension)

    def load_model(self, filename_extension=""):
        self.model = load_model_from_file(self.model, self.args, filename_extension)

    def get_model_size(self):
        # To get the size, the model has to be trained for at least one epoch
        model_size = sum(t.numel() for t in self.model.network.parameters() if t.requires_grad)
        return model_size

    @classmethod
    def define_trial_parameters(cls, trial, args):
        params = {
            "n_d": trial.suggest_int("n_d", 8, 64),
            "n_steps": trial.suggest_int("n_steps", 3, 10),
            "gamma": trial.suggest_float("gamma", 1.0, 2.0),
            "cat_emb_dim": trial.suggest_int("cat_emb_dim", 1, 3),
            "n_independent": trial.suggest_int("n_independent", 1, 5),
            "n_shared": trial.suggest_int("n_shared", 1, 5),
            "momentum": trial.suggest_float("momentum", 0.001, 0.4, log=True),
            "mask_type": trial.suggest_categorical("mask_type", ["sparsemax", "entmax"]),
        }
        return params

Here is the training code:

def cross_validation(model, X, y, args, save_model=False):
    # Record some statistics and metrics
    sc = get_scorer(args)
    train_timer = Timer()
    test_timer = Timer()

    if args.objective == "regression":
        kf = KFold(n_splits=args.num_splits, shuffle=args.shuffle, random_state=args.seed)
    elif args.objective == "classification" or args.objective == "binary":
        kf = StratifiedKFold(n_splits=args.num_splits, shuffle=args.shuffle, random_state=args.seed)
    else:
        raise NotImplementedError("Objective " + args.objective + " is not yet implemented.")

    for i, (train_index, test_index) in enumerate(kf.split(X, y)):

        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        # X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.05, random_state=args.seed)

        # Create a new unfitted version of the model
        curr_model = model.clone()

        # Train model
        train_timer.start()
        loss_history, val_loss_history = curr_model.fit(X_train, y_train, X_test, y_test)  # X_val, y_val)
        train_timer.end()

        # Test model
        test_timer.start()
        curr_model.predict(X_test)
        test_timer.end()

        # Save model weights and the truth/prediction pairs for traceability
        curr_model.save_model_and_predictions(y_test, i)

        if save_model:
            save_loss_to_file(args, loss_history, "loss", extension=i)
            save_loss_to_file(args, val_loss_history, "val_loss", extension=i)

        # Compute scores on the output
        sc.eval(y_test, curr_model.predictions, curr_model.prediction_probabilities)

        print(sc.get_results())

    # Best run is saved to file
    if save_model:
        print("Results:", sc.get_results())
        print("Train time:", train_timer.get_average_time())
        print("Inference time:", test_timer.get_average_time())

        # Save the all statistics to a file
        save_results_to_file(args, sc.get_results(),
                             train_timer.get_average_time(), test_timer.get_average_time(),
                             model.params)

    # print("Finished cross validation")
    return sc, (train_timer.get_average_time(), test_timer.get_average_time())

class Objective(object):
    def __init__(self, args, model_name, X, y):
        # Save the model that will be trained
        self.model_name = model_name

        # Save the training data
        self.X = X
        self.y = y

        self.args = args

    def __call__(self, trial):
        # Define hyperparameters to optimize
        trial_params = self.model_name.define_trial_parameters(trial, self.args)

        # Create model
        model = self.model_name(trial_params, self.args)

        # Cross validate the chosen hyperparameters
        sc, time = cross_validation(model, self.X, self.y, self.args)

        save_hyperparameters_to_file(self.args, trial_params, sc.get_results(), time)

        return sc.get_objective_result()

def main(args):
    print("Start hyperparameter optimization")
    X, y = load_data(args)

    model_name = str2model(args.model_name)

    optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))
    study_name = args.model_name + "_" + args.dataset
    storage_name = "sqlite:///{}.db".format(study_name)

    study = optuna.create_study(direction=args.direction,
                                study_name=study_name,
                                storage=storage_name,
                                load_if_exists=True)
    study.optimize(Objective(args, model_name, X, y), n_trials=args.n_trials)
    print("Best parameters:", study.best_trial.params)

    # Run best trial again and save it!
    model = model_name(study.best_trial.params, args)
    cross_validation(model, X, y, args, save_model=True)

Other relevant information:
Python version: 3.8.10

Hope to hear from you soon.

Optimox commented 1 year ago

Here is a link to a notebook using tabnet and Optuna : https://www.kaggle.com/code/neilgibbons/tuning-tabnet-with-optuna

Your code does not use the library in a straightforward way, so I'm not sure I will be able to help you more than that.
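
For reference, here is a minimal sketch of the direct usage that the notebook follows (synthetic stand-in data and illustrative hyperparameter ranges, not taken from your setup):

import numpy as np
import optuna
from pytorch_tabnet.tab_model import TabNetClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: all-numeric float32 features, integer labels
X = np.random.rand(1000, 14).astype(np.float32)
y = np.random.randint(0, 2, size=1000)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)

def objective(trial):
    n_d = trial.suggest_int("n_d", 8, 64)
    clf = TabNetClassifier(
        n_d=n_d,
        n_a=n_d,  # the paper recommends n_d == n_a
        n_steps=trial.suggest_int("n_steps", 3, 10),
        gamma=trial.suggest_float("gamma", 1.0, 2.0),
    )
    clf.fit(
        X_train, y_train,
        eval_set=[(X_valid, y_valid)],
        eval_metric=["logloss"],
        max_epochs=100,
        patience=20,
    )
    # the default eval name is "val_0", so the history key is "val_0_logloss"
    return min(clf.history["val_0_logloss"])

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=10)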

sonnguyen129 commented 1 year ago

Hi @Optimox, I solved it with X_train = X_train.astype(float). Can you explain why that works?
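
For reference, the fix was just casting the folds before calling fit (a sketch; variable names match the cross_validation code above):

X_train = X_train.astype(float)  # object dtype -> float64
X_test = X_test.astype(float)
loss_history, val_loss_history = curr_model.fit(X_train, y_train, X_test, y_test)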

Also, why do these warnings appear so many times in the log?

/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/abstract_model.py:75: UserWarning: Device used : cuda
  warnings.warn(f"Device used : {self.device}")
/usr/local/lib/python3.7/dist-packages/pytorch_tabnet/callbacks.py:172: UserWarning: Best weights from best epoch are automatically used!
  warnings.warn(wrn_msg)
Optimox commented 1 year ago

I guess NumPy arrays can sometimes contain mixed types, which can cause errors when converting to torch tensors...
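
A quick way to reproduce the failure outside of tabnet (an illustrative snippet, not library code):

import numpy as np
import torch

# Preprocessing pipelines that start from mixed column types (e.g. a pandas
# DataFrame's .values) can hand back dtype=object even once every element is numeric
X = np.array([[25.0, 0.3], [38.0, 0.7]], dtype=object)
print(X.dtype)  # object

try:
    torch.from_numpy(X)
except TypeError as e:
    print(e)  # can't convert np.ndarray of type numpy.object_

X = X.astype(float)  # the fix: a plain float64 array that default_collate accepts
print(torch.from_numpy(X).dtype)  # torch.float64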

About the indicative warnings: I think they help users understand what is going on by default, though I know they upset some people. I don't really know if adding them was a good or bad idea in the first place. When in doubt, I'm leaving them in for the moment.