dreamquark-ai / tabnet

PyTorch implementation of the TabNet paper: https://arxiv.org/pdf/1908.07442.pdf
https://dreamquark-ai.github.io/tabnet/
MIT License

Index -1 out of bounds in entmax calculation #483

Closed: Kayne88 closed this issue 11 months ago

Kayne88 commented 1 year ago

Describe the bug

While training the classifier, after a couple of backward steps an index of -1 is selected in the entmax calculation, which is out of bounds for a feature matrix with shape[1] = 172.

What is the current behavior?

If the current behavior is a bug, please provide the steps to reproduce.

import numpy as np
import torch
from scipy.special import softmax
from pytorch_tabnet.tab_model import TabNetClassifier
from pytorch_tabnet.metrics import Metric

# Pay-off matrix: rows are predicted classes, columns are true classes.
reward_matrix = np.array([
    [0.041, 0, -0.041],
    [0, 0.0041, 0],
    [-0.041, 0, 0.041]
])

reward_tensor = torch.as_tensor(reward_matrix)

def neg_reward(y_pred, y_true):
  # Pick the reward column for each true label and weight the
  # log-odds of the predicted probabilities with it.
  y_t = y_true.type(torch.int32)
  p = torch.nn.Softmax(dim=1)(y_pred)
  r_selected = torch.index_select(reward_tensor, dim=1, index=y_t)
  r = torch.transpose(r_selected, dim0=0, dim1=1)
  reward = torch.sum(torch.multiply(r, torch.log(p / (1 - p + 1e-20))), dim=1)
  return -torch.mean(reward)

def neg_reward_regularized(y_pred, y_true):
  # Unused placeholder for a regularized variant of the loss.
  pass

class Reward(Metric):
  def __init__(self):
    self._name = "reward"
    self._maximize = True

  def __call__(self, y_true, y_score):
    # NumPy version of the loss above, used as an eval metric.
    p = softmax(y_score, axis=1)
    reward_matrix = np.array([
        [0.041, 0, -0.041],
        [0, 0.0041, 0],
        [-0.041, 0, 0.041]
    ])
    rewards = np.sum(np.transpose(reward_matrix[:, y_true]) * np.log(p / (1 - p + 1e-20)), axis=1)
    return -np.mean(rewards)
# Synthetic data: 171 numerical features plus one binary categorical feature.
X_num = np.random.random((2000, 171))
X_cat = np.random.choice([0, 1], size=2000).reshape((-1, 1))
X = np.concatenate([X_num, X_cat], axis=1)
y = np.random.choice([0, 1, 2], size=2000)

batch_size = 1024  # not defined in the original report; value assumed here
cat_idxs = [171]
cat_dims = [2]
model = TabNetClassifier(
    device_name="cpu",
    cat_idxs=cat_idxs,
    cat_dims=cat_dims,
    cat_emb_dim=1,
    optimizer_fn=torch.optim.Adam,  # any optimizer works here
    optimizer_params=dict(lr=2e-2),
    scheduler_fn=torch.optim.lr_scheduler.OneCycleLR,
    scheduler_params={
        "is_batch_level": True,
        "max_lr": 5e-2,
        # X stands in for the undefined train_set from the original report
        "steps_per_epoch": int(X.shape[0] / batch_size) + 1,
        "epochs": 10_000,
    },
    mask_type='entmax',  # or "sparsemax"
)

model.fit(
    X, y,
    eval_set=[(X, y)],
    eval_name=["train_set"],
    max_epochs=100,
    patience=10,
    loss_fn=neg_reward,
    eval_metric=[Reward],
    batch_size=batch_size
)

Expected behavior

Training should work correctly with the custom loss.


Additional context

When standard cross-entropy is used, training works fine, so the issue must be related to the custom loss.

Optimox commented 1 year ago

This error often appears when you have NaNs in your data. If things work properly with cross-entropy, the problem must come from your custom loss.
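
A minimal check along those lines (editor's sketch; X and y are the arrays from the reproduction script above):

    import numpy as np

    # Sketch: rule out non-finite inputs before suspecting anything else.
    assert np.isfinite(X).all(), "X contains NaN or inf values"
    assert np.isfinite(y).all(), "y contains NaN or inf values"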

Maybe try lowering the learning rate or clipping the gradient norm.
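
For instance (editor's sketch, not a confirmed fix; it relies on the clip_value constructor argument of pytorch-tabnet, which clips the gradient norm during training):

    model = TabNetClassifier(
        device_name="cpu",
        cat_idxs=cat_idxs,
        cat_dims=cat_dims,
        cat_emb_dim=1,
        optimizer_fn=torch.optim.Adam,
        optimizer_params=dict(lr=2e-3),  # lowered from 2e-2
        clip_value=1.0,                  # clip the gradient norm each step
        mask_type='entmax',
    )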

jbjaypark commented 1 year ago

I encountered a similar issue while working on my own data. I checked for NaNs in my data: none. I tried lowering the learning rate: still the same issue.
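
Editor's note: one more check worth trying. In the custom loss above, the 1e-20 term guards only the denominator of p / (1 - p + 1e-20), so the log still diverges to -inf whenever a class probability underflows to 0, and the resulting 0 * -inf products in the reward sum become NaN. Once the loss is NaN, NaN gradients can corrupt the network weights, which could plausibly lead to the out-of-bounds index in entmax. A quick probe with saturated logits (sketch; neg_reward is the loss defined in the report):

    # Saturated prediction: softmax underflows to exactly (1, 0, 0) in float32.
    extreme_logits = torch.tensor([[200.0, -200.0, -200.0]])
    labels = torch.tensor([0])
    print(neg_reward(extreme_logits, labels))  # NaN here implicates the loss

    # A possible stabilization: clamp p before the log, e.g.
    #   p = p.clamp(1e-6, 1 - 1e-6)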

Optimox commented 1 year ago

Without a reproducible error I can't help much I'm afraid.