Describe the bug
I've been testing a few networks on my data and am finding that TabNetRegressor's predictions are wildly different from those of a RandomForestRegressor and a basic PyTorch linear regression network.
Code looks like this:
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from pytorch_tabnet.tab_model import TabNetRegressor

# config is my own settings module holding the file paths
train_data = pd.read_excel(config.TRAIN_DATA_FILE)
validation_data = pd.read_excel(config.VALIDATION_DATA_FILE)

# Features: everything except the target and the row identifier
X_train = train_data.drop(columns=['Months', 'ID'])
X_val = validation_data.drop(columns=['Months', 'ID'])
y_train_mth = train_data['Months']
y_val_mth = validation_data['Months']

# Mean-impute missing values, fitting on the training set only
imputer = SimpleImputer(strategy='mean')
X_train = imputer.fit_transform(X_train)
X_val = imputer.transform(X_val)

# Standardise features with training-set statistics
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)

# TabNetRegressor expects 2D targets
y_train_mth = y_train_mth.values.reshape(-1, 1)
y_val_mth = y_val_mth.values.reshape(-1, 1)

regressor = TabNetRegressor(verbose=1, seed=42)
regressor.fit(
    X_train=X_train,
    y_train=y_train_mth,
    eval_set=[(X_val, y_val_mth)],
    virtual_batch_size=64,
    eval_metric=['rmse'],
)

y_val_pred = regressor.predict(X_val)
for true_val, pred_val in zip(y_val_mth, y_val_pred):
    print(f"True: {true_val[0]}, Predicted: {pred_val[0]}")
Data looks like this:
ID  Months  Max Volume  A1       A2       A3       A4       A5        A...
1   20.47   7.26346     488601   9.99133  15.7748  4.87628  2.38E+06  41
2   89.23   15.4819     101610   16.0093  22.9652  8.06708  819696    3
3   24.57   4.18762     26165.2  5.00004  5.83497  4.4945   117598    7
What is the current behavior?
The output looks like this, and the predicted values are wildly wrong. Note that the training loss sits at 0.0 for every epoch and the validation RMSE never moves off 52.95739.
epoch 0  | loss: 0.0 | val_0_rmse: 52.95739| 0:00:00s
epoch 1  | loss: 0.0 | val_0_rmse: 52.95739| 0:00:00s
epoch 2  | loss: 0.0 | val_0_rmse: 52.95739| 0:00:00s
epoch 3  | loss: 0.0 | val_0_rmse: 52.95739| 0:00:00s
epoch 4  | loss: 0.0 | val_0_rmse: 52.95739| 0:00:00s
epoch 5  | loss: 0.0 | val_0_rmse: 52.95739| 0:00:00s
epoch 6  | loss: 0.0 | val_0_rmse: 52.95739| 0:00:00s
epoch 7  | loss: 0.0 | val_0_rmse: 52.95739| 0:00:00s
epoch 8  | loss: 0.0 | val_0_rmse: 52.95739| 0:00:00s
epoch 9  | loss: 0.0 | val_0_rmse: 52.95739| 0:00:00s
epoch 10 | loss: 0.0 | val_0_rmse: 52.95739| 0:00:00s

Early stopping occurred at epoch 10 with best_epoch = 0 and best_val_0_rmse = 52.95739
.../pytorch_tabnet/callbacks.py:172: UserWarning: Best weights from best epoch are automatically used!
  warnings.warn(wrn_msg)

True: 12.98, Predicted: -0.04269159585237503
True: 67.55, Predicted: -0.0058983564376831055
True: 56.64, Predicted: -0.4818570613861084
True: 9.03, Predicted: 0.05411398410797119
True: 54.01, Predicted: -0.0857810527086258
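The loss: 0.0 at every epoch is the telling symptom: no optimizer step is ever taken, so the model just returns its untrained, near-zero outputs. This is likely because pytorch-tabnet's fit defaults to batch_size=1024 with drop_last=True, so a training set with fewer than 1024 rows yields zero complete batches per epoch. A minimal sketch of the mechanism in plain PyTorch (the 300-row dataset here is hypothetical, standing in for my data):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical training set, smaller than pytorch-tabnet's default batch_size of 1024
dataset = TensorDataset(torch.randn(300, 8), torch.randn(300, 1))

# drop_last=True discards the final incomplete batch; with 300 rows and
# batch_size=1024 there are no complete batches at all
loader = DataLoader(dataset, batch_size=1024, drop_last=True)
print(len(loader))  # 0 -> the training loop never sees a batch and the loss stays 0.0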
Expected behavior
Should look closer to:
True: 12.98, Predicted: 30.733763574218806
True: 67.55, Predicted: 58.54040414611832
True: 56.64, Predicted: 60.1098913061525
True: 9.03, Predicted: 16.965372472534174
True: 54.01, Predicted: 64.88073784667964
Thanks for any insight!
Never mind, setting batch_size explicitly fixed it!
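For anyone hitting the same thing, a minimal sketch of the corrected fit call. The batch_size=64 value is illustrative (anything no larger than the training set works), and drop_last=False is an extra safeguard, not part of the original fix:

regressor.fit(
    X_train=X_train,
    y_train=y_train_mth,
    eval_set=[(X_val, y_val_mth)],
    eval_metric=['rmse'],
    batch_size=64,           # explicit batch size, no larger than the training set
    virtual_batch_size=64,   # ghost batch norm chunk size, at most batch_size
    drop_last=False,         # keep the final incomplete batch on small datasets
)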