ModelOriented / forester

Trees are all you need
https://modeloriented.github.io/forester/
GNU General Public License v3.0

vctrs package returning error: Can't subset columns past the end. #94

Closed RickPack closed 1 year ago

RickPack commented 1 year ago

I cannot share the dataset that first triggered this error, so I am reproducing it with the economics dataset from the ggplot2 package:

Error in `df[, i]`:
! Can't subset columns past the end.
ℹ Location 1 doesn't exist.
ℹ There are only 0 columns.
> packageVersion('forester')
[1] ‘1.1.4’
> packageVersion('vctrs')
[1] ‘0.5.2’

Reproducible example

library(forester)
library(ggplot2)  # provides the economics dataset
library(dplyr)    # provides %>% and select()

test_model <- train(
  economics %>% select(-date),
  type = 'regression',
  y = 'uempmed',
  engine = c('ranger', 'xgboost', 'decision_tree', 'lightgbm')
)


✔ Type guessed as: regression

-------------------- CHECK DATA REPORT --------------------

The dataset has 574 observations and 5 columns, which names are: pce; pop; psavert; uempmed; unemploy;

With the target value described by a column uempmed.

✔ No static columns.

✔ No duplicate columns.

✔ No target values are missing.

✔ No predictor values are missing.

✔ No issues with dimensionality.

✖ Strongly correlated, by Spearman rank, pairs of numerical values are:

pce - pop: 0.99; pce - psavert: -0.79; pop - psavert: -0.84;

✖ These obserwation migth be outliers due to their numerical columns values: 514 515 516 517 518 520 521 522 523 524 525 527 528 529 530 531 ;

✖ Target data is not evenly distributed with quantile bins: 0.24 0.45 0.06 0.26

✔ Columns names suggest that none of them are IDs.

✔ Columns data suggest that none of them are IDs.

-------------------- CHECK DATA REPORT END --------------------

Error in `df[, i]`:
! Can't subset columns past the end.
ℹ Location 1 doesn't exist.
ℹ There are only 0 columns.
Run `rlang::last_error()` to see where the error occurred.


> rlang::last_error()
<error/vctrs_error_subscript_oob>
Error in `df[, i]`:
! Can't subset columns past the end.
ℹ Location 1 doesn't exist.
ℹ There are only 0 columns.

Backtrace:
  1. forester::train(...)
  2. forester::preprocessing(data, y, advanced = advanced_preprocessing)
  3. forester::manage_missing(pre_data, y)
  4. tibble:::`[.tbl_df`(df, , i)
Run `rlang::last_trace()` to see the full context.

> rlang::last_trace()
<error/vctrs_error_subscript_oob>
Error in `df[, i]`:
! Can't subset columns past the end.
ℹ Location 1 doesn't exist.
ℹ There are only 0 columns.

Backtrace:
     ▆
  1. ├─forester::train(...)
  2. │ └─forester::preprocessing(data, y, advanced = advanced_preprocessing)
  3. │   └─forester::manage_missing(pre_data, y)
  4. │     ├─df[, i]
  5. │     └─tibble:::`[.tbl_df`(df, , i)
  6. │       └─tibble:::vectbl_as_col_location(...)
  7. │         ├─tibble:::subclass_col_index_errors(...)
  8. │         │ └─base::withCallingHandlers(...)
  9. │         └─vctrs::vec_as_location(j, n, names, call = call)
 10. └─vctrs (local) `<fn>`()
 11.   └─vctrs:::stop_subscript_oob(...)
 12.     └─vctrs:::stop_subscript(...)
 13.       └─rlang::abort(...)

jmanacup commented 1 year ago

Try converting the dataset with as.data.frame(economics) before passing it to the train function. I think the function does not accept tibbles.
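A minimal sketch of that conversion, assuming the economics dataset from ggplot2 as in the reproducible example. The variable name economics_df is only illustrative, and the train() call is left commented out because its behaviour after the conversion is not verified here:

```r
library(ggplot2)  # provides the economics dataset, which is stored as a tibble

# Drop the date column with base R, then convert the tibble to a plain data.frame.
economics_df <- as.data.frame(economics[, setdiff(names(economics), "date")])

class(economics)     # includes "tbl_df", the tibble class seen in the backtrace
class(economics_df)  # "data.frame"

# Then pass the converted object to train() with the same arguments as before:
# test_model <- train(economics_df, type = 'regression', y = 'uempmed',
#                     engine = c('ranger', 'xgboost', 'decision_tree', 'lightgbm'))
```

Checking class() before calling train() is a quick way to confirm that the object is no longer a tibble.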

RickPack commented 1 year ago

@jmanacup, thank you, that solved the "Location 1 doesn't exist" error. Now I see an XGBoost error: label must be provided when data is a matrix

I will open another issue.