AdrianAntico / AutoQuant

R package for automation of machine learning, forecasting, model evaluation, and model interpretation
GNU Affero General Public License v3.0

unused arguments issue with FakeDataGenerator #77

Closed MislavSag closed 3 years ago

MislavSag commented 3 years ago

Hi,

I have just downloaded the newest version of the package and all dependencies.

I ran the code from the README:

# Create some dummy correlated data
data <- RemixAutoML::FakeDataGenerator(
  Correlation = 0.85,
  N = 10000,
  ID = 2,
  ZIP = 0,
  AddDate = FALSE,
  Classification = TRUE,
  MultiClass = FALSE)

head(data)

# Run function
TestModel <- RemixAutoML::AutoCatBoostClassifier(

  # GPU or CPU and the number of available GPUs
  task_type = "GPU",
  NumGPUs = 1,

  # Metadata args
  ModelID = "Test_Model_1",
  model_path = normalizePath("./"),
  metadata_path = normalizePath("./"),
  SaveModelObjects = FALSE,
  ReturnModelObjects = TRUE,
  SaveInfoToPDF = FALSE,

  # Data args
  data = data,
  TrainOnFull = FALSE,
  ValidationData = NULL,
  TestData = NULL,
  TargetColumnName = "Adrian",
  FeatureColNames = names(data)[!names(data) %in% c("IDcol_1","IDcol_2","Adrian")],
  PrimaryDateColumn = NULL,
  ClassWeights = c(1L,1L),
  IDcols = c("IDcol_1","IDcol_2"),

  # Evaluation args
  eval_metric = "AUC",
  loss_function = "Logloss",
  grid_eval_metric = "MCC",
  MetricPeriods = 10L,
  NumOfParDepPlots = ncol(data)-1L-2L,

  # Grid tuning args
  PassInGrid = NULL,
  GridTune = TRUE,
  MaxModelsInGrid = 30L,
  MaxRunsWithoutNewWinner = 20L,
  MaxRunMinutes = 24L*60L,
  BaselineComparison = "default",

  # ML args
  Trees = seq(100L, 500L, 50L),
  Depth = seq(4L, 8L, 1L),
  LearningRate = seq(0.01,0.10,0.01),
  L2_Leaf_Reg = seq(1.0, 10.0, 1.0),
  RandomStrength = 1,
  BorderCount = 128,
  RSM = c(0.80, 0.85, 0.90, 0.95, 1.0),
  BootStrapType = c("Bayesian", "Bernoulli", "Poisson", "MVS", "No"),
  GrowPolicy = c("SymmetricTree", "Depthwise", "Lossguide"),
  langevin = FALSE,
  diffusion_temperature = 10000,
  model_size_reg = 0.5,
  feature_border_type = "GreedyLogSum",
  sampling_unit = "Group",
  subsample = NULL,
  score_function = "Cosine",
  min_data_in_leaf = 1)

but I got an error:

Error in RemixAutoML::AutoCatBoostClassifier(task_type = "GPU", NumGPUs = 1,  : 
  unused arguments (eval_metric = "AUC", loss_function = "Logloss")

Then, when I commented out those two arguments, I got another error:

AdrianAntico commented 3 years ago

Hi @MislavSag

Thanks for bringing this up. It's been a while since I updated the README code for a few of these functions. It looks like the CatBoost and XGBoost regression, classifier, and multiclass examples all needed updating. I just pushed the updates so that the README code matches the help file code. You should be good to go for copying it in and running it.
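For anyone hitting a similar "unused arguments" error with copied example code, here is a minimal sketch of one way to compare the argument names in a call against what the installed function accepts. The function `installed_fn` and its argument names `new_metric_arg` and `new_loss_arg` are hypothetical stand-ins, not the real AutoCatBoostClassifier signature; `find_unused_args` is also a name made up for this example.

```r
# Hypothetical stand-in for an installed function whose signature has changed.
# To check the real one, pass RemixAutoML::AutoCatBoostClassifier instead.
installed_fn <- function(task_type = "GPU",
                         data = NULL,
                         new_metric_arg = "AUC",
                         new_loss_arg = "Logloss") NULL

# Return the argument names that the function does not declare.
find_unused_args <- function(fn, arg_names) {
  accepted <- names(formals(fn))
  # A function that declares `...` accepts any named argument.
  if ("..." %in% accepted) return(character(0))
  setdiff(arg_names, accepted)
}

# Argument names taken from the copied example code.
readme_args <- c("task_type", "data", "eval_metric", "loss_function")
unused <- find_unused_args(installed_fn, readme_args)
print(unused)
```

This compares exact names only, so it can flag an abbreviated name that R's partial argument matching would still accept. Running `args()` on the installed function, or opening its help page, shows the full current signature.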

MislavSag commented 3 years ago

Running RemixAutoML::AutoCatBoostClassifier crashed all my active R sessions. Do you know what could be the reason?

AdrianAntico commented 3 years ago

@MislavSag It's CatBoost. Their newest release seems to be causing that, but I haven't investigated enough to figure out why. I know they mentioned that they require a newer version of CUDA / drivers for GPU. GPU training was the only thing I tested, and it crashed. I'm not sure if there are also CPU training issues, but the prior version of catboost works without error:

devtools::install_url('https://github.com/catboost/catboost/releases/download/v0.25.1/catboost-R-Windows-0.25.1.tgz', INSTALL_opts = c("--no-multiarch", "--no-test-load"))
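After reinstalling, it may help to confirm which catboost version R actually picks up before training again. The sketch below assumes 0.25.1 (the release in the URL above) is the version you want. `matches_version` is a hypothetical helper name. It reads the installed package metadata without loading the package.

```r
# Report whether an installed package matches a given version.
# Returns NA if the package is not installed. The package is not loaded.
matches_version <- function(pkg, wanted) {
  # system.file() returns "" when the package cannot be found.
  if (!nzchar(system.file(package = pkg))) return(NA)
  utils::packageVersion(pkg) == wanted
}

status <- matches_version("catboost", "0.25.1")
if (is.na(status)) {
  message("catboost is not installed")
} else if (status) {
  message("catboost 0.25.1 is installed")
} else {
  message("catboost is installed at version ",
          as.character(utils::packageVersion("catboost")))
}
```

Restart the R session after installing so that a previously loaded copy of the package is not still in memory.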