Closed zachmayer closed 7 years ago
I think the example is outdated, but I am not sure and have to check. I am going to add the code below as the example in my next push.
Try this (change the working directory in the setwd() call and the LightGBM executable path in the lgbm_path argument):
library(Laurae)
library(stringi)
library(Matrix)
library(sparsity)
library(data.table)
remove(list = ls()) # WARNING: CLEANS EVERYTHING IN THE ENVIRONMENT
setwd("D:/Data Science/HousePrices") # CHANGE THIS TO WHATEVER TEMPORARY DIRECTORY WHERE YOU WANT TEMPORARY FILES
DT <- data.table(Split1 = c(rep(0, 50), rep(1, 50)), Split2 = rep(c(rep(0, 25), rep(0.5, 25)), 2))
DT$Split3 <- rep(c(rep(0, 10), rep(0.25, 15)), 4)
DT$Split4 <- rep(c(rep(0, 5), rep(0.1, 5), rep(0, 5), rep(0.1, 10)), 4)
DT$Split5 <- rep(c(rep(0, 5), rep(0.05, 5), rep(0, 10), rep(0.05, 5)), 4)
# Two simpler label definitions; both are overwritten by the last assignment below
label <- c(rep(0, 25), rep(1, 25), rep(0, 25), rep(1, 25))
label <- as.numeric((DT$Split2 == 0) & (DT$Split1 == 0) & (DT$Split3 == 0))
# Final label, the one actually passed to lgbm.cv
label <- as.numeric((DT$Split2 == 0) & (DT$Split1 == 0) & (DT$Split3 == 0) & (DT$Split4 == 0) | ((DT$Split2 == 0.5) & (DT$Split1 == 1) & (DT$Split3 == 0.25) & (DT$Split4 == 0.1) & (DT$Split5 == 0)) | ((DT$Split1 == 0) & (DT$Split2 == 0.5)))
trained <- lgbm.cv(y_train = label,
                   x_train = DT,
                   bias_train = NA,
                   folds = 5,
                   unicity = TRUE,
                   application = "binary",
                   num_iterations = 1,
                   early_stopping_rounds = 1,
                   learning_rate = 5,
                   num_leaves = 16,
                   min_data_in_leaf = 1,
                   min_sum_hessian_in_leaf = 1,
                   tree_learner = "serial",
                   num_threads = 1,
                   lgbm_path = "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe",
                   workingdir = file.path(getwd()),
                   validation = FALSE,
                   files_exist = FALSE,
                   verbose = TRUE,
                   is_training_metric = TRUE,
                   save_binary = TRUE,
                   metric = "binary_logloss")
str(trained)
I am getting this output:
***************
Fold no: 1 / 5
***************
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe
Working directory of LightGBM: D:/Data Science/HousePrices/temp
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:44 PM
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000138 seconds
[LightGBM] [Info] Number of postive:27, number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000052 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded
[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:45 PM
***************
Fold no: 2 / 5
***************
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe
Working directory of LightGBM: D:/Data Science/HousePrices/temp
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:45 PM
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000140 seconds
[LightGBM] [Info] Number of postive:27, number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000076 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded
[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:46 PM
***************
Fold no: 3 / 5
***************
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe
Working directory of LightGBM: D:/Data Science/HousePrices/temp
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:47 PM
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000151 seconds
[LightGBM] [Info] Number of postive:27, number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000050 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded
[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:48 PM
***************
Fold no: 4 / 5
***************
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe
Working directory of LightGBM: D:/Data Science/HousePrices/temp
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:48 PM
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000135 seconds
[LightGBM] [Info] Number of postive:27, number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000070 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded
[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:49 PM
***************
Fold no: 5 / 5
***************
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe
Working directory of LightGBM: D:/Data Science/HousePrices/temp
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:49 PM
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000138 seconds
[LightGBM] [Info] Number of postive:27, number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000055 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded
[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:50 PM
and
List of 3
$ Models :List of 5
..$ 1:List of 8
.. ..$ Model : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
.. ..$ Path : chr "D:/Data Science/HousePrices/temp"
.. ..$ Name : chr "lgbm_model.txt"
.. ..$ lgbm : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
.. ..$ Train : chr "lgbm_train.csv"
.. ..$ Valid : chr "lgbm_val.csv"
.. ..$ Test : logi NA
.. ..$ Validation: num [1:20] 1 1 1 1 1 ...
..$ 2:List of 8
.. ..$ Model : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
.. ..$ Path : chr "D:/Data Science/HousePrices/temp"
.. ..$ Name : chr "lgbm_model.txt"
.. ..$ lgbm : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
.. ..$ Train : chr "lgbm_train.csv"
.. ..$ Valid : chr "lgbm_val.csv"
.. ..$ Test : logi NA
.. ..$ Validation: num [1:20] 1 1 1 1 1 ...
..$ 3:List of 8
.. ..$ Model : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
.. ..$ Path : chr "D:/Data Science/HousePrices/temp"
.. ..$ Name : chr "lgbm_model.txt"
.. ..$ lgbm : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
.. ..$ Train : chr "lgbm_train.csv"
.. ..$ Valid : chr "lgbm_val.csv"
.. ..$ Test : logi NA
.. ..$ Validation: num [1:20] 1 1 1 1 1 ...
..$ 4:List of 8
.. ..$ Model : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
.. ..$ Path : chr "D:/Data Science/HousePrices/temp"
.. ..$ Name : chr "lgbm_model.txt"
.. ..$ lgbm : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
.. ..$ Train : chr "lgbm_train.csv"
.. ..$ Valid : chr "lgbm_val.csv"
.. ..$ Test : logi NA
.. ..$ Validation: num [1:20] 1 1 1 1 1 ...
..$ 5:List of 8
.. ..$ Model : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
.. ..$ Path : chr "D:/Data Science/HousePrices/temp"
.. ..$ Name : chr "lgbm_model.txt"
.. ..$ lgbm : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
.. ..$ Train : chr "lgbm_train.csv"
.. ..$ Valid : chr "lgbm_val.csv"
.. ..$ Test : logi NA
.. ..$ Validation: num [1:20] 1 1 1 1 1 ...
$ Validation:List of 2
..$ : num [1:100] 1 1 1 1 1 ...
..$ :List of 5
.. ..$ : num [1:20] 1 1 1 1 1 ...
.. ..$ : num [1:20] 1 1 1 1 1 ...
.. ..$ : num [1:20] 1 1 1 1 1 ...
.. ..$ : num [1:20] 1 1 1 1 1 ...
.. ..$ : num [1:20] 1 1 1 1 1 ...
$ Weights : num [1:5] 0.2 0.2 0.2 0.2 0.2
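To sanity-check a run like this one, you could score the out-of-fold predictions against the labels yourself. The sketch below is only an illustration: binary_logloss is a hypothetical helper, not a Laurae function, and it assumes that trained$Validation[[1]] holds the 100 out-of-fold predictions in the original row order, which the str() output suggests by its length but does not confirm.

```r
# Hypothetical helper (not part of Laurae): binary log loss between
# 0/1 labels and predicted probabilities.
# eps clips predictions away from exactly 0 and 1 so log() stays finite.
binary_logloss <- function(y, p, eps = 1e-15) {
  stopifnot(length(y) == length(p))
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# Stand-in values so the snippet runs on its own. With the run above you
# would pass `label` and `trained$Validation[[1]]` instead (assumption:
# both are in the same row order).
y_demo <- c(1, 0, 1, 0)
p_demo <- c(0.9, 0.1, 0.8, 0.3)
print(binary_logloss(y_demo, p_demo))
```

A value close to the per-fold training log loss printed by LightGBM would suggest the predictions were read back correctly.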
Thanks!
(You can close this if you want or leave it open)
Another (potentially silly) question: If I followed the installation guide in the readme for linux, what might my lightgbm path be?
I fixed the LightGBM functions' documentation in commit @4fe8e2b35acabbe8979cd3181dca8f004a03ee38.
Your LightGBM executable should be in the same directory as your LightGBM download.
You can find out where it has been compiled by running this from your LightGBM directory:
ls -d */
If you installed into a folder named "(...)/LightGBM", the lgbm_path should be "(...)/LightGBM/lightgbm" (unless my memory is wrong: it should create the executable in the root directory of the folder. You do not need to specify the extension, the shell takes care of it automatically).
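If you are not sure where the build put the executable, a small search from R may help. This is a minimal sketch, assuming the executable is a file named lightgbm (or lightgbm.exe on Windows) somewhere under the folder you cloned; find_lightgbm and the demo folder are hypothetical names used only for illustration.

```r
# Hypothetical helper: look for a file literally named "lightgbm" (or
# "lightgbm.exe") under `root` and return the full paths found.
# With recursive = TRUE, list.files() returns files only, so a directory
# that happens to be called "lightgbm" is not reported.
find_lightgbm <- function(root) {
  list.files(root, pattern = "^lightgbm(\\.exe)?$",
             recursive = TRUE, full.names = TRUE)
}

# Self-contained demo on a throwaway folder that mimics a LightGBM checkout.
demo_root <- file.path(tempdir(), "LightGBM_demo")
dir.create(file.path(demo_root, "src"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_root, "lightgbm"))         # fake executable
file.create(file.path(demo_root, "src", "main.cpp"))  # unrelated file
print(find_lightgbm(demo_root))
```

On a real install you would call find_lightgbm() on your own LightGBM folder and pass one of the returned paths as lgbm_path.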
I didn't even have lightgbm installed! lol. So for future reference, this error means lightgbm isn't installed, or you're pointing at the wrong path:
Error in outputs[["Models"]][[i]][["Validation"]] :
subscript out of bounds
I also got this by omitting the path.
***************
Fold no: 1 / 5
***************
Error in outputs[["Models"]][[i]][["Validation"]] :
subscript out of bounds
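Since the "subscript out of bounds" error says nothing about the path, it can be worth checking lgbm_path before calling lgbm.cv. The following is a minimal sketch using base R only; check_lgbm_path is a hypothetical helper and not part of the Laurae package.

```r
# Hypothetical pre-flight check (not part of Laurae): report why a given
# lgbm_path is unlikely to work, or "ok" if it looks usable.
check_lgbm_path <- function(lgbm_path) {
  if (!file.exists(lgbm_path)) {
    return("not found")
  }
  if (dir.exists(lgbm_path)) {
    return("is a directory, not an executable")
  }
  # file.access(mode = 1) returns 0 when the file can be executed
  if (file.access(lgbm_path, mode = 1) != 0) {
    return("exists but is not executable")
  }
  "ok"
}

# Demo on paths that exist on any machine running this snippet.
print(check_lgbm_path(tempdir()))                                # a directory
print(check_lgbm_path(file.path(tempdir(), "no_such_lightgbm"))) # missing file
```

Only a result of "ok" suggests the path is worth passing to lgbm.cv; it does not prove that the file is actually a LightGBM build.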
I installed on OS X as shown here...
cannot install lightgbm in R with devtools on macOS
Doing the R install as shown there with...
R CMD INSTALL --build . --no-multiarch
I believe this installs to the default R package location as shown by...
> .libPaths()
[1] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library"
system("ls -l /Library/Frameworks/R.framework/Versions/3.4/Resources/library/lightgbm")
total 32
-rw-rw-r-- 1 mjh admin 2027 Jun 23 17:48 DESCRIPTION
-rw-rw-r-- 1 mjh admin 2044 Jun 23 17:50 INDEX
Might it be possible to make the .libPaths() location the default path?
I just tried...
lgbm_path = '/Library/Frameworks/R.framework/Versions/3.4/Resources/library',
and got...
***************
Fold no: 1 / 5
***************
done (actual nth=1, anyBufferGrown=no, maxBuffUsed=35%)
Saving validation data (data.table) file to: /Users/mjh/ml/kaggle/HomeCredit/code/lgbm_val_1.csv
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
Column writers: 3 12 12 12 12 3 5 5 5 5 12 12 12 12 12 5 3 5 5 3 5 3 3 12 5 3 3 12 3 12 ... 5 5 5 5 3 5 5 5 5 5
maxLineLen=1559 from sample. Found in 0.016s
Writing column names ... done in 0.000s
Writing 61502 rows in 23 batches of 2690 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=35%)
Starting to work on model as of Tue Jun 26 2018 08:46:11
/bin/sh: /Library/Frameworks/R.framework/Versions/3.4/Resources/library: is a directory
Model completed, results saved in /Users/mjh/ml/kaggle/HomeCredit/code
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file '/Users/mjh/ml/kaggle/HomeCredit/code/lgbm_model_1.txt': No such file or directory
It successfully wrote the .conf, train_1.csv, and val_1.csv files. I'm not sure what the other errors are about: it appears to look for a /bin/sh type executable, and then the connection fails because there is no model_1.txt.
On Mac, the lgbm_path is the location of the Unix executable that you build from source. In my case I had it in my Downloads folder, so the lgbm_path value would be something like "/Downloads/LightGBM/lightgbm"
I'm having trouble figuring out the example in ?lgbm.cv. It looks like it's on the housing price dataset, but I'm not 100% sure. When I try to run it, I get the following error:
Do you have a working example that runs on your machine I could try out, to make sure my installation is working?