InseadDataAnalytics / INSEADAnalytics


XGBoost issue #150

Open dandriopoulos opened 6 years ago

dandriopoulos commented 6 years ago

We attempted to use the XGBoost algorithm to model IMDb ratings. This is the code I used:

imdb_data_xg <- read.csv(file.choose())
imdb_data_xg_label <- read.csv(file.choose())

imdb_data_xg=imdb_data_xg[1:6]

imdbXG = as.matrix(imdb_data_xg)
imdbLABEL = as.matrix(imdb_data_xg_label)

set.seed(77850) # set a random number generation seed so that the split is the same every time

inTrain3 <- createDataPartition(y = imdb_data_xg$Revenue_MM, p = 0.9, list = FALSE)
training3 <- imdbXG[inTrain3, ]
testing3 <- imdbXG[-inTrain3, ]
str(training3)

dtrain <- xgb.DMatrix(data = training3, label = imdbLABEL)

bst <- xgboost(data = training3, label = imdbLABEL, max.depth = 10, eta = 1, nthread = 2, nround = 10, objective = "binary:logistic")

The error that I get is:

dtrain <- xgb.DMatrix(data = training3, label = imdbLABEL)
Error in setinfo.xgb.DMatrix(dmat, names(p), p[[1]]) : The length of labels must equal to the number of rows in the input data

Any input?
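
One likely cause, assuming the label file has one row for every row of the full data set: training3 keeps only about 90% of the rows, while imdbLABEL still holds a label for every row, so the two lengths differ. Below is a minimal sketch of indexing the features and the labels with the same row index. It uses made-up toy data, and sample() stands in for createDataPartition so the snippet runs in base R; the names features, labels, in_train, train_x and train_y are illustrative and not from the original code.

```r
# Toy stand-ins for the real data (hypothetical values, not the IMDb files).
set.seed(77850)
n <- 100
features <- matrix(rnorm(n * 6), nrow = n, ncol = 6)  # plays the role of imdbXG
labels   <- runif(n, min = 1, max = 10)               # plays the role of imdbLABEL

# Base-R stand-in for createDataPartition(): pick 90% of the row indices.
in_train <- sort(sample(seq_len(n), size = round(0.9 * n)))

# Subset features AND labels with the same index so their lengths match.
train_x <- features[in_train, , drop = FALSE]
test_x  <- features[-in_train, , drop = FALSE]
train_y <- labels[in_train]
test_y  <- labels[-in_train]

# The condition that the error message complains about now holds.
stopifnot(nrow(train_x) == length(train_y))
stopifnot(nrow(test_x) == length(test_y))

# Only build the DMatrix if the xgboost package is installed.
if (requireNamespace("xgboost", quietly = TRUE)) {
  dtrain <- xgboost::xgb.DMatrix(data = train_x, label = train_y)
}
```

In the code from the question, the equivalent change would be something like label = imdbLABEL[inTrain3, 1], assuming the labels sit in the first column of the label file. Separately, objective = "binary:logistic" expects labels between 0 and 1, so a numeric rating would probably need a regression objective instead.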