Far0n / xgbfi

XGBoost Feature Interactions & Importance
MIT License

Feature map(xgb.fmap) and XGBoost dump file(xgb.dump) upload request. #7

Closed SimonZhao777 closed 7 years ago

SimonZhao777 commented 7 years ago

Hi Far0n, I am currently using XGBoost to select feature interactions for my LR model, and I find your work very interesting. However, I have run into a problem getting XgbFeatureInteractions.exe running with my xgb.dump file. I am using Scala and the XGBoost4j package on Spark 1.6, so I am not sure whether the dump file my project creates has a different format from yours. It would be really nice if you could upload the xgb.fmap and xgb.dump files mentioned in your project's README.md, so that I can do a format check and see what is wrong with my dump file. Thank you.

SimonZhao777 commented 7 years ago

Problem solved. It turned out that my xgb.dump lacked the tree number lines (the `booster[i]:` headers). For anyone interested in how I solved this dump file problem, here is the code I wrote in Scala:

```scala
import java.io.PrintWriter

// Write the feature map: one "<index>\t<name>\tq" line per feature.
def createFeatureMap(savePath: String, featureNames: Array[String]): Unit = {
  val writer = new PrintWriter(savePath, "UTF-8")
  for (i <- 0 until featureNames.length) {
    writer.print(i + "\t" + featureNames(i) + "\t" + "q\n")
  }
  writer.close()
}

// Dump the trained model (one string per tree), using the feature map for names.
val modelInfos = xgBoostModel.booster.getModelDump("../xgb.fmap", withStats = true)

// Write the dump file, adding the "booster[i]:" header line before each tree.
def saveDumpModel(modelPath: String, modelInfos: Array[String]): Unit = {
  val writer = new PrintWriter(modelPath, "UTF-8")
  for (i <- 0 until modelInfos.length) {
    writer.print(s"booster[$i]:\n")
    writer.print(modelInfos(i))
  }
  writer.close()
}
```

XGBDumpFile.zip

First call createFeatureMap, which saves the feature map to the specified location. Then obtain your XGBoost model dump (modelInfos in the code above) and pass it to saveDumpModel. The result is a dump file that XGBFI can parse.

The attached files are the xgb.fmap and xgb.dump that I created, provided as a reference.
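For anyone who wants to sanity-check a feature map before handing it to XGBFI, here is a minimal sketch. It assumes the same three-column, tab-separated layout that createFeatureMap writes (index, name, type). The `FeatureMap` object and its method names are hypothetical helpers for illustration, not part of XGBoost4J or xgbfi.

```scala
// Hypothetical helper for illustration; not part of XGBoost4J or xgbfi.
object FeatureMap {
  /** Build feature map text: one "<index>\t<name>\t<type>" line per feature. */
  def render(featureNames: Seq[String], featureType: String = "q"): String =
    featureNames.zipWithIndex
      .map { case (name, i) => s"$i\t$name\t$featureType\n" }
      .mkString

  /** True if every non-empty line has three tab-separated fields
    * and the index column counts up from 0 without gaps. */
  def isWellFormed(fmapText: String): Boolean = {
    val rows = fmapText.split("\n").toSeq.filter(_.nonEmpty).map(_.split("\t", -1))
    rows.zipWithIndex.forall { case (fields, i) =>
      fields.length == 3 && fields(0) == i.toString
    }
  }
}
```

Calling `FeatureMap.isWellFormed` on the contents of xgb.fmap only checks the layout; it does not check that the names match the columns the model was trained on.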

Far0n commented 7 years ago

Thx for your feedback @SimonZhao777. I'll have a look at the dump and try to make xgbfi compatible with this format.

SimonZhao777 commented 7 years ago

@Far0n The attached xgb.dump file is already compatible with your solution. My former version didn't work because the saveDumpModel function was missing the line `writer.print(s"booster[$i]:\n")`. After adding this line, the dump file works fine with xgbfi. Thank you for your work!
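The missing-header problem described in this thread can be checked for directly. Below is a rough sketch, assuming each tree in the dump must be preceded by a `booster[i]:` line as in saveDumpModel above. `DumpFormat`, `withHeaders`, and `headerIndices` are hypothetical names used only for this example.

```scala
// Hypothetical helpers for illustration; not part of xgbfi or XGBoost4J.
object DumpFormat {
  // Matches a whole tree header line such as "booster[0]:".
  private val Header = """^booster\[(\d+)\]:\s*$""".r

  /** Join per-tree dump strings into one text, prefixing each tree with its header. */
  def withHeaders(trees: Seq[String]): String =
    trees.zipWithIndex.map { case (tree, i) =>
      // Make sure each tree ends with a newline so the next header starts a new line.
      val body = if (tree.endsWith("\n")) tree else tree + "\n"
      s"booster[$i]:\n$body"
    }.mkString

  /** Tree indices found in header lines, in file order. */
  def headerIndices(dumpText: String): Seq[Int] =
    dumpText.split("\n").toSeq.collect { case Header(n) => n.toInt }
}
```

If `headerIndices` returns an empty sequence for a dump with at least one tree, the file is in the header-less form that failed here, and `withHeaders` applied to the array returned by getModelDump produces the same layout that saveDumpModel writes.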