Intel-bigdata / HiBench

HiBench is a big data benchmark suite.

Naive Bayes sets up labels incorrectly #675

Open xwu99 opened 3 years ago

xwu99 commented 3 years ago

https://github.com/Intel-bigdata/HiBench/blob/master/sparkbench/ml/src/main/scala/com/intel/sparkbench/ml/SparseNaiveBayes.scala#L104

The dockey looks like "/class123". dockey.substring(6) yields "123", but the trailing ".head" keeps only the first character, "1", as the label. This leaves at most 10 distinct classes no matter how many classes are set in bayes.conf (hibench.bayes.classes). The ".head" call should be removed.
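To make the effect concrete, here is a minimal, self-contained sketch. It is not the HiBench code: the object `LabelParsing` and the helpers `buggyLabel` and `fixedLabel` are hypothetical names, and the labels are kept as strings for illustration. It assumes keys of the form "/class<N>", as described above.

```scala
object LabelParsing {
  // Hypothetical helper reproducing the reported behavior:
  // substring(6) drops the "/class" prefix, then .head keeps only the first character.
  def buggyLabel(docKey: String): String = docKey.substring(6).head.toString

  // Hypothetical helper with ".head" removed: the full class number is kept.
  def fixedLabel(docKey: String): String = docKey.substring(6)

  def main(args: Array[String]): Unit = {
    // Assumed setup: 100 classes, with keys "/class0" through "/class99".
    val keys = (0 until 100).map(i => s"/class$i")

    println(buggyLabel("/class123"))               // 1
    println(fixedLabel("/class123"))               // 123
    println(keys.map(buggyLabel).distinct.size)    // 10
    println(keys.map(fixedLabel).distinct.size)    // 100
  }
}
```

With 100 configured classes, the first-character version collapses every key onto one of the ten digits, while the version without ".head" keeps all 100 labels distinct.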