dvgodoy / handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes
MIT License
185 stars 23 forks

getMetricsByThreshold is failing #30

Closed. MojiFarmanbar closed this issue 1 year ago

MojiFarmanbar commented 1 year ago

This method fails with a TypeError at line 142 of evaluation.py. [Screenshot 2023-05-19 at 21.37.05]

I believe that in lines 141 and 142, select(scoreCol, labelCol).rdd.map(lambda row: (float(row[scoreCol][1]), float(row[labelCol]))) should change to the following: select(scoreCol, labelCol).rdd.map(lambda row: (float(row[scoreCol]), float(row[labelCol])))

MojiFarmanbar commented 1 year ago

The API does what it is supposed to do. The output of a binary classifier in spark.ml is a DenseVector, so scoreCol[1] refers to the probability that the data point belongs to class 1.
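For anyone hitting the same error, here is a minimal sketch of why the indexing matters. It does not use Spark at all: a plain dict stands in for a Row and a plain list stands in for the DenseVector, and the helper name to_score_label and the column names are hypothetical, chosen only for illustration. It mirrors the mapping quoted above and shows that a vector-valued score works, while a scalar score column raises a TypeError when indexed with [1].

```python
# Plain-Python stand-ins (assumptions, not the handyspark or Spark API):
# a dict plays the role of a Spark Row, and a list plays the role of the
# DenseVector that a spark.ml binary classifier puts in the score column.

def to_score_label(row, score_col, label_col):
    """Mirror the mapping quoted in the issue: element 1 of the score, plus the label."""
    return float(row[score_col][1]), float(row[label_col])


# Vector-valued score: index 1 holds the probability of class 1.
vector_row = {"probability": [0.2, 0.8], "label": 1}
pair = to_score_label(vector_row, "probability", "label")

# Scalar score column: a float cannot be indexed, so [1] raises TypeError.
scalar_row = {"probability": 0.8, "label": 1}
try:
    to_score_label(scalar_row, "probability", "label")
    scalar_error = None
except TypeError as exc:
    scalar_error = type(exc).__name__

print(pair)          # (0.8, 1.0)
print(scalar_error)  # TypeError
```

So if the score column passed in already holds a plain float, the error most likely comes from the input rather than from the library; passing the classifier's vector-valued probability column should avoid it.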