DrPepper8888 / IPMN


Evaluation of logistic regression (missing lift chart) #18

Open DrPepper8888 opened 3 months ago

DrPepper8888 commented 3 months ago

Evaluation with the confusion matrix, classification report, and ROC AUC

1. Confusion Matrix

from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# assume `predictions` is the DataFrame of model predictions
evaluator = MulticlassClassificationEvaluator(labelCol="trueLabel", predictionCol="prediction", metricName="accuracy")
accuracy = evaluator.evaluate(predictions)

# count each (true label, prediction) pair
confusion_matrix = predictions.groupBy("trueLabel", "prediction").count()

# print the confusion matrix
confusion_matrix.show(truncate=False)
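To make the mechanics concrete, here is a plain-Python sketch of what the Spark code above computes: the `groupBy(...).count()` result is just a tally of (true label, prediction) pairs, and accuracy is the diagonal mass over the total. The toy labels below are assumptions, not data from the issue.

```python
from collections import Counter

# toy true labels and predictions standing in for the `trueLabel` and
# `prediction` columns of the Spark DataFrame (assumed data)
true_labels = [0, 0, 1, 1, 1, 0, 1, 0]
preds       = [0, 1, 1, 1, 0, 0, 1, 0]

# equivalent of groupBy("trueLabel", "prediction").count():
# counts[(t, p)] is the number of rows with true label t predicted as p
counts = Counter(zip(true_labels, preds))

# accuracy is the sum of the diagonal cells over the total row count
correct = sum(n for (t, p), n in counts.items() if t == p)
accuracy = correct / len(true_labels)
print(counts)    # e.g. Counter({(0, 0): 3, (1, 1): 3, ...})
print(accuracy)  # 0.75
```

The grouped output is a long-format table; pivoting it on the prediction column gives the familiar square-matrix layout.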

2. Classification Report

from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, DoubleType

# UDF that computes precision, recall, and F1 score from per-class counts
def classification_report(tp, tn, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    return [precision, recall, f1]

# the UDF returns a list of three doubles, so its return type is ArrayType(DoubleType())
report_udf = udf(classification_report, ArrayType(DoubleType()))

# assume the DataFrame has per-class TP, TN, FP, FN count columns;
# add the precision/recall/F1 metrics as a new array column
predictions_with_metrics = predictions.withColumn(
    "metrics", report_udf(col("TP"), col("TN"), col("FP"), col("FN"))
)

# print the classification report
predictions_with_metrics.select("label", "metrics").show(truncate=False)
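The formulas inside the UDF can be sanity-checked without Spark. With assumed toy counts (4 predicted positives of which 3 are correct, 4 actual positives of which 3 are found):

```python
# per-class counts (assumed toy values)
tp, tn, fp, fn = 3, 3, 1, 1

precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0  # 3/4
recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0     # 3/4
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0

print(precision, recall, f1)  # 0.75 0.75 0.75
```

When precision and recall are equal, F1 (their harmonic mean) equals them both, as here.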

3. ROC AUC

from pyspark.ml.evaluation import BinaryClassificationEvaluator

# assume `binary_predictions` is the DataFrame of predictions from a binary classifier;
# BinaryClassificationEvaluator expects the raw score column, not the 0/1 prediction
evaluator = BinaryClassificationEvaluator(rawPredictionCol="rawPrediction", labelCol="trueLabel", metricName="areaUnderROC")
auc = evaluator.evaluate(binary_predictions)

# print the ROC AUC value
print(f"ROC AUC: {auc}")
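On the lift chart the issue title says is missing: lift per bucket is the positive rate among the top-scored rows in that bucket divided by the overall positive rate. A plain-Python sketch of that computation, on assumed toy scores and labels (in practice one would use deciles; 5 buckets here keep the example small, and in Spark the bucketing could be done with a window function):

```python
# scores and binary labels standing in for the model's predicted probability
# and `trueLabel` columns (assumed toy data)
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1,    1,   1,   0,   1,   0,   0,   1,   0,   0]

# rank rows by score descending, then split into equal-size buckets
ranked = [y for _, y in sorted(zip(scores, labels), key=lambda t: -t[0])]
n_buckets = 5
bucket_size = len(ranked) // n_buckets
overall_rate = sum(labels) / len(labels)

# lift of a bucket = positive rate in the bucket / overall positive rate
lift = []
for i in range(n_buckets):
    bucket = ranked[i * bucket_size:(i + 1) * bucket_size]
    lift.append((sum(bucket) / len(bucket)) / overall_rate)

print(lift)  # [2.0, 1.0, 1.0, 1.0, 0.0]
```

A well-calibrated model shows lift well above 1.0 in the top buckets and decaying toward (or below) 1.0 in the bottom ones; plotting `lift` against bucket index gives the lift chart.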