autodeployai / pmml4s

PMML scoring library for Scala
https://www.pmml4s.org/
Apache License 2.0

Missing TargetFields in response #4

Closed apoorv22 closed 4 years ago

apoorv22 commented 4 years ago

The result map returned by the predict method doesn't contain the TargetField value. The TargetField value is present in the JPMML evaluator response.

The output from PMML4s:

{
  "probability(0)": 0.5680647740466914,
  "probability(1)": 0.4319352259533086
}

The output from JPMML:

{
  "Bad": 0,
  "probability(0)": 0.5680647740466914,
  "probability(1)": 0.4319352259533086
}

Only the OutputFields are present in the response. The TargetField should also be included, as in the JPMML response.

Link to working code here

scorebot commented 4 years ago

@apoorv22 I think you mean the predicted value is missing from the response. PMML4S strictly follows the PMML standard, and the PMML contains the following Output element:

<Output>
    <OutputField name="probability(0)" optype="continuous" dataType="double" feature="probability" value="0"/>
    <OutputField name="probability(1)" optype="continuous" dataType="double" feature="probability" value="1"/>
</Output>

Based on the standard described at http://dmg.org/pmml/v4-4/Output.html, the Output element describes a set of result values that can be returned from a model, so PMML4S returns just the probabilities of 0 and 1, which is expected. Since the Output element is optional, PMML4S will output all possible results if it is missing; see the section "Understand the result values" for details.

If you want the predicted value in the response, you could use one of the following approaches:

  1. Add an extra OutputField element to the PMML, for example:

    <OutputField name="Bad" optype="categorical" dataType="integer" feature="predictedValue" />

  2. Remove the Output element, so that all possible results, including the predicted value and the probabilities, are produced.

  3. Compute the predicted value yourself: the category with the highest probability wins.
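
The third approach could be sketched roughly as follows. This is only an illustration, assuming the result map uses keys of the form probability(<category>) as in the output quoted above; the object and method names are hypothetical and not part of PMML4S.

```scala
// Hypothetical helper (not part of PMML4S): derive the predicted category
// from result keys shaped like "probability(<category>)".
object PredictedValue {
  // Matches the whole key and captures the text between the parentheses.
  private val ProbabilityKey = """probability\((.+)\)""".r

  /** Returns the category with the highest probability, or None if the
    * map contains no probability(...) entries with Double values. */
  def fromProbabilities(result: Map[String, Any]): Option[String] = {
    val probabilities = result.collect {
      case (ProbabilityKey(category), p: Double) => category -> p
    }
    if (probabilities.isEmpty) None
    else Some(probabilities.maxBy(_._2)._1)
  }
}
```

With the two probabilities from the report above, this would pick category "0". The category comes back as a string, and a tie between categories is resolved by whichever entry the map happens to iterate first, so the result may not match what a PMML engine would choose in that case.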

apoorv22 commented 4 years ago

@scorebot Thanks for the quick reply.

Regarding your first and second suggestions: in most cases the PMML files are generated by various tools, and modifying them by hand is error-prone.

Also, since the logic to compute the predicted value is already present in the PMML, computing it outside could yield a slightly different value depending on the language/platform.

It would be very nice to have a method which returns the Output element along with the currently generated elements.
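
If hand-editing is the main concern, the first suggestion could in principle be automated as a preprocessing step before the model is loaded. A minimal sketch using only the JDK's DOM API, assuming a single-model PMML file whose first Output element is the one to extend; the object name, method name, and parameters are hypothetical, and the field name and types are taken from the example earlier in this thread.

```scala
import java.io.{StringReader, StringWriter}
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult
import org.w3c.dom.Element
import org.xml.sax.InputSource

// Hypothetical preprocessing helper (not part of PMML4S or JPMML).
object AddPredictedValueField {

  /** Appends an OutputField with feature="predictedValue" to the first
    * Output element, unless such a field is already declared there.
    * Returns the serialized document; a PMML without Output is unchanged
    * apart from re-serialization. */
  def patch(pmml: String, name: String, dataType: String,
            optype: String = "categorical"): String = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new InputSource(new StringReader(pmml)))

    val outputs = doc.getElementsByTagName("Output")
    if (outputs.getLength > 0) {
      val output = outputs.item(0).asInstanceOf[Element]
      val fields = output.getElementsByTagName("OutputField")
      // Only an explicit feature="predictedValue" attribute is checked.
      val alreadyPresent = (0 until fields.getLength).exists { i =>
        fields.item(i).asInstanceOf[Element]
          .getAttribute("feature") == "predictedValue"
      }
      if (!alreadyPresent) {
        val field = doc.createElement("OutputField")
        field.setAttribute("name", name)
        field.setAttribute("optype", optype)
        field.setAttribute("dataType", dataType)
        field.setAttribute("feature", "predictedValue")
        output.appendChild(field)
      }
    }

    // Serialize the (possibly modified) DOM back to text.
    val writer = new StringWriter()
    TransformerFactory.newInstance().newTransformer()
      .transform(new DOMSource(doc), new StreamResult(writer))
    writer.toString
  }
}
```

Calling patch(pmmlText, "Bad", "integer") on the PMML discussed here would add the extra field without touching the file by hand. The sketch does not handle ensembles with nested Output elements, and it leaves the choice of field name and data type to the caller.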

scorebot commented 4 years ago

The method outputFields() of the model returns a list of output fields, which come either from those defined in the PMML Output element or from those predefined by PMML4S. Once there are output fields in the PMML Output element, we cannot add new output fields to it, because the defined output fields could involve post-transformations.

apoorv22 commented 4 years ago

@scorebot Precisely.

The method outputFields() of the model returns what it's supposed to. It would be nice to have a method that returns the Output element along with the currently generated elements, as that would match the output generated by JPMML.

scorebot commented 4 years ago

@apoorv22 The method candidateOutputFields() returns all possible output fields that could be generated by the model. For the current model, it returns four fields:

OutputField(name=predicted_Bad, displayName=Some(Predicted value of Bad), dataType=integer, opType=nominal, feature=predictedValue, targetField=None, value=None, ruleFeature=consequent, algorithm=exclusiveRecommendation, rank=1, rankBasis=confidence, rankOrder=descending, isMultiValued=false, segmentId=None, isFinalResult=true, decisions=None, expr=None)
OutputField(name=probability, displayName=Some(Probability of predicted value), dataType=real, opType=continuous, feature=probability, targetField=None, value=None, ruleFeature=consequent, algorithm=exclusiveRecommendation, rank=1, rankBasis=confidence, rankOrder=descending, isMultiValued=false, segmentId=None, isFinalResult=true, decisions=None, expr=None)
OutputField(name=probability_0, displayName=Some(Probability of 0), dataType=real, opType=continuous, feature=probability, targetField=None, value=Some(0), ruleFeature=consequent, algorithm=exclusiveRecommendation, rank=1, rankBasis=confidence, rankOrder=descending, isMultiValued=false, segmentId=None, isFinalResult=true, decisions=None, expr=None)
OutputField(name=probability_1, displayName=Some(Probability of 1), dataType=real, opType=continuous, feature=probability, targetField=None, value=Some(1), ruleFeature=consequent, algorithm=exclusiveRecommendation, rank=1, rankBasis=confidence, rankOrder=descending, isMultiValued=false, segmentId=None, isFinalResult=true, decisions=None, expr=None)

The output fields of candidateOutputFields() are only used when there is no Output element in PMML.

scorebot commented 4 years ago

@apoorv22 Do you have any other issues?

scorebot commented 4 years ago

@apoorv22 I'm closing this issue now. If you have other problems, please feel free to open new issues.