Closed abarbet-zz closed 3 years ago
@abarbet This is a known question; see the comments below.
@abarbet Some more background on SVM probabilities: an SVM does not directly provide a probability. By default, the SVM model gives you a voting-based "pseudo" probability distribution. sklearn adds Platt scaling, which internally uses cross-validation to obtain a probability, but the exported PMML contains no information about it. The probabilities are therefore just vote fractions, and with three classes (presumably three pairwise classifiers, one vote each) every value is a multiple of 1/3.
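To illustrate why vote fractions come out as multiples of 1/3, here is a minimal sketch in plain Python. It assumes one-vs-one voting in which each pairwise classifier casts one vote and the votes are divided by the number of pairs. The helper `vote_probabilities` and its input format are hypothetical and only illustrate the idea; this is not PMML4S's actual code.

```python
from itertools import combinations


def vote_probabilities(pair_winners, n_classes):
    """Turn one-vs-one winners into vote fractions (hypothetical helper).

    pair_winners maps each class pair (i, j) with i < j to the class
    that won that pairwise comparison.
    """
    pairs = list(combinations(range(n_classes), 2))
    votes = [0] * n_classes
    for pair in pairs:
        votes[pair_winners[pair]] += 1
    # Assumption: normalize by the number of pairwise classifiers.
    return [v / len(pairs) for v in votes]


# Class 0 beats classes 1 and 2; class 2 beats class 1.
probs = vote_probabilities({(0, 1): 0, (0, 2): 0, (1, 2): 2}, n_classes=3)
print(probs)  # [0.6666666666666666, 0.0, 0.3333333333333333]
```

With three classes there are only three pairwise votes, so the only possible values are 0, 1/3, 2/3, and 1, which matches the output reported in the question.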
I'm closing this issue now. If you have other problems, please feel free to open a new one.
I trained an SVM using sklearn with the probability option set to True. This instructs the classifier to add Platt scaling, accessible through the classifier's predict_proba method, to assign a probability to each of the possible classes. When I serialize this model into PMML, I'm getting OutputFields that match what I'd expect to see, i.e.,
<Output>
<OutputField name="probability_0" optype="continuous" dataType="double" feature="probability" value="0"/>
<OutputField name="probability_1" optype="continuous" dataType="double" feature="probability" value="1"/>
<OutputField name="probability_2" optype="continuous" dataType="double" feature="probability" value="2"/>
<OutputField name="predicted_target" optype="categorical" dataType="integer" feature="predictedValue"/>
</Output>
However, any time I call the PMML4S predict method on this model, I seem to be getting rounding errors with these first three probability output fields. Each probability is always being rounded to some multiple of 1/3. For example, if the predicted class is 0, then the output looks like [0.6666666666666666, 0.0, 0.3333333333333333, 0]. The same applies for any predicted class, where each probability is a multiple of 1/3.

I've checked all the variables in my PMML file, and I know that the input to the PMML is correct (i.e. the rounding errors aren't occurring there). Do you know what could be causing this?