autodeployai / pmml4s

PMML scoring library for Scala
https://www.pmml4s.org/
Apache License 2.0

Accessing model coefficients #17

Closed oren0e closed 2 years ago

oren0e commented 2 years ago

Is there a way to access the coefficients of a logistic regression model saved as a PMML file from Java? I can't find it in the documentation.

scorebot commented 2 years ago

@oren0e You can use the following code to get coefficients of the first RegressionTable:

val coefficients = model.asInstanceOf[RegressionModel].regressionTables(0).predictors.map(x => x.asInstanceOf[NumericPredictor].coefficient)

The code above is in Scala, and it can be translated into Java easily.

oren0e commented 2 years ago

@scorebot Thanks for the answer. I'm still not very experienced with Java. How can I translate this Scala code into Java? Can you help? I see that there is no asInstanceOf on the Model object in Java.

oren0e commented 2 years ago

RegressionModel coef = (RegressionModel) model;
List<NumericPredictor> coefficients = coef.regressionTables()[0].predictors().stream().map(x -> x.coefficient instanceof NumericPredictor).collect(Collectors.toList());

I've come up with this code, but I get the error: Cannot invoke stream() on the array type RegressionPredictor[]. I also need to get the intercept somehow. How can I do that?
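
A note on the error above: Java arrays have no stream() method, so the array has to be wrapped with Arrays.stream(...) or Stream.of(...) before any stream operation. The following is a minimal, self-contained sketch of that pattern. Predictor and NumericTerm are hypothetical stand-ins for the pmml4s classes, defined only so the example compiles without the library.

```java
import java.util.Arrays;

final class ArrayStreamSketch {
    // Arrays.stream(array) creates the stream that array.stream() cannot.
    static double[] coefficients(Predictor[] predictors) {
        return Arrays.stream(predictors)
                .filter(p -> p instanceof NumericTerm)              // skip any other predictor kind
                .mapToDouble(p -> ((NumericTerm) p).coefficient())  // cast, then read the value
                .toArray();
    }

    public static void main(String[] args) {
        Predictor[] predictors = { new NumericTerm(0.5), new NumericTerm(-1.25) };
        System.out.println(Arrays.toString(coefficients(predictors))); // prints [0.5, -1.25]
    }
}

// Hypothetical stand-ins for the library types (not the pmml4s classes).
interface Predictor {}

final class NumericTerm implements Predictor {
    private final double coefficient;

    NumericTerm(double coefficient) {
        this.coefficient = coefficient;
    }

    double coefficient() {
        return coefficient;
    }
}
```

The instanceof filter is a defensive choice for the sketch: a plain cast would throw ClassCastException if the array held any other predictor type.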

scorebot commented 2 years ago

@oren0e For your reference:

RegressionModel regModel = (RegressionModel) model;
RegressionTable regTable = regModel.regressionTables()[0]; // Get the first RegressionTable
RegressionPredictor[] regPredictors = regTable.predictors();
double[] coefficients = new double[regPredictors.length];
for (int i = 0; i < coefficients.length; i++) {
    coefficients[i] = ((NumericPredictor) regPredictors[i]).coefficient();
}

oren0e commented 2 years ago

@scorebot thanks! I came up with:

RegressionModel coef = (RegressionModel) model;
HashMap<String, Double> coefficients = new HashMap<>();
coefficients.put("Intercept", coef.regressionTables()[0].intercept());
Stream.of(coef.regressionTables()[0].predictors())
      .map(x -> (NumericPredictor) x)
      .forEach(entry -> coefficients.put(
          // Strip the "standardScaler(...)" wrapper to recover the feature name
          entry.field().name().replace("standardScaler(", "").replace(")", ""),
          entry.coefficient()));

Last question - if I want to get the unscaled coefficients, is there any way to do so? (I applied StandardScaler to the data in the model)

scorebot commented 2 years ago

No, the PMML file does not contain the unscaled coefficients.
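
The unscaled values can still be derived by hand. If each feature was standardized as (x - mean) / std, then b0 + sum(b_i * (x_i - mean_i) / std_i) equals (b0 - sum(b_i * mean_i / std_i)) + sum((b_i / std_i) * x_i), so each unscaled coefficient is b_i / std_i and the intercept shifts by the summed term. Below is a minimal sketch of that arithmetic in plain Java. The class name, method name, and input numbers are made up for illustration, and it assumes every predictor is a numeric one scaled in exactly this way.

```java
import java.util.Arrays;

final class UnscaleSketch {
    // Returns {intercept, coef_1, ..., coef_n} expressed on the original feature scale.
    static double[] unscale(double intercept, double[] scaledCoefs, double[] means, double[] stds) {
        double[] out = new double[scaledCoefs.length + 1];
        out[0] = intercept;
        for (int i = 0; i < scaledCoefs.length; i++) {
            out[i + 1] = scaledCoefs[i] / stds[i];            // b_i / std_i
            out[0] -= scaledCoefs[i] * means[i] / stds[i];    // intercept shift for feature i
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical numbers, chosen only to make the arithmetic easy to follow.
        double[] result = unscale(1.0,
                new double[] {2.0, -3.0},   // scaled coefficients
                new double[] {10.0, 4.0},   // means
                new double[] {5.0, 2.0});   // standard deviations
        System.out.println(Arrays.toString(result)); // prints [3.0, 0.4, -1.5]
    }
}
```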

oren0e commented 2 years ago

@scorebot I saw that it stores the transformation parameters; I just don't know how to access them. I can do the reverse calculation myself. Suppose part of my model output looks like this:

<LocalTransformations>
            <DerivedField name="standardScaler(featureA)" optype="continuous" dataType="double">
                <Apply function="/">
                    <Apply function="-">
                        <FieldRef field="FeatureA"/>
                        <Constant dataType="double">41.1764705882353</Constant>
                    </Apply>
                    <Constant dataType="double">17.627569630345086</Constant>
                </Apply>
            </DerivedField>
            <DerivedField name="standardScaler(FeatureB)" optype="continuous" dataType="double">
                <Apply function="/">
                    <Apply function="-">
                        <FieldRef field="FeatureB"/>
                        <Constant dataType="double">5.1996797294117644E7</Constant>
                    </Apply>
                    <Constant dataType="double">2.902992722876481E7</Constant>
                </Apply>
            </DerivedField>

I just need the second <Constant dataType ...> for each feature. How can I extract it?

scorebot commented 2 years ago

If you know the structure of your PMML model well, you can hard-code the traversal to get the Constant values:

LocalTransformations localTrans = model.localTransformations().get();
DerivedField[] fields = localTrans.fields();
Map<String, Double> constants = new HashMap<>(fields.length);
for (int i = 0; i < fields.length; i++) {
    DerivedField field = fields[i];
    Apply apply = (Apply) field.expr(); // Outer Apply: the "/" function
    Constant constant = (Constant) apply.children()[1]; // Get the second expression: the Constant divisor
    constants.put(field.name(), (Double)constant.value());
}

Basically, the data structures of PMML4S mirror the XML elements of PMML, so everything in the file can be read from the model object.
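
Because the constants sit in the XML itself, they can also be cross-checked without the library. The following is a rough sketch using only the JDK's DOM parser; it is not part of PMML4S. It assumes every DerivedField has the (x - mean) / std shape shown earlier in this thread, so the first Constant in document order is the mean and the second is the standard deviation. The class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class ScalerConstantsSketch {
    // One DerivedField copied from the question, wrapped so it is well-formed XML.
    static final String SAMPLE =
            "<LocalTransformations>"
          + "<DerivedField name=\"standardScaler(FeatureB)\" optype=\"continuous\" dataType=\"double\">"
          + "<Apply function=\"/\"><Apply function=\"-\">"
          + "<FieldRef field=\"FeatureB\"/>"
          + "<Constant dataType=\"double\">5.1996797294117644E7</Constant>"
          + "</Apply>"
          + "<Constant dataType=\"double\">2.902992722876481E7</Constant>"
          + "</Apply></DerivedField></LocalTransformations>";

    // Maps each DerivedField name to {mean, std}.
    static Map<String, double[]> scalerConstants(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, double[]> result = new LinkedHashMap<>();
            NodeList fields = doc.getElementsByTagName("DerivedField");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                NodeList constants = field.getElementsByTagName("Constant");
                if (constants.getLength() != 2) {
                    continue; // not the assumed (x - mean) / std shape, so skip it
                }
                double mean = Double.parseDouble(constants.item(0).getTextContent().trim());
                double std = Double.parseDouble(constants.item(1).getTextContent().trim());
                result.put(field.getAttribute("name"), new double[] {mean, std});
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the XML fragment", e);
        }
    }

    public static void main(String[] args) {
        scalerConstants(SAMPLE).forEach((name, c) ->
                System.out.println(name + ": mean=" + c[0] + ", std=" + c[1]));
    }
}
```

Matching on element position is as brittle as the hard-coded traversal above: a DerivedField with a different expression shape is skipped rather than interpreted.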

oren0e commented 2 years ago

Thanks! You helped me a lot. It would be nice to have documentation describing the methods of each object. For example, I didn't know that you're supposed to call .get() on LocalTransformations.